BiRefNet Background Removal Service

GPU-accelerated background removal exposed as an HTTP API. It uses BiRefNet for matting, is served with LitServe, and is packaged for the NVIDIA container runtime.

Requirements

NVIDIA GPU and driver, Docker, and the NVIDIA container runtime (part of the NVIDIA Container Toolkit)
~2 GB of free disk space for the model weights (downloaded on first run)

Quick start

make build      # build the Docker image
make run        # start the service on :8000 (GPU)
make logs       # watch startup — first run downloads BiRefNet weights
make test       # send test.jpg, save output.png

make test waits until the service's /health endpoint reports ready before sending the request, so on first run the call may block while the model weights download and load.
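The same readiness wait can be reproduced in your own scripts. The sketch below is stdlib-only and assumes the default port 8000; `wait_for_health` and its `timeout`/`interval` parameters are hypothetical names, not the code in scripts/client.py.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000",
                    timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Poll GET /health until it returns 200 or `timeout` seconds elapse.

    Hypothetical helper; the generous default timeout allows for the
    first-run weight download.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, HTTP error, or socket timeout: the server is
            # not up yet (or is still loading weights), so retry.
            pass
        time.sleep(interval)
    return False
```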

Web UI

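A minimal test page is served at the service root: open http://localhost:8000/ in a browser, drop in an image, and preview the transparent-background result. This is handy when the service runs on a remote machine you reach over SSH (forward the port, e.g. with `ssh -L 8000:localhost:8000 <host>`). The page calls the same /predict endpoint.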

Useful variations

make test BG=white                  # composite onto a white background
make test INPUT=photo.jpg OUTPUT=cut.png
make test-mask                      # also save the raw alpha mask (mask.png)
make help                           # list all targets

API

POST /predict

{
  "image": "<base64 image bytes>",   // required
  "background": "alpha",             // alpha|white|black|gray|green|blue|red
  "mask_blur": 0,                    // Gaussian blur radius on mask edges
  "return_mask": false               // include the raw mask in the response
}
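A minimal sketch of building this request body in Python, assuming the image is sent as standard base64 with no `data:` URL prefix and that the background names listed above are the only accepted values. `build_predict_payload` is a hypothetical helper, not the function scripts/client.py actually uses.

```python
import base64
import json

# Background values taken from the API description above.
BACKGROUNDS = {"alpha", "white", "black", "gray", "green", "blue", "red"}


def build_predict_payload(image_bytes: bytes, background: str = "alpha",
                          mask_blur: int = 0, return_mask: bool = False) -> str:
    """Return the JSON body for POST /predict (hypothetical helper)."""
    if background not in BACKGROUNDS:
        raise ValueError(f"unsupported background: {background!r}")
    return json.dumps({
        # Assumption: plain standard base64 of the raw file bytes.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "background": background,
        "mask_blur": mask_blur,
        "return_mask": return_mask,
    })
```

Checking the background name on the client side fails fast on typos instead of spending a round trip to the GPU server.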

Response:

{
  "image": "<base64 PNG>",
  "format": "png",
  "width": 3637,
  "height": 3637,
  "mask": "<base64 PNG>"             // only when return_mask=true
}
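Sending the request and decoding the response might look like the following stdlib-only sketch. `post_predict` and `decode_predict_response` are hypothetical names; the default URL and the 120-second client timeout (chosen to mirror `REQUEST_TIMEOUT`) are assumptions.

```python
import base64
import json
import urllib.request
from typing import Optional


def post_predict(payload: str, base_url: str = "http://localhost:8000",
                 timeout: float = 120.0) -> str:
    """POST a JSON body to /predict and return the raw JSON response text."""
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def decode_predict_response(body: str) -> dict:
    """Decode the base64 fields of a /predict response into raw PNG bytes."""
    resp = json.loads(body)
    mask_b64: Optional[str] = resp.get("mask")  # present only when return_mask=true
    return {
        "image": base64.b64decode(resp["image"]),
        "mask": base64.b64decode(mask_b64) if mask_b64 else None,
        "size": (resp["width"], resp["height"]),
    }
```

For example, `decode_predict_response(post_predict(body))["image"]` can be written to a file opened with `open(path, "wb")` to obtain a PNG like the one make test saves as output.png.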

GET /health returns 200 when the service is ready.

Configuration (environment variables)

Variable	Default	Purpose
`PORT`	`8000`	HTTP port
`BIREFNET_MODEL`	`ZhengPeng7/BiRefNet`	HuggingFace repo for the weights
`BIREFNET_RESOLUTION`	`1024`	Inference resolution
`REQUEST_TIMEOUT`	`120`	Per-request timeout (seconds)
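To illustrate how these variables and their defaults fit together, here is a hypothetical loader; the actual parsing in server.py may differ (for instance, in how it handles invalid values). `ServiceConfig` and `load_config` are illustrative names only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    port: int
    model: str             # HuggingFace repo id for the weights
    resolution: int        # inference resolution
    request_timeout: float  # seconds


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read settings from `env` (defaults to os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return ServiceConfig(
        port=int(env.get("PORT", "8000")),
        model=env.get("BIREFNET_MODEL", "ZhengPeng7/BiRefNet"),
        resolution=int(env.get("BIREFNET_RESOLUTION", "1024")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "120")),
    )
```

Accepting an explicit mapping keeps the loader testable without mutating the process environment.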

Local development (no Docker)

Requires uv and a local CUDA-capable PyTorch environment.

make dev        # uv sync + run the server locally

Layout

src/birefnet_service/model.py    BiRefNet wrapper (load + inference)
src/birefnet_service/server.py   LitServe API + web UI route
src/birefnet_service/static/     web UI (index.html)
scripts/client.py                stdlib-only test client
Dockerfile / docker-compose.yml  CUDA image + nvidia runtime
Makefile                         build / run / test shortcuts
