
# BiRefNet Background Removal Service

GPU-accelerated background removal exposed as an HTTP API. It uses
[BiRefNet](https://huggingface.co/ZhengPeng7/BiRefNet) for matting, is served with
[LitServe](https://github.com/Lightning-AI/LitServe), and is packaged for the
NVIDIA container runtime.

## Requirements

- NVIDIA GPU + driver, Docker, and the `nvidia` container runtime
- ~2 GB free disk for the model weights (downloaded on first run)

## Quick start

```bash
make build      # build the Docker image
make run        # start the service on :8000 (GPU)
make logs       # watch startup; the first run downloads the BiRefNet weights
make test       # send test.jpg, save output.png
```

`make test` waits for the service's `/health` endpoint to report ready before
sending the request, so the first call may block while the model downloads
and loads.
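
If you call the service from your own scripts, the same wait-for-ready step
can be sketched as below. This is a minimal illustration, not the logic used
by `make test` or `scripts/client.py`; the function names `http_probe` and
`wait_until_ready` and the default timeouts are assumptions chosen for the
example. It relies only on the documented behaviour that `GET /health`
returns 200 once the service is ready.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Call `probe()` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:8000/health"))`
blocks until the service answers or five minutes pass. The generous default
timeout is a guess meant to cover the first-run weight download; adjust it to
your network.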

### Web UI

A minimal test page is served at the service root. Open
**http://localhost:8000/** in a browser, drop in an image, and preview the
transparent-background result (handy when working over SSH). The page calls
the same `/predict` endpoint as the API described below.

### Useful variations

```bash
make test BG=white                  # composite onto a white background
make test INPUT=photo.jpg OUTPUT=cut.png
make test-mask                      # also save the raw alpha mask (mask.png)
make help                           # list all targets
```

## API

`POST /predict`

```jsonc
{
  "image": "<base64 image bytes>",   // required
  "background": "alpha",             // alpha|white|black|gray|green|blue|red
  "mask_blur": 0,                    // Gaussian blur radius on mask edges
  "return_mask": false               // include the raw mask in the response
}
```

Response:

```jsonc
{
  "image": "<base64 PNG>",
  "format": "png",
  "width": 3637,
  "height": 3637,
  "mask": "<base64 PNG>"             // only when return_mask=true
}
```

`GET /health` returns 200 when the service is ready.
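
A request can be built and the response decoded with the Python standard
library alone. The sketch below is not the repository's `scripts/client.py`;
the helper names (`build_payload`, `decode_response`, `remove_background`)
are hypothetical, and it assumes the service accepts the JSON body shown
above with a `Content-Type: application/json` header. Error responses are
not documented here, so they are left to surface as `urllib` exceptions.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, background: str = "alpha",
                  mask_blur: int = 0, return_mask: bool = False) -> dict:
    """Build the JSON body for POST /predict from raw image bytes."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "background": background,
        "mask_blur": mask_blur,
        "return_mask": return_mask,
    }


def decode_response(body: dict) -> dict:
    """Decode the base64 fields of a /predict response into bytes."""
    result = {
        "image": base64.b64decode(body["image"]),
        "format": body.get("format"),
        "size": (body.get("width"), body.get("height")),
    }
    if "mask" in body:  # present only when return_mask=true
        result["mask"] = base64.b64decode(body["mask"])
    return result


def remove_background(image_path: str,
                      url: str = "http://localhost:8000/predict",
                      timeout: float = 120.0, **options) -> dict:
    """POST an image file to the service and return the decoded response."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), **options)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return decode_response(json.load(resp))
```

With the service running, `remove_background("test.jpg", background="white")["image"]`
yields PNG bytes that can be written straight to a file opened in `"wb"` mode.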

## Configuration (environment variables)

| Variable              | Default               | Purpose                           |
|-----------------------|-----------------------|-----------------------------------|
| `PORT`                | `8000`                | HTTP port                         |
| `BIREFNET_MODEL`      | `ZhengPeng7/BiRefNet` | Hugging Face repo for the weights |
| `BIREFNET_RESOLUTION` | `1024`                | Inference resolution              |
| `REQUEST_TIMEOUT`     | `120`                 | Per-request timeout (seconds)     |
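
To make the defaults concrete, here is one way such variables could be read
in Python. This is an illustrative sketch only: the `Settings` class and
`load_settings` function are hypothetical names, and the actual parsing in
`server.py` may differ (for example in how it types or validates the values).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the table above."""
    port: int = 8000
    model: str = "ZhengPeng7/BiRefNet"
    resolution: int = 1024
    request_timeout: float = 120.0  # seconds


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        port=int(env.get("PORT", defaults.port)),
        model=env.get("BIREFNET_MODEL", defaults.model),
        resolution=int(env.get("BIREFNET_RESOLUTION", defaults.resolution)),
        request_timeout=float(env.get("REQUEST_TIMEOUT", defaults.request_timeout)),
    )
```

Passing a plain dict, as in `load_settings({"PORT": "9000"})`, overrides one
value and leaves the others at their defaults, which is convenient in tests.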

## Local development (no Docker)

Requires a local CUDA-capable PyTorch environment.

```bash
make dev        # uv sync + run the server locally
```

## Layout

```
src/birefnet_service/model.py    BiRefNet wrapper (load + inference)
src/birefnet_service/server.py   LitServe API + web UI route
src/birefnet_service/static/     web UI (index.html)
scripts/client.py                stdlib-only test client
Dockerfile / docker-compose.yml  CUDA image + nvidia runtime
Makefile                         build / run / test shortcuts
```