
# BiRefNet Background Removal Service

GPU-accelerated background removal as an HTTP API. Two pipelines:

- **Auto** — [BiRefNet](https://huggingface.co/ZhengPeng7/BiRefNet) /
  [RMBG-2.0](https://huggingface.co/briaai/RMBG-2.0) salient-object matting.
- **Prompt** — [GroundingDINO](https://huggingface.co/IDEA-Research/grounding-dino-tiny)
  + [SAM](https://huggingface.co/facebook/sam-vit-base): segment whatever a text
  prompt describes.

Served with [LitServe](https://github.com/Lightning-AI/LitServe), packaged for
the NVIDIA container runtime.

## Requirements

- NVIDIA GPU + driver, Docker, and the `nvidia` container runtime
- ~5 GB free disk for model weights (downloaded on first use, cached in a volume)

## Quick start

```bash
make build      # build the Docker image
make run        # start the service on :8000 (GPU)
make logs       # watch startup — first run downloads model weights
make test       # send test.jpg, save output.png
```

`make test` waits for `/health` before sending, so the first call may block
while a model downloads and loads.

### Web UI

Open **http://localhost:8000/** — a single-page test app (handy over SSH):

- **Auto remove** — pick a model variant + resolution.
- **Prompt segment** — type what to keep (e.g. `the dog`), tune the
  GroundingDINO box / text thresholds.

Both modes support a transparency checkerboard preview, click-to-zoom lightbox,
optional crop-to-subject, and download.

#### Keyboard shortcuts

The UI is fully keyboard-drivable. Shortcuts are ignored while typing in a
field and while Ctrl/Cmd/Alt is held.

| Key                 | Action                                        |
|---------------------|-----------------------------------------------|
| `B`                 | Toggle the controls sidebar                   |
| `U`                 | Open the file picker to upload an image       |
| `I` / `O`           | Show the input / output image                 |
| `F` / `Z`           | Open the zoom view for the visible image      |
| `S`                 | Save (download PNG), once a result exists     |

In the zoom view:

| Key                       | Action                                  |
|---------------------------|-----------------------------------------|
| `F` / `Z` / `Esc`         | Close the zoom view                     |
| `+` / `-`                 | Zoom in / out (1×–8×)                   |
| `0`                       | Reset zoom & pan                        |
| Arrows or `H` `J` `K` `L` | Pan (while zoomed past 1×)              |

## API

### `POST /predict` — automatic background removal

```jsonc
{
  "image": "<base64 image bytes>",   // required
  "model": "HR",                     // general|HR|portrait|matting|lite|rmbg2
  "resolution": 2048,                // inference resolution (multiple of 32)
  "background": "alpha",             // alpha|white|black|gray|green|blue|red
  "mask_blur": 0,                    // Gaussian blur radius on mask edges
  "crop": false,                     // crop to the foreground bounding box
  "crop_margin": 0.0,                // crop margin in inches (uses image DPI)
  "return_mask": false               // include the raw mask in the response
}
```
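
A request like the one above can be assembled with the standard library alone. This is a minimal sketch, not the bundled `scripts/client.py`: the helper name `build_predict_request`, the `Content-Type` header, and the base URL `http://localhost:8000` are assumptions for illustration; the field names come from the schema above.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes, base_url="http://localhost:8000", **options):
    """Build a POST /predict request (hypothetical helper, not part of the service)."""
    # The API takes the image as a base64 string inside the JSON body.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    # Optional fields (model, resolution, background, crop, ...) pass through unchanged.
    payload.update(options)
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed; not stated in this README
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and parse the reply, assuming the service answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `send(build_predict_request(open("photo.jpg", "rb").read(), model="HR", resolution=2048, crop=True))` would mirror the body shown above. A generous timeout helps because the first call may wait for a model download.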

### `POST /segment` — prompt-conditioned segmentation

```jsonc
{
  "image": "<base64 image bytes>",   // required
  "prompt": "the dog",               // required — object(s) to keep
  "box_threshold": 0.3,              // GroundingDINO detection threshold
  "text_threshold": 0.25,            // GroundingDINO phrase-matching threshold
  "background": "alpha",
  "mask_blur": 0,
  "crop": false,
  "crop_margin": 0.0
}
```
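
The `/segment` body differs from `/predict` mainly in the required `prompt` and the two thresholds. The sketch below builds that body; `build_segment_payload` is a hypothetical helper, and its default thresholds simply copy the example values above, which may or may not match the server's own defaults.

```python
import base64
import json


def build_segment_payload(image_bytes, prompt, box_threshold=0.3,
                          text_threshold=0.25, **options):
    """Return the JSON body for POST /segment as a string (illustrative helper)."""
    # `prompt` is required by the API, so reject empty input before sending anything.
    if not prompt.strip():
        raise ValueError("prompt is required")
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        "box_threshold": box_threshold,
        "text_threshold": text_threshold,
    }
    # Shared options such as background, mask_blur, crop and crop_margin.
    payload.update(options)
    return json.dumps(payload)
```

Checking the prompt client-side avoids uploading a large image only to have the request rejected.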

Both endpoints return `image` (base64-encoded PNG), `format`, `width`, and
`height`. `/predict` adds `model` and `resolution`; `/segment` adds
`detections` and `prompt`.
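
Turning a response back into image bytes is a single base64 decode. The sketch below assumes the response has already been parsed into a Python dict; `decode_result` and `save_result` are hypothetical names, and the PNG signature check is an extra safeguard rather than something the service requires.

```python
import base64
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_result(response):
    """Return (png_bytes, width, height) from a parsed /predict or /segment response."""
    png_bytes = base64.b64decode(response["image"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("response image does not look like a PNG")
    return png_bytes, response["width"], response["height"]


def save_result(response, path):
    """Write the decoded PNG to `path` and return its (width, height)."""
    png_bytes, width, height = decode_result(response)
    Path(path).write_bytes(png_bytes)
    return width, height
```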

`GET /health` returns 200 when the service is ready.
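
A client can poll this endpoint before sending work, much as `make test` does. The following is a sketch of one way to do that, not the logic used by the Makefile; the function names, the 10-minute default timeout, and the 2-second interval are assumptions.

```python
import time
import urllib.request


def is_healthy(base_url="http://localhost:8000"):
    """Return True if GET /health answers 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are refused connections.
        return False


def wait_until_ready(probe=is_healthy, timeout=600.0, interval=2.0):
    """Call `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the probe as a parameter keeps the retry loop testable without a running server.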

## CLI

```bash
python3 scripts/client.py --input photo.jpg --output cut.png --model HR --resolution 2048 --crop
python3 scripts/client.py --input photo.jpg --output dog.png --prompt "the dog" --crop
```

## Configuration (environment variables)

| Variable              | Default                             | Purpose                       |
|-----------------------|-------------------------------------|-------------------------------|
| `PORT`                | `8000`                              | HTTP port                     |
| `BIREFNET_MODEL`      | `general`                           | Default Auto variant          |
| `BIREFNET_RESOLUTION` | `1024`                              | Default Auto resolution       |
| `DINO_MODEL`          | `IDEA-Research/grounding-dino-tiny` | GroundingDINO checkpoint      |
| `SAM_MODEL`           | `facebook/sam-vit-large`            | SAM checkpoint                |
| `REQUEST_TIMEOUT`     | `120`                               | Per-request timeout (seconds) |
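
To illustrate how these variables and defaults resolve, here is a small sketch that reads them into a typed object. It is not the server's actual configuration code, which this README does not show; `Settings` and `load_settings` are hypothetical names, and only the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    birefnet_model: str
    birefnet_resolution: int
    dino_model: str
    sam_model: str
    request_timeout: float


def load_settings(env=None):
    """Read the documented variables from `env` (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8000")),
        birefnet_model=env.get("BIREFNET_MODEL", "general"),
        birefnet_resolution=int(env.get("BIREFNET_RESOLUTION", "1024")),
        dino_model=env.get("DINO_MODEL", "IDEA-Research/grounding-dino-tiny"),
        sam_model=env.get("SAM_MODEL", "facebook/sam-vit-large"),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "120")),
    )
```

With no overrides, `load_settings({})` yields the defaults from the table; `load_settings({"PORT": "9000"})` changes only the port.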

## Local development (no Docker)

Requires a local CUDA-capable PyTorch environment.

```bash
make dev        # uv sync + run the server locally
```

## Layout

```
src/rmbg_as_a_service/model.py           BiRefNet / RMBG-2.0 wrapper + compositing
src/rmbg_as_a_service/prompt_segment.py  GroundingDINO + SAM pipeline
src/rmbg_as_a_service/server.py          LitServe /predict + /segment + web UI
src/rmbg_as_a_service/static/            web UI (index.html + styles.css)
scripts/client.py                        stdlib-only test client
Dockerfile / compose.yml                 CUDA image + nvidia runtime
Makefile                                 build / run / test shortcuts
```