rmbg/README.md
2026-05-16 22:23:04 -06:00

# BiRefNet Background Removal Service
GPU-accelerated background removal as an HTTP API. Two pipelines:
- **Auto** — [BiRefNet](https://huggingface.co/ZhengPeng7/BiRefNet) /
[RMBG-2.0](https://huggingface.co/briaai/RMBG-2.0) salient-object matting.
- **Prompt** — [GroundingDINO](https://huggingface.co/IDEA-Research/grounding-dino-tiny)
+ [SAM](https://huggingface.co/facebook/sam-vit-base): segment whatever a text
prompt describes.
Served with [LitServe](https://github.com/Lightning-AI/LitServe), packaged for
the NVIDIA container runtime.
## Requirements
- NVIDIA GPU + driver, Docker, and the `nvidia` container runtime
- ~5 GB free disk for model weights (downloaded on first use, cached in a volume)
## Quick start
```bash
make build # build the Docker image
make run # start the service on :8000 (GPU)
make logs # watch startup — first run downloads model weights
make test # send test.jpg, save output.png
```
`make test` waits for `/health` before sending, so the first call may block
while a model downloads and loads.
### Web UI
Open **http://localhost:8000/** — a single-page test app (handy over SSH):
- **Auto remove** — pick a model variant + resolution.
- **Prompt segment** — type what to keep (e.g. `the dog`), tune the
GroundingDINO box / text thresholds.
Both modes support a transparency checkerboard preview, click-to-zoom lightbox,
optional crop-to-subject, and download.
#### Keyboard shortcuts
The UI is fully keyboard-drivable. Shortcuts are ignored while typing in a
field and while Ctrl/Cmd/Alt is held.
| Key | Action |
|---------------------|-----------------------------------------------|
| `B` | Toggle the controls sidebar |
| `U` | Open the file picker to upload an image |
| `I` / `O` | Show the input / output image |
| `F` / `Z` | Open the zoom view for the visible image |
| `S` | Save (download PNG), once a result exists |
In the zoom view:
| Key | Action |
|---------------------------|-----------------------------------------|
| `F` / `Z` / `Esc` | Close the zoom view |
| `+` / `-` | Zoom in / out (1× to 8×) |
| `0` | Reset zoom & pan |
| Arrows or `H` `J` `K` `L` | Pan (while zoomed past 1×) |
## API
### `POST /predict` — automatic background removal
```jsonc
{
"image": "<base64 image bytes>", // required
"model": "HR", // general|HR|portrait|matting|lite|rmbg2
"resolution": 2048, // inference resolution (multiple of 32)
"background": "alpha", // alpha|white|black|gray|green|blue|red
"mask_blur": 0, // Gaussian blur radius on mask edges
"crop": false, // crop to the foreground bounding box
"crop_margin": 0.0, // crop margin in inches (uses image DPI)
"return_mask": false // include the raw mask in the response
}
```
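A minimal client sketch for `/predict`, using only the Python standard library. It is not the code in `scripts/client.py`; the helper names (`snap_to_multiple_of_32`, `build_predict_payload`, `post_json`) are hypothetical, and it assumes the service accepts a JSON body and that `resolution` should be a multiple of 32.

```python
import base64
import json
import urllib.request


def snap_to_multiple_of_32(value: int) -> int:
    """Round to the nearest multiple of 32, never going below 32."""
    return max(32, (value + 16) // 32 * 32)


def build_predict_payload(image_bytes: bytes, model: str = "general",
                          resolution: int = 1024, **options) -> dict:
    """Build the JSON body for POST /predict.

    Extra keyword arguments (background, mask_blur, crop, crop_margin,
    return_mask) are passed through unchanged.
    """
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "model": model,
        "resolution": snap_to_multiple_of_32(resolution),
    }
    payload.update(options)
    return payload


def post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST a JSON body and parse the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (assumes the service is running on localhost:8000):
#   with open("photo.jpg", "rb") as f:
#       body = build_predict_payload(f.read(), model="HR", resolution=2048, crop=True)
#   result = post_json("http://localhost:8000/predict", body)
```

The same `post_json` helper should work for `/segment` if the payload carries `prompt` in place of `model` and `resolution`.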
### `POST /segment` — prompt-conditioned segmentation
```jsonc
{
"image": "<base64 image bytes>", // required
"prompt": "the dog", // required: object(s) to keep
"box_threshold": 0.3, // GroundingDINO box (detection) threshold
"text_threshold": 0.25, // GroundingDINO text threshold
"background": "alpha",
"mask_blur": 0,
"crop": false,
"crop_margin": 0.0
}
```
Both endpoints respond with `image` (base64 PNG), `format`, `width`, and
`height`. `/predict` adds `model` and `resolution`; `/segment` adds
`detections` and `prompt`.
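As a sketch of handling the response, the helpers below decode the base64 `image` field and print a one-line summary. The function names are hypothetical, and the code makes no assumption about the type of `detections` beyond it being printable.

```python
import base64


def decode_image(response: dict) -> bytes:
    """Return the raw PNG bytes from the response's base64 `image` field."""
    return base64.b64decode(response["image"])


def summarize(response: dict) -> str:
    """Describe the response using the shared and endpoint-specific fields."""
    parts = [f"{response['format']} {response['width']}x{response['height']}"]
    # /predict adds model and resolution; /segment adds detections and prompt.
    for key in ("model", "resolution", "detections", "prompt"):
        if key in response:
            parts.append(f"{key}={response[key]}")
    return ", ".join(parts)


# Example (result is the parsed JSON response from either endpoint):
#   with open("output.png", "wb") as f:
#       f.write(decode_image(result))
#   print(summarize(result))
```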
`GET /health` returns 200 when the service is ready.
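A client can poll `/health` before sending its first request, much as `make test` is described as doing. The sketch below is one way to do that with the standard library; `is_healthy` and `wait_until_ready` are hypothetical names, and the 300-second default wait is an arbitrary assumption to allow for a first-run model download.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url: str, max_wait: float = 300.0, interval: float = 2.0,
                     probe=is_healthy, sleep=time.sleep) -> bool:
    """Poll until the probe succeeds or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Example:
#   if not wait_until_ready("http://localhost:8000"):
#       raise SystemExit("service did not become ready in time")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.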
## CLI
```bash
python3 scripts/client.py --input photo.jpg --output cut.png --model HR --resolution 2048 --crop
python3 scripts/client.py --input photo.jpg --output dog.png --prompt "the dog" --crop
```
## Configuration (environment variables)
| Variable | Default | Purpose |
|----------------------|--------------------------------|-------------------------------|
| `PORT` | `8000` | HTTP port |
| `BIREFNET_MODEL` | `general` | Default Auto variant |
| `BIREFNET_RESOLUTION`| `1024` | Default Auto resolution |
| `DINO_MODEL` | `IDEA-Research/grounding-dino-tiny` | GroundingDINO checkpoint |
| `SAM_MODEL` | `facebook/sam-vit-large` | SAM checkpoint |
| `REQUEST_TIMEOUT` | `120` | Per-request timeout (seconds) |
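To illustrate how these variables and defaults combine, here is a sketch that resolves them from an environment mapping. It is not the service's actual configuration code; `Settings` and `load_settings` are hypothetical names, and the numeric types are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    birefnet_model: str
    birefnet_resolution: int
    dino_model: str
    sam_model: str
    request_timeout: float


def load_settings(env=None) -> Settings:
    """Read each variable from env, falling back to the documented default."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8000")),
        birefnet_model=env.get("BIREFNET_MODEL", "general"),
        birefnet_resolution=int(env.get("BIREFNET_RESOLUTION", "1024")),
        dino_model=env.get("DINO_MODEL", "IDEA-Research/grounding-dino-tiny"),
        sam_model=env.get("SAM_MODEL", "facebook/sam-vit-large"),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "120")),
    )


# Example: override only the port and the default Auto resolution.
#   settings = load_settings({"PORT": "9000", "BIREFNET_RESOLUTION": "2048"})
```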
## Local development (no Docker)
Requires a local CUDA-capable PyTorch environment.
```bash
make dev # uv sync + run the server locally
```
## Layout
```
src/rmbg_as_a_service/model.py BiRefNet / RMBG-2.0 wrapper + compositing
src/rmbg_as_a_service/prompt_segment.py GroundingDINO + SAM pipeline
src/rmbg_as_a_service/server.py LitServe /predict + /segment + web UI
src/rmbg_as_a_service/static/ web UI (index.html + styles.css)
scripts/client.py stdlib-only test client
Dockerfile / compose.yml CUDA image + nvidia runtime
Makefile build / run / test shortcuts
```