Remove backgrounds from AI-generated images by comparing multiple renders on different solid colours.
AI image generators (Nano Banana, Midjourney, DALL·E, etc.) cannot output true transparency. Diffmat solves this: give it 2–5 renders of the same subject that differ only in background colour, and it isolates the subject pixel by pixel — producing a clean RGBA PNG.
No AI, no chroma key — just pixel math.
If you've used AI to generate images and wanted a true transparent background, you know it isn't simple. Asking the AI for transparency often yields a fake checkered pattern baked into the image — which makes removal harder, not easier. Existing AI matting tools and Photoshop techniques work, but results vary wildly. Diffmat is another strategy. I wrote it for my own use and paid for my own AI tokens; I'm happy with the results, so I'm sharing the code.
- How It Works
- Choosing a Method
- Installation
- Usage
- Input Requirements
- Filename Convention
- Default Background Colours
- Tolerance Tuning
- Antialiasing
- Performance
- Limitations
- Samples
- Contributing
- License
- Reference
Start by generating your source images. The following prompt works well with most AI image tools:
Act as a professional, expert image manipulation AI. Your task is to take the provided source image and generate five distinct, high-resolution versions.
Mandatory Constraints:
- Subject Isolation: The subject must be perfectly and completely isolated from the original background.
- Fidelity: The subject must retain 100% of its original details, poses, lighting, and appearance across all five outputs. No details should be lost or altered.
- Background Constraint: The background must be replaced entirely with a solid, opaque colour. Absolutely no transparency, gradients, or checkered patterns are allowed.
Output Requirements: Generate five separate image files, strictly adhering to the following specifications:
- Pure Black — RGB (0, 0, 0)
- Pure White — RGB (255, 255, 255)
- Pure Red — RGB (255, 0, 0)
- Pure Green — RGB (0, 255, 0)
- Pure Blue — RGB (0, 0, 255)
Once you have the renders, diffmat inspects every pixel position (x, y) across all N input images:
| Situation | Result |
|---|---|
| Every image shows its own background at (x, y) | → transparent (alpha = 0) |
| All images show a similar colour at (x, y) | → opaque subject (alpha = 255) |
| Mixed: some show background, some don't | → edge (alpha between 0 and 255) |
The exact way edge alpha is calculated depends on the method you choose.
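The table can be read as a per-pixel test. The following is a minimal sketch of that test, not diffmat's actual code: it assumes Euclidean RGB distance, and the names `classify_pixel`, `tol`, and `sim_tol` are hypothetical.

```python
import numpy as np

def classify_pixel(samples, bg_colors, tol=50.0, sim_tol=50.0):
    """Label one (x, y) position given its colour in each of the N renders.

    samples   -- N RGB triples, one per input image
    bg_colors -- N RGB triples, the background assigned to each image
    """
    samples = np.asarray(samples, dtype=float)
    bg_colors = np.asarray(bg_colors, dtype=float)

    # Does each image show its own background colour here?
    is_bg = np.linalg.norm(samples - bg_colors, axis=1) <= tol
    if is_bg.all():
        return "background"

    # Largest colour difference between any two renders at this position.
    diffs = samples[:, None, :] - samples[None, :, :]
    spread = np.linalg.norm(diffs, axis=2).max()
    if spread <= sim_tol:
        return "subject"
    return "edge"
```

Checking similarity before declaring an edge means a white subject pixel is still treated as subject, even though it matches the background in the white render.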
Three algorithms are built in. They all produce a transparent PNG — they differ in how they decide the alpha value at edge pixels and whether they reconstruct the true subject colour.
| | simple | variance | decomposition |
|---|---|---|---|
| Approach | Binary classification | Statistical variance | Linear unmixing |
| Speed | Fastest (~30 s / 1 MP) | Fast (~40 s / 1 MP) | Slowest (~60 s / 1 MP) |
| Edge quality | Hard or lightly smoothed | Smooth gradient | Physically modelled |
| Re-colours subject? | No (uses reference image) | No (uses reference image) | Yes (estimates true colour) |
| Best for | Crisp icons, logos, solids | Anti-aliased edges, vector art | Semi-transparent, shadows, glass |
| Weakness | Jagged on fine detail | Can over-smooth thin features | Sensitive to JPEG noise |
Each pixel gets one of three labels: background, subject, or edge. Edge alpha is either a hard 0/255 majority vote or a smooth blend based on average distance to each assigned background colour.
Pros
- Fastest — processes every pixel once.
- Predictable and easy to reason about.
- Great for icons, logos, and objects with hard, well-defined edges.
Cons
- Binary "similar or not" test produces jagged silhouettes on anti-aliased inputs.
- Cannot distinguish why pixels differ — a shadow looks the same as a background edge.
Use when: your subject has crisp, hard edges and speed matters. Avoid when: the subject contains hair, fur, feathers, drop shadows, glass, smoke, or translucent materials.
Instead of thresholding, this method computes the variance of pixel values across all images at each position. Low variance + far from background → opaque. Low variance + near background → transparent. Medium variance → partial alpha proportional to 1 − normalised_variance.
Pros
- Smoother transitions — alpha is driven by a continuous statistic, not a threshold.
- No hard cut-off — the gradient reflects how consistently the subject appears.
- Good middle ground between speed and quality.
Cons
- Variance alone cannot distinguish edge blur from colour noise. Noisy JPEGs can produce false edges.
- Can over-smooth very fine detail (individual hairs, thin strokes).
Use when: source images have anti-aliased edges and you want smoother silhouettes than simple gives.
Avoid when: the subject has semi-transparent parts (use decomposition) or inputs are heavily compressed.
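The variance-to-alpha mapping described above can be sketched with NumPy. This is an illustration under stated assumptions, not diffmat's implementation: the function name `variance_alpha` is hypothetical, the normalising constant (the variance of the background colours themselves) is a choice made for this sketch, and the background-distance checks are omitted.

```python
import numpy as np

def variance_alpha(images, bg_colors):
    """images: (N, H, W, 3) array; bg_colors: (N, 3). Returns uint8 alpha (H, W)."""
    images = np.asarray(images, dtype=float)
    bg_colors = np.asarray(bg_colors, dtype=float)

    # Variance across the N renders, summed over the RGB channels.
    pixel_var = images.var(axis=0).sum(axis=-1)

    # A pure background pixel varies exactly as much as the background
    # colours do, so that value is used to normalise (assumption).
    bg_var = bg_colors.var(axis=0).sum()

    normalised = np.clip(pixel_var / bg_var, 0.0, 1.0)
    return np.round(255 * (1.0 - normalised)).astype(np.uint8)
```

With this normalisation a pure background pixel gets alpha 0 and a pixel that is identical in every render gets alpha 255. The backgrounds must not all be the same colour, otherwise `bg_var` is zero.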
Models each observed pixel as a physical blend:
I_i = α × S + (1 − α) × B_i
where I_i is the observed colour in image i, B_i is the known background colour, S is the true subject colour, and α is the opacity. The method solves iteratively for both α and S at every pixel using least-squares.
Pros
- Best quality for semi-transparent edges, drop shadows, glass, smoke, and reflections.
- Recovers the true subject colour by subtracting each image's background contribution.
- Physically meaningful: α directly represents "how much of this pixel is the subject."
Cons
- Slowest: runs 3 iterations of a solve loop at every pixel.
- Sensitive to compression artefacts: JPEG noise can look like background bleed, causing α to be underestimated.
- Reconstructs RGB values, so output colours may differ slightly from input.
Use when: the subject has semi-transparent regions (hair strands, glass, smoke, drop shadows) and you want the cleanest possible edge. Avoid when: inputs are heavily compressed JPEGs or you need exact input RGB preservation.
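The blend equation is linear in α·S and (1 − α), so one way to fit it is a direct least-squares regression. The sketch below shows that closed-form variant; it is not the iterative loop diffmat runs, and the name `unmix` and the `eps` guard are hypothetical.

```python
import numpy as np

def unmix(images, bg_colors, eps=1e-6):
    """Fit I_i = alpha * S + (1 - alpha) * B_i at every pixel.

    images: (N, H, W, 3) floats in 0..255; bg_colors: (N, 3).
    Returns (alpha, subject) with shapes (H, W) and (H, W, 3).
    """
    images = np.asarray(images, dtype=float)
    bg = np.asarray(bg_colors, dtype=float)

    # Centring removes the unknown alpha * S term, leaving
    # I_i - mean(I) = (1 - alpha) * (B_i - mean(B)).
    img_c = images - images.mean(axis=0)
    bg_c = bg - bg.mean(axis=0)

    # Least-squares slope shared by all three channels.
    num = np.einsum("nhwc,nc->hw", img_c, bg_c)
    den = (bg_c ** 2).sum()
    alpha = np.clip(1.0 - num / den, 0.0, 1.0)

    # Premultiplied colour alpha * S, then divide out alpha where it is non-zero.
    premult = images.mean(axis=0) - (1.0 - alpha)[..., None] * bg.mean(axis=0)
    safe_alpha = np.maximum(alpha[..., None], eps)
    subject = np.where(alpha[..., None] > eps, premult / safe_alpha, 0.0)
    return alpha, np.clip(subject, 0.0, 255.0)
```

On noise-free inputs this recovers α and S exactly. With noisy inputs the estimate degrades in the way the Cons above describe, because any deviation from the model is absorbed into α.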
Does your subject have semi-transparent parts? (hair, glass, shadow)
├─ Yes → decomposition
└─ No
├─ Are edges clean and hard? (icons, logos)
│ └─ Yes → simple
└─ Are edges anti-aliased / slightly soft?
└─ Yes → variance (or decomposition for best quality)
git clone https://github.com/YOUR_USERNAME/diffmat.git
cd diffmat
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

Requirements: Python 3.8+, Pillow, NumPy.
python diffmat.py path/to/folder/
# Output: path/to/folder/output.png

python diffmat.py path/to/folder/ --method decomposition
python diffmat.py path/to/folder/ --method variance --tolerance 30

python diffmat.py path/to/folder/ --no-antialiasing

python diffmat.py path/to/folder/ \
  --colors 255,255,255 0,0,0 255,0,0 \
  --tolerance 30

python diffmat.py --images white.png black.png red.png green.png blue.png -o result.png

folder Folder with 2-5 images
--images IMG [IMG ...] 2-5 image files
-o, --output PATH Output PNG (default: folder/output.png)
--colors R,G,B [R,G,B ...] Background colours per image
--tolerance FLOAT RGB distance for bg matching (default: 50)
--similarity-tolerance FLOAT Max inter-image distance for subject (default: --tolerance)
--reference INT Reference image index for RGB output (default: 0)
--no-antialiasing Hard 0/255 alpha only
--method {simple,variance,decomposition} (default: simple)
--debug Print border colours and assignment distances
| Requirement | Detail |
|---|---|
| Count | 2–5 images |
| Dimensions | All identical (same width × height) |
| Subject | Same subject, same pose — only the background changes |
| Backgrounds | Solid colours (no gradients, no textures) |
| Format | PNG recommended (lossless). JPEG works but may add noise. |
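The count and dimension requirements are easy to check before running anything. A minimal sketch, using a hypothetical helper `check_inputs` that is not part of diffmat:

```python
def check_inputs(sizes):
    """Validate count and dimensions given a list of (width, height) tuples."""
    if not 2 <= len(sizes) <= 5:
        raise ValueError(f"expected 2-5 images, got {len(sizes)}")
    if len(set(sizes)) != 1:
        raise ValueError(f"image dimensions differ: {sorted(set(sizes))}")
    return sizes[0]
```

With Pillow, the sizes can be collected as `[Image.open(p).size for p in paths]`.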
Include one of these keywords in each filename so the script can auto-match image → background:
| Background | Keywords |
|---|---|
| White | white, whit |
| Black | black, blac |
| Red | red |
| Green | green, gree |
| Blue | blue, blu |
Example: logo_white.png, logo_black.png, logo_red.png.
If filenames don't contain keywords, the script falls back to detecting the dominant colour along each image's border. This usually works, but it is less predictable than naming the files explicitly.
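The keyword lookup can be sketched as a substring match on the filename. This is an illustration only: `KEYWORDS` and `background_from_filename` are hypothetical names, and the way diffmat resolves ambiguous filenames is not documented here, so this sketch simply returns `None` for them.

```python
from pathlib import Path

# Shortest keyword per colour; "whit" also matches "white", and so on.
KEYWORDS = {
    "whit": (255, 255, 255),
    "blac": (0, 0, 0),
    "red": (255, 0, 0),
    "gree": (0, 255, 0),
    "blu": (0, 0, 255),
}

def background_from_filename(path):
    """Return the RGB background implied by a filename, or None if unclear."""
    stem = Path(path).stem.lower()
    matches = [rgb for key, rgb in KEYWORDS.items() if key in stem]
    # No keyword, or several: leave the decision to the fallback.
    return matches[0] if len(matches) == 1 else None
```

Substring matching means a short keyword such as "red" can also match unrelated words in a filename, so explicit names like `logo_red.png` are the safest choice.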
| Colour | RGB |
|---|---|
| White | 255, 255, 255 |
| Black | 0, 0, 0 |
| Red | 255, 0, 0 |
| Green | 0, 255, 0 |
| Blue | 0, 0, 255 |
| Parameter | Default | Increase when… | Decrease when… |
|---|---|---|---|
| --tolerance | 50 | Input is compressed (JPEG artefacts), or backgrounds aren't pure | Backgrounds are exact solid colours and you see halos |
| --similarity-tolerance | = --tolerance | Subject has subtle shading differences across renders | Subject is truly identical across all backgrounds |
Rule of thumb: start at 50. If you see background leftovers, raise to 70–80. If subject areas are going transparent, lower to 30–40.
Antialiasing applies only to the silhouette edge — the transition from transparent to opaque. It smooths the alpha channel, not the RGB colours.
- What it does: at boundary pixels, alpha is set to a gradient value (0–255) for a smooth transition instead of a jagged step.
- What it doesn't do: it doesn't blend or soften colours within the subject. Red next to blue stays sharp.
- Disable it: use --no-antialiasing for hard 0/255 alpha only.
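The points above amount to this: the RGB channels pass through untouched and only the alpha channel changes. A minimal sketch, assuming a midpoint threshold of 128 for the hard mode (the threshold diffmat uses is not stated here) and a hypothetical helper `to_rgba`:

```python
import numpy as np

def to_rgba(rgb, alpha, antialias=True):
    """Combine an (H, W, 3) colour array and an (H, W) alpha array into RGBA."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    alpha = np.asarray(alpha, dtype=np.uint8)
    if not antialias:
        # Snap every alpha value to fully transparent or fully opaque.
        alpha = np.where(alpha >= 128, 255, 0).astype(np.uint8)
    # RGB is copied as-is; only the fourth channel differs between modes.
    return np.dstack([rgb, alpha])
```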
All three methods use nested Python loops over pixels (full vectorisation is limited by the per-pixel decision logic). Rough timings on a modern laptop:
| Image size | simple | variance | decomposition |
|---|---|---|---|
| 512 × 512 | ~8 s | ~10 s | ~15 s |
| 1024 × 1024 | ~30 s | ~40 s | ~60 s |
decomposition is slowest because it runs an iterative least-squares solve at every pixel. For batch processing, consider downscaling first or running on a subset.
- JPEG compression noise: lossy artefacts can mimic background bleed, causing holes or fringes in the output. Use PNG inputs when possible, or increase --tolerance to 70–100.
- Non-solid backgrounds: gradients, patterns, or vignettes break the assumption that "background = single solid colour."
- Subject changes between renders: if the AI changes the subject's pose, expression, or detail, those differences will be classified as edge or transparent.
- Very large images: pixel-by-pixel Python loops are slow on multi-megapixel inputs. Downscale first if speed matters.
- Distinct backgrounds required: two images with very similar backgrounds (e.g. white + light grey) produce poor results. Backgrounds must be far apart in RGB space.
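The last limitation can be checked up front by measuring how far apart the chosen backgrounds are. A small sketch, assuming Euclidean RGB distance; `closest_backgrounds` is a hypothetical helper, not a diffmat function:

```python
import math
from itertools import combinations

def closest_backgrounds(bg_colors):
    """Return (distance, colour_a, colour_b) for the two most similar backgrounds."""
    return min(
        (math.dist(a, b), a, b) for a, b in combinations(bg_colors, 2)
    )
```

For the five default colours the closest pair is 255 apart (black and any of red, green, or blue). White and a light grey such as (230, 230, 230) are only about 43 apart, which is inside the default tolerance of 50, so a pixel of either colour would match both backgrounds.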
The repo includes ready-to-run samples:
# Feather icon (3 backgrounds) — compare edge quality between methods
python diffmat.py samples/feather-icon2/ --method simple
python diffmat.py samples/feather-icon2/ --method variance
python diffmat.py samples/feather-icon2/ --method decomposition
# Sparkdown icon (2 backgrounds) — quick test
python diffmat.py samples/sparkdown-icon/ --method decomposition

Each sample folder contains 2–5 source images. Run any of the commands above to generate the output.
Contributions are welcome.
- Fork the repo and create a feature branch.
- Test your changes against samples/cat/ (5 images), the most comprehensive sample.
- Open a PR with a clear description of what changed and why.
Adding a new method: implement a function with the signature _method_yourname(images, bg_colors, tol, sim_tol, ref_idx, aa) -> np.ndarray and register it in the _METHODS dict at the top of the file.
Adding samples: create a folder under samples/ with 2–5 images following the filename convention. Keep total repo size under 100 MB — use Git LFS for larger sets.
Bug reports: please include the method used, tolerance value, number of input images, and whether inputs are PNG or JPEG.
MIT. See LICENSE.
Inspired by the difference-matting approach described in Generating transparent background images with Nano Banana Pro 2.
Edward Tsang — blockchain & AI engineer. Open to consulting → Email · LinkedIn