
[AI Subsystem] AI object mask and AI denoising#20322

Open
andriiryzhkov wants to merge 52 commits into darktable-org:master from andriiryzhkov:object_ai_mask

Conversation

@andriiryzhkov
Contributor

@andriiryzhkov andriiryzhkov commented Feb 11, 2026

This PR introduces an AI subsystem into darktable with two features built on top of it:

  1. AI Object Mask — a new mask tool that lets users select objects in the image by clicking on them. It uses the Light HQ-SAM model to segment objects, then automatically vectorizes the result into path masks (using ras2vect) that integrate with darktable's existing mask system.

  2. AI Denoise — a denoising module powered by the NAFNet model. This was initially developed as a simpler test case for the AI subsystem and is included here as a bonus feature.

Both models are converted to ONNX format for inference. Conversion scripts live in a separate repository: https://github.com/andriiryzhkov/darktable-ai. Models are not bundled with darktable — they are downloaded from GitHub Releases after the app is installed, with SHA256 verification. A new dependency on libarchive is added to handle extracting the downloaded model archives.
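The SHA256 verification described above amounts to hashing the downloaded archive and comparing against the published digest before extraction. A minimal illustrative sketch in Python (the actual implementation is in C; `sha256_matches` is a hypothetical name, not a function from this PR):

```python
import hashlib

def sha256_matches(path, expected_hex, chunk=1 << 20):
    """Hash a downloaded archive in chunks and compare to the published digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest() == expected_hex.lower()
```

Hashing in fixed-size chunks keeps memory use constant even for large model archives.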

AI subsystem design

The AI subsystem is currently built on top of ONNX Runtime, though the backend is abstracted to allow adding other inference engines in the future. ONNX Runtime is used from pre-built packages distributed on GitHub. On Windows, ONNX Runtime is built with MSVC, so using pre-built binaries is the natural approach for us — I initially expected this to be a problem, but discovered this is common practice among other open-source projects and works well.

The system is organized in three layers:

  1. Backend (src/ai/): Wraps ONNX Runtime C API behind opaque handles. Handles session creation, tensor I/O, float16 conversion, and hardware acceleration provider selection (CoreML, CUDA, ROCm, DirectML). Providers are enabled via runtime dynamic symbol lookup rather than compile-time linking, so there are no build dependencies on vendor-specific libraries. A separate segmentation.c implements the SAM two-stage encoder/decoder pipeline with embedding caching and iterative mask refinement.

  2. Model management (src/common/ai_models.c): Registry that tracks available models, their download status, and user preferences. Downloads model packages from GitHub Releases with SHA256 verification, path traversal protection, and version-aware tag matching. Uses libarchive for safe extraction with symlink and dotdot protections. Thread-safe — all public getters return struct copies, not pointers into the registry.

  3. UI and modules: The object mask tool (src/develop/masks/object.c) runs SAM encoding in a background thread to keep the UI responsive. The user sees a "working..." overlay during encoding, then clicks to place foreground/background prompts. Right-click finalizes by vectorizing the raster mask into Bézier path forms. AI denoise module (src/libs/denoise_ai.c) and preferences tab (src/gui/preferences_ai.c) provide the remaining user-facing features.
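The path traversal protection mentioned for the model-management layer boils down to rejecting archive entries that would escape the destination directory. A Python illustration of that check, under the assumption that the real code delegates it to libarchive's secure-extraction options in C (`is_safe_member` is a hypothetical helper; symlink protection additionally requires inspecting entry types and is not shown):

```python
import posixpath

def is_safe_member(name, dest="/models"):
    """Return True only if an archive entry resolves inside the destination.

    Rejects absolute paths and any entry whose ".." components would
    escape dest after normalization (path traversal protection).
    """
    if posixpath.isabs(name):
        return False
    target = posixpath.normpath(posixpath.join(dest, name))
    return target == dest or target.startswith(dest + "/")
```

Normalizing the joined path before comparison is what defeats tricks like `a/../../etc/passwd`, which contains no leading `..` but still escapes the destination.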

Fixes: #12295, #19078, #19310

@andriiryzhkov andriiryzhkov mentioned this pull request Feb 11, 2026
@TurboGit
Member

Models are not bundled with darktable —

Perfect!

they are downloaded from GitHub Releases after the app is installed, with SHA256 verification. A new dependency on libarchive is added to handle extracting the downloaded model archives.

Can this be simplified (no SHA256) for now to allow testers to download the models using current master?

@andriiryzhkov
Contributor Author

andriiryzhkov commented Feb 11, 2026

For testing purposes, you can skip the download mechanism entirely and just manually place model files in ~/.local/share/darktable/models/. Each model is a directory with a config.json and ONNX files — the AI backend scans that path at startup.

If placed manually, there are no SHA256 checks.
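For reference, the layout the backend expects (one directory per model, containing a config.json next to the ONNX files) can be checked with a small script like this. This is a sketch mirroring the startup scan described above; `scan_models` is a hypothetical name, not darktable's API:

```python
import json
from pathlib import Path

def scan_models(models_dir):
    """Return {dir_name: parsed config} for every directory that looks like
    a valid model: a config.json next to at least one .onnx file."""
    found = {}
    for d in Path(models_dir).iterdir():
        cfg = d / "config.json"
        if d.is_dir() and cfg.is_file() and any(d.glob("*.onnx")):
            found[d.name] = json.loads(cfg.read_text())
    return found
```

Running it against `~/.local/share/darktable/models/` shows which manually placed models the backend should pick up.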

@victoryforce
Collaborator

@andriiryzhkov Thank you for such a great contribution!

For macOS CI to complete successfully, libarchive should be added to .ci/Brewfile.

@andriiryzhkov
Contributor Author

@victoryforce thank you for the advice! Done.

@andriiryzhkov
Contributor Author

@MikoMikarro I am glad you are in the game!

We can add OpenCV DNN as another backend provider. Its support for neural network operators is somewhat limited, but it is still usable for some models.

@andriiryzhkov
Contributor Author

If by any means you have such a model, I'll test it to see if the objects are better recognized.

@TurboGit You can download HQ-SAM B model from here https://github.com/andriiryzhkov/darktable-ai/releases/download/5.5.0.4/mask-hq-sam-b.zip.

Unpack it to ~/.local/share/darktable/models/ and in darktablerc set the parameter plugins/darkroom/masks/object/model=mask-hq-sam-b. This enables the HQ-SAM B variant of the model.

@TurboGit
Member

@andriiryzhkov : Ok, I've tried this new model and I'm still not impressed :) You can call me a standard user here, as I have never used AI for this in any software. My expectations were maybe a bit high... But after seeing some demos of the SAM model I was expecting much better object segmentation. Here are some examples; each time I did a single click on something in the image:

  1. unexpected mask on squirrel, head of penguin
  2. impossible to select the color palette
  3. shirt not fully selected, some part of water and something on the object in the hand
  4. bottle on the left partly selected, unexpected parts on the bottle sticker selected
  5. unexpected part under the cup selected
  6. napkin only partly selected

(screenshots attached for each example)

I can continue... In fact I haven't seen a case where it was good. This new model is a bit better than the light version but far from the quality expected for integration.

Let me ask a question: do you have some test cases where it works perfectly?

@wpferguson
Member

The model registry

Is this provided by darktable, or is it an external entity?

The reason I ask is, taking my example above: suppose I merged the ext_editors script and then provided a drop-down listing all the Adobe products it can run. I can't provide that list, because otherwise darktable would be "endorsing" Adobe's software.

Another reason is darktable assumes liability for recommending the model. Better to provide the user a list of places that have model collections and let them make the choice.

@TurboGit
Member

TurboGit commented Feb 12, 2026

ref: #20322 (comment)

I don't want to be made fun of, so here is the full sentence of what I said to paperdigits on matrix:

Well the messages on the PR are definitely voiced strongly against the PR. I encourage you to voice your point here or anywhere else (I'm also voicing against AI with my friends) but I try to keep open minded on work done for a whole community. That's my point. We can't agree with everything integrated on Darktable, that's my case too, but I'm no one to trash something because I don't like it.
Said another way, I'm not Darktable, you are not Darktable... The project is bigger than us.

Or maybe I'm a fool...

@wpferguson
Member

Or maybe I'm a fool...

AI is a POLARIZING subject right now. Some people love it and some people hate or fear it, and there are STRONG opinions on both sides.

My thoughts

  • If this gets merged I will probably not use it because it's too slow for my workflow.
  • My original thought about how to get AI into darktable was darktable providing "hooks" to get data in and out and using Lua to interface to external modules. That seems to have been overtaken by events 😄
  • The ONNX runtime is fine. I don't think anyone has a problem with that except for windows integration. If we can't bundle it I have some ideas.
  • The models seem to be the controversial part of the issue, so I think we offload that onto the user and let the user choose the models they are comfortable with.
  • We could ask hanatos his views on AI, since he's the "father" of darktable.

@TurboGit you are doing fine. You make the best choice you can based on the information available to you. If you don't have enough information then open an RFC issue, though with maybe some guidelines like:

  • you can make 1 comment which is your opinion (so we don't end up in a flame war)
  • you can't reference another comment
  • you can upvote or downvote other comments.

@TurboGit
Member

If this gets merged I will probably not use it because it's too slow for my workflow.

Yes, the latest model is a bit slow; the light version was almost instantaneous on my side. But my main concern now is the quality of the segmentation. At this stage it is not helping users at all; maybe the training needs to be tweaked... I don't know, and I know next to nothing about AI, so I'll let the experts discuss this point.

My original thought about how to get AI into darktable was darktable providing "hooks" to get data in and out and using Lua to interface to external modules.

That would work well for AI denoise, but for masking we need fast UI interaction to display the mask and to add to or remove from it. Would that work with Lua?

The models seem to be the controversial part of the issue, so I think we offload that onto the user and let the user choose the models they are comfortable with.

I can understand that, that's why the models are not and will never be distributed with Darktable. Also the AI feature is not activated by default.

@TurboGit you are doing fine.

Thanks, I've been maintainer for 7 years now, maybe that's too much for a single man :)

I fear that the RFC or poll will be a place of fight :) On such hot topic I think we should discuss with the core developers and find a way forward (or not).

@wpferguson
Member

Would that work with Lua?

Only if the AI script would support it and could create the display and interaction.

I fear that the RFC or poll will be a place of fight :)

That was my thought too, which was why I added all the conditions. I could definitely see that not ending well.

Thanks, I've been maintainer for 7 years now, maybe that's too much for a single man :)

I once interviewed for a job and they asked me what I wanted to do. My answer was "Let me tell you what I don't want to do. I don't want to be the boss, I don't want to be in charge. I just want to work". I feel your pain.

I think you've done an incredible job of "herding cats". Dealing with lots of personalities, language barriers, users, developers, issues, and still keeping everything on track is quite an accomplishment. It's also a LOT of work for one person. Should we look at some way to share the load or delegate some tasks?

@elstoc
Contributor

elstoc commented Feb 12, 2026

This will be virtually impossible to document (and I'm not sure I'm willing to even merge such documentation) without mentioning or seeming to recommend specific models.

As I said in the IRC chat, if we could provide something extremely generic to allow certain operations to be handed off to another "external program" (like lua but probably needs to be more integrated into the pixelpipe) that'd be fine with me (i.e. not explicit AI integration).

If we wanted to source our own open dataset and volunteers to train a model of our own that would also be fine with me (though I'm still slightly uncomfortable about the environmental impacts of AI, at least the licensing and "data sweatshops" concerns would be alleviated).

But it's really really hard to source good reliable and verifiable information about how most of these models have been trained (both from a data and a human point-of-view) and AI is such a divisive issue there's a good chance of a proper split in the community here, and difficult decisions being made by package maintainers.

I for one will have to decide whether I'm comfortable enough with this to continue contributing to the project.

@AyedaOk
Contributor

AyedaOk commented Feb 13, 2026

@andriiryzhkov When I saw this PR, I stopped my work to test it. I can't comment on the implications of merging it into Darktable or on its ethical aspects. However, I've been using SAM2 and SAM3 over the last few months with Lua plugins and the external raster mask module. I am sharing my feedback based on my experience with the Lua approach, hoping it will be helpful.

Like @TurboGit said, the quality of the segmentation is not good when using HQ-SAM B. This surprised me, because I got OK results with SAM2.1 (even with the tiny version). I downloaded a larger HQ model from Hugging Face (sam_hq_vit_l.pth) and converted it to ONNX with your script, but got the same result. Next, without converting it, I tried to run sam_hq_vit_l.pth with Python. Same results.

Based on that, I believe the HQ models do not perform as well as the original SAM2.1. I've added an example below so you can compare. Is there a way to use SAM2.1 instead of the HQ models? If so, this might fix the mask quality problem. I tried converting sam2.1_hiera_base_plus.pt to ONNX, but it didn't work. I've also tried this but it failed as well.

I haven’t tried denoise yet, but I’ll test it soon. I don’t like NAFNet, but I’ll try to load NIND or RawForge.

Lastly, although I think this PR is great, I think it is still worth improving the Lua integration of the raster mask module. With Lua, it doesn't matter whether you use ONNX or PyTorch, and you can customize the script to your preferences.

HQ-SAM Large: (screenshot)

SAM2.1_hiera_large: (screenshot)

@andriiryzhkov
Contributor Author

@TurboGit, @AyedaOk thank you for the feedback!

Just a quick update on this — I'm actively working on extending support to additional SAM model variants (including SAM 2.1) and refining some of the post-processing algorithms.

Expect to push updates next week. Happy to discuss any specific requirements or edge cases you'd like me to prioritize in the meantime.
