initial working version of a LUA script for darktable #2

Merged
merged 1 commit into from
Jun 15, 2022

Conversation

hqhoang
Contributor

@hqhoang hqhoang commented Jun 15, 2022

Richardson-Lucy deblur, at least for the moment, is much better than the sharpening tools in darktable (although it's quite slow). However, RL-deblur tends to amplify noise when pushed beyond 10-20 iterations, even with damping (in RawTherapee). I found nind-denoise to be the perfect missing piece in my workflow: it removes the noise completely, allowing me to push RL-deblur as far as I want (usually sigma=1 and iterations=20).
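
For readers unfamiliar with the technique, here is a minimal 1-D sketch of the Richardson-Lucy iteration with a Gaussian blur kernel. It is not the code in the Lua script or in RawTherapee; the function names, the edge clamping and the `radius=3` kernel size are assumptions made for illustration only. Because each iteration multiplies the estimate by a correction derived from the observed data, any noise in the input gets re-amplified as the iteration count grows, which is why denoising first helps.

```python
import math


def gaussian_kernel(sigma, radius=3):
    """Normalized 1-D Gaussian kernel (radius is an illustrative choice)."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel):
    """Apply the kernel to the signal, clamping indices at the edges."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += signal[j] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, kernel, iterations=20, eps=1e-12):
    """Multiplicative Richardson-Lucy update, starting from the observed signal."""
    estimate = list(observed)
    flipped = kernel[::-1]  # identical to `kernel` for a symmetric Gaussian
    for _ in range(iterations):
        reblurred = convolve(estimate, kernel)
        ratio = [o / max(b, eps) for o, b in zip(observed, reblurred)]
        correction = convolve(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Blur a single spike on a flat background, then try to recover it.
sharp = [0.1] * 21
sharp[10] = 1.0
kernel = gaussian_kernel(sigma=1.0)
blurred = convolve(sharp, kernel)
restored = richardson_lucy(blurred, kernel, iterations=20)
```

On this noise-free toy signal the restored peak is taller than the blurred one; with real, noisy data the same update also sharpens the noise.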

Thus, I combined nind-denoise and RL-deblur into one script; users can enable each tool individually. It is based on the original RL_out_sharp.lua:

darktable-org/lua-scripts#385

@trougnouf trougnouf merged commit 4c40b98 into trougnouf:master Jun 15, 2022
@trougnouf
Owner

This is great! :) Thank you very much.

@hqhoang
Contributor Author

hqhoang commented Jun 15, 2022

Taking this chance to say thank you for this wonderful denoiser; it's an important missing piece in the FOSS space.

I'll have to look into adding photos to the dataset, there're a few cases that can be improved. I have so much to learn regarding training, I'll start a discussion on pixls.us.

@trougnouf
Owner

My pleasure! :) I hope it can be useful and I am very happy that it is being used.

I would very much appreciate your contributions. If you do, can you share the raw files as well? I am currently working on a version which could handle Bayer or just-demosaiced images, I expect it would generalize better since there won't be any dependency on the processing style (but it might also suffer from different sensor specificities).

A focused discussion on the topic would be great. Below are a few relevant discussions:
https://discuss.pixls.us/t/exporting-importing-photos-in-full-32-bit-representation-to-train-a-neural-network-for-image-denoising/18372
https://discuss.pixls.us/t/denoising-based-on-ai/27818
https://discuss.pixls.us/t/trying-to-convert-raw-images-to-xyz-or-linear-rec-2020-with-only-demosaic-and-d65-white-balance-applied-in-python-results-do-not-match-raw-processors/30491

@hqhoang
Contributor Author

hqhoang commented Jun 15, 2022

Is there a central repository where we host the sample RAWs? Or, at least a central list that keeps track of the links (assuming hosting storage is too expensive)?

I have the Fuji X-T2/X-T20. For Bayer, I could also submit samples from my old Nikon D5100 and D5500, but they might be too outdated, better than nothing, I guess. Exposure/SS bracketing on a tripod should work?

I see you're trying to denoise the RAW just after demosaicing, it's good that there're only a handful of popular demosaic algos for both Bayer and X-Trans. Do you need to train Bayer and X-Trans separately? For darktable, I currently save the sharpening and profiled denoise in a style that only applies on export, so that they don't slow down my interaction in the dark room. I could see your denoise being used the same way, it just needs to be slotted right after demosaic on export (which is not possible with the way a style is appended to history stack on export currently).

@trougnouf
Owner

It's not published yet :s I will publish the raw dataset (which is part of the paper I'm working on) once I find a way to host it. I contacted UCLouvain (university) today about hosting it on their "dataverse" ( https://dataverse.uclouvain.be/dataverse/elen ). In the meantime I think they can be posted on pixls.us and I can add all contributions (as they become available) to the hosting once there is one in place.

I think any raw sensor will help with generalization. Exposure bracketing would be ideal. I plan on writing a script that works with USB controllable cameras that don't have it once I start working on the dataset again (once the training code is stable). For the current version of the dataset I mostly used the camera's wireless Android app, but one old camera had me press a physical button (I then needed to do some software alignment and there are more discarded pictures).

I'm working on two different approaches, each with its own trade-offs: (1) debayer+denoise in the same network (from raw files). In this case I think I will stick to Bayer, which is relatively simple (though I had to align all the different patterns to RGGB), but an XTrans-specific network is possible (it has been done in the literature) and contributions are welcome. (2) Denoise-only from OpenEXR demosaiced pictures. My current code generates the input OpenEXR from raw with Bayer images, but I used darktable to generate the XTrans raw input (just exported with daylight WB and Lin. Rec. 2020; maybe that can be done from the CLI? That would be useful to transfer them to the university's training cluster). Both export to OpenEXR.

Unfortunately I cannot share the new source code I am working on for raw images yet :( I am now doing it as part of my PhD which is a doctorate in enterprise and the code has to stay closed until I make a publication (which is probably 6-12 months away). The company has historically been good about letting me open-source my work since it's not directly related to their products so I am not worried about it going to waste, it's just a matter of time. Publishing the dataset isn't an issue since it's partly crowd sourced and I made most of it as part of my master thesis.

The use-case I was thinking of would have darktable work from denoised OpenEXR pictures as a constant pre-processing step (or ideally become a darktable module, eventually), I'm not sure how that works with styles and Lua scripting but it might be possible to work on the raw picture normally and have an export script do this export-reimport and apply the same processing to the denoised OpenEXR version (?)

@hqhoang
Contributor Author

hqhoang commented Jun 16, 2022

Sounds like a plan in terms of hosting the dataset. I'll share on GDrive for now.

> I think any raw sensor will help with generalization.

I'm not familiar with neural networks, but I guess we don't know until we try. If generalization compromises performance/accuracy, we can try again with separate models. Just as darktable shows different demosaic algos (e.g. AMaZE vs Markesteijn) depending on sensor type, we can present different denoise models accordingly.

> I plan on writing a script that works with USB controllable cameras that don't have it once I start working on the dataset again (once the training code is stable). For the current version of the dataset I mostly used the camera's wireless Android app but one old camera had me press physical button

gphoto2 works pretty well for USB-controlled operations. The sensor will get hot if it stays on for too long, resulting in lots of hot pixels, so it will need proper cooling or rest in between.

> (I then needed to do some software alignment and there are more discarded pictures).

I used to use align_image_stack and cpfind for median blending:
https://www.dpreview.com/forums/thread/4540304

I recently rewrote macrofusion, and learned about OpenCV's findTransformECC in the process. It seems to be a more modern approach, much more accurate, and likely can be GPU-accelerated, too.

https://discuss.pixls.us/t/macrofusion-rewritten-in-tkinter/30118
https://github.com/PetteriAimonen/focus-stack/

> (2) denoise-only from OpenEXR demosaiced pictures. My current code generates the input OpenEXR from raw with Bayer images but I used darktable to generate the XTrans raw input (just exported with daylight WB and Lin. Rec.2020, maybe that can be done from the CLI? That would be useful to transfer them to university's training cluster). Both export to OpenEXR.

I'm not an expert at darktable, but I think you can use darktable-cli to export with a specific XMP or style; you just need to have the "color calibration" module with Daylight in the stack.
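
As a rough illustration of that suggestion, the sketch below only builds a `darktable-cli` command line for a batch export to OpenEXR; it does not run anything. The style name `nind_input`, the file paths and the helper name `build_export_command` are hypothetical. It assumes the output format is taken from the output file extension and that a style needs the darktable configuration directory passed after `--core`; check both against the darktable version in use.

```python
from pathlib import Path


def build_export_command(raw_path, out_dir, xmp_path=None, style_name=None, config_dir=None):
    """Return the argument list for one darktable-cli export to .exr."""
    raw_path = Path(raw_path)
    output = Path(out_dir) / (raw_path.stem + ".exr")  # format follows the extension
    cmd = ["darktable-cli", str(raw_path)]
    if xmp_path is not None:
        cmd.append(str(xmp_path))  # optional sidecar holding the history stack
    cmd.append(str(output))
    if style_name is not None:
        cmd += ["--style", style_name]  # hypothetical style, must already exist
    if config_dir is not None:
        # Core options come last; the style is looked up in this config directory.
        cmd += ["--core", "--configdir", str(config_dir)]
    return cmd


command = build_export_command(
    "DSCF0001.RAF", "exr_out", style_name="nind_input", config_dir="dt_config"
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths and style exist on the machine doing the export.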

Does exposure affect the training? When I shoot theater performances, I cap AutoISO at 800 as the X-T2 sensor is ISO-invariant, and to prevent accidental clipping from bright spotlights. I usually push 3-5EV in post, thus, the raw itself at default exposure will be really dark. An ISO800 RAW in this case is equivalent to ISO6400 or ISO12800 after applying the "exposure" module, but nind-denoise has no problem with it. An example:

https://www.dpreview.com/forums/post/66227338
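
The ISO equivalence mentioned above is simple arithmetic: each stop of exposure pushed in post doubles the effective ISO. A small sketch, assuming an ISO-invariant sensor and a purely linear push (the function name is illustrative):

```python
def equivalent_iso(base_iso, pushed_ev):
    """Effective ISO after pushing exposure by `pushed_ev` stops in post."""
    return base_iso * 2 ** pushed_ev


# ISO 800 pushed by 3, 4 and 5 EV.
equivalents = [equivalent_iso(800, ev) for ev in (3, 4, 5)]
```

So ISO 800 pushed by 3 EV corresponds to ISO 6400, by 4 EV to ISO 12800, and by 5 EV to ISO 25600.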

> The use-case I was thinking of would have darktable work from denoised OpenEXR pictures as a constant pre-processing step (or ideally become a darktable module, eventually), I'm not sure how that works with styles and Lua scripting but it might be possible to work on the raw picture normally and have an export script do this export-reimport and apply the same processing to the denoised OpenEXR version (?)

You may have to check with the darktable devs/experts on pixls.us, but from how I understand it, the whole history stack is re-applied/re-run every time a change is added. The darkroom is supposed to be interactive, so any computation that causes noticeable delay/lag for users won't get in there, e.g. we leave the intensive operations in a style to apply only on export, not while using the darkroom. Richardson-Lucy deconvolution is an example: it's too slow to be used in the darkroom (and thus we work around it by running it at the end of the export with a LUA script).

Export-reimport will not be ideal then: the OpenEXR intermediate files will add extra storage and need to be regenerated when demosaic or WB changes (and even exposure?). This requires fundamental changes to the darktable codebase, so it will be harder for you to get buy-in from the devs. Even if we decide to cache denoised intermediate files, which format to use is also a question (is there a more suitable format than OpenEXR?). But IMO, darktable devs should start thinking about this now: as more time-consuming tools become available, having an intermediate file/cache is unavoidable. Just look at the clunky workflow of juggling intermediate DNG/TIFF files between Iridient Transformer, Topaz Denoise, DeepPrime, and Lightroom... :-)

You may want to start the conversation (on integrating with darktable) early on pixls.us, as that will shape your strategy and approach. If an intermediate cache/file is not feasible, a compromise might be needed: e.g. having nind-denoise as a darktable module that doesn't run in the darkroom. Instead, if added to the history stack, it'll be slotted at the appropriate spot, e.g. after demosaic, color-calibration (and exposure?), then applied only on export. Users just have to accept that they need to export in order to see the fully denoised image.

@hqhoang hqhoang deleted the darktable_lua_script branch June 16, 2022 16:53
@trougnouf
Owner

trougnouf commented Jun 24, 2022

> Sounds like a plan in terms of hosting the dataset. I'll share on GDrive for now.

Perfect :)

I've set up an account on the UCLouvain dataverse ( https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/DEQCIM ); once all the files are in order I will request that it's made public.

> gphoto2 works pretty well for USB-controlled operations. The sensor will get hot if it stays on for too long, resulting in lots of hot pixels, will need to give it proper cooling or rest in between.

Indeed I planned on using gphoto2 after playing around with it to get focus stacking on a camera that doesn't have it. Good point about the sensor getting hot, I hadn't considered it.

> I used to use align_image_stack and cpfind for median blending:
> https://www.dpreview.com/forums/thread/4540304

I used that in the original NIND, but it can't generate raw files so I made my own function and just save the # of pixels shifted with the best alignment. A more modern approach would probably be better to handle differences in depth, I can't promise I will get to it but it would definitely help, thanks for the links.
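
To make the idea of "saving the number of pixels shifted with the best alignment" concrete, here is a minimal brute-force sketch. It is not the function used for the dataset; the name `best_shift`, the mean-absolute-difference score and the `max_shift` search window are assumptions for illustration. It only handles whole-pixel translations on single-channel images stored as nested lists, which is also why it cannot cope with the depth differences mentioned above.

```python
def best_shift(reference, moving, max_shift=2):
    """Return (dy, dx) such that moving[y + dy][x + dx] best matches reference[y][x]."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            # Only compare pixels where both images overlap for this shift.
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    total += abs(reference[y][x] - moving[y + dy][x + dx])
                    count += 1
            if count == 0:
                continue
            score = total / count
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


# Toy example: one bright pixel that moved one row down and two columns right.
reference = [[0.0] * 6 for _ in range(6)]
moving = [[0.0] * 6 for _ in range(6)]
reference[2][2] = 1.0
moving[3][4] = 1.0
shift = best_shift(reference, moving)
```

Storing only the `(dy, dx)` pair keeps the raw files untouched; the crop can be applied when the images are loaded.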

> I'm not an expert at darktable, but I think you can use darktable-cli to export with a specific XMP or style, just need to have the "color calibration" module with Daylight in the stack.

Yes, that should work well :) I would leave "color calibration" for later and only apply daylight white balance (which is the input that color calibration receives) to avoid anything potentially subjective.

> Does exposure affect the training? When I shoot theater performances, I cap AutoISO at 800 as the X-T2 sensor is ISO-invariant, and to prevent accidental clipping from bright spotlights. I usually push 3-5EV in post, thus, the raw itself at default exposure will be really dark. An ISO800 RAW in this case is equivalent to ISO6400 or ISO12800 after applying the "exposure" module, but nind-denoise has no problem with it. An example:

Exposure should affect the training to some extent (so it would be best to include different exposures in the ground-truth), but I'm glad to read that denoising low-exposure photos still works well :) A lot of the training images were captured with low exposure (because training went farther than the camera's max ISO), so the same process can be replicated by increasing the exposure before denoising.

In the future rawNIND training it shouldn't matter because I am now matching the exposure after the network has processed the image.

> You may have to check with the darktable devs/experts on pixls.us, but from how I understand, the whole history stack is re-applied/re-run every time a change is added. The darkroom is supposed to be interactive, any computation that causes noticeable delay/lag for users won't get in there, e.g. we leave the intensive operations in a style to apply only on export, not while using the darkroom. Richardson-Lucy deconvolution is an example, it's too slow to be used in the darkroom (and thus we work around by having it at the end of the export with a LUA script).

Indeed it would be best to communicate more. I have communicated a bit (especially on IRC), mostly around technical issues and the best input format. So far I have the feeling that interest is limited, partly because it's black magic until it's proven to work best, and mainly because the computational complexity is too high. I'm pretty confident that a network could be trained with a complexity at most 1/10 that of the current U-Net and still readily beat the denoising algorithms available in darktable, so I think it would be reasonable to provide multiple networks / implementations with different levels of performance / complexity. I also think that the demosaic/denoising algorithm complexity is less important because it sits at the beginning of the pipeline and (if I understand correctly) it's only applied once. (Bonus: blind denoising networks don't have any parameters to tune/explore.)

> Export-reimport will not be ideal then, the OpenEXR intermediate files will add extra storage, and need to be regenerated when demosaic or WB changes (and even exposure?). This requires fundamental changes to darktable codebase, will be harder for you to get buy-in from the devs. Even if we decide to cache denoised intermediate files, which format to use is also a question (is there more suitable format than OpenEXR?). But IMO, darktable devs should start thinking about this now, as more time-consuming tools will be available, having intermediate file/cache is unavoidable. Just look at the clunky workflow of juggling intermediate DNG/TIFF files between Iridient Transformer, Topaz Denoise, DeepPrime, and Lightroom... :-)

In my experience OpenEXR is best (at least among what's available with OpenCV). It saves everything as 32-bit float, which is perfect, and I can embed the Lin. Rec. 2020 color information. The only con is that it doesn't save metadata. In theory TIFF could do all of that, but it's a complex format with many purposes and it's impossible to export 32-bit unbounded TIFF with OpenCV.

I don't think any such re-generation would be required; everything is based on "daylight" white balance since that's the input of color calibration, and either the network acts as a demosaic algorithm too, or it should use the same demosaic algorithm used in training (currently what's available in OpenCV but I should probably use a darktable xmp template to train a network specialized in darktable integration). Exposure would be performed after demosaic+denoising.

Still there are indeed different ways to proceed and a clear discussion with the darktable (and potentially vkdt) developers is needed.

@hqhoang
Contributor Author

hqhoang commented Jun 27, 2022

I guess you're rethinking your direction/strategy, so I'm not creating samples for now. Once you finalize your decision, it'd be best if you could start a guideline document detailing what you need for samples, e.g. which ISO to shoot, whether under/over-exposed, day/night or dark/bright scenes, any specific subjects (landscape, bird, indoor room, material, ...), ... I'll be happy to be your guinea pig.

Do we need to balance the samples, e.g. dark vs bright, indoor vs outdoor, detailed vs smooth surfaces, mid vs high ISO, ...? Maybe we need to think ahead to categorize, tag, and track samples, e.g. <indoor, ISO 3200, f/2.8, ...>, so that we know how many samples are in each category/tag and how many more we need, and contributors/volunteers can focus on the missing ones. This will be necessary once the data set is large enough. This can probably be automated with a pre-trained classifier in combination with exiftool, so maybe the storage doesn't have to be complicated; we can recalculate the stats whenever new images are added.
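
A possible shape for that bookkeeping, as a sketch only: exiftool can emit metadata as JSON (e.g. `exiftool -json -ISO -FNumber <dir>`), and a few lines of Python can tally the records and list the combinations still missing. The helper names and the choice of `ISO` and `FNumber` as grouping keys are assumptions; scene tags such as indoor/outdoor would have to come from manual labels or a classifier and are not covered here.

```python
import json
from collections import Counter


def load_records(json_text):
    """Parse the JSON array printed by `exiftool -json ...`."""
    return json.loads(json_text)


def tally_samples(records, keys=("ISO", "FNumber")):
    """Count samples per combination of the chosen metadata keys."""
    counts = Counter()
    for record in records:
        counts[tuple(record.get(key) for key in keys)] += 1
    return counts


def missing_combinations(counts, wanted, minimum=1):
    """Return the wanted combinations that have fewer than `minimum` samples."""
    return [combo for combo in wanted if counts.get(combo, 0) < minimum]


# Hand-written sample in the shape exiftool's JSON output uses.
sample_json = """[
  {"SourceFile": "a.raf", "ISO": 3200, "FNumber": 2.8},
  {"SourceFile": "b.raf", "ISO": 3200, "FNumber": 2.8},
  {"SourceFile": "c.nef", "ISO": 6400, "FNumber": 4.0}
]"""
counts = tally_samples(load_records(sample_json))
todo = missing_combinations(counts, [(3200, 2.8), (12800, 2.8)])
```

Since the tally is recomputed from the files themselves, no separate database is needed; rerunning it after each contribution keeps the list of gaps current.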

For darktable, currently it reruns the history stack fairly often. If you run "darktable -d perf", you can see the operations being called repeatedly in the CLI/terminal output with every change. I'm not sure if there's any caching mechanism in place at the moment, but if not, that's certainly a good optimization opportunity, whether with nind-denoise or not. That's up to your conversation with the darktable devs, I guess.

@trougnouf
Owner

trougnouf commented Jun 28, 2022

Thanks :) The guidelines for content creation have not and will not change (as I expect most of the dataset will be made of the pre-existing raw files from NIND), only the way they are processed in training might differ (and there is no longer any processing involved with the data acquisition).

That is, capture a static scene from a static point with different ISO values (one or more base ISO images, three or more higher ISO images), fixed focus, fixed aperture.

The biggest restriction is that the scene must be fixed: no movement in the scene or from the camera.
* avoid outdoor scenes in which small objects can move with the wind
* ensure the lighting is consistent, cloud movements can impact indoor scenes too.

Files should be licensed as CC0 (though any CC license should be acceptable). They are organized as follows:
<scene_name>/<noisy_image_filename> and <scene_name>/gt/<ground_truth_image_filename>
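
A contributor could sanity-check a folder against this layout before uploading. The sketch below is an illustration under stated assumptions: the function names are made up, and the thresholds (at least one ground-truth image, at least three noisy images per scene) are taken from the capture guidelines in this comment.

```python
from pathlib import Path


def list_scene_pairs(dataset_root):
    """Map each scene directory to its ground-truth and noisy file names."""
    pairs = {}
    for scene_dir in sorted(Path(dataset_root).iterdir()):
        if not scene_dir.is_dir():
            continue
        gt_dir = scene_dir / "gt"
        ground_truths = []
        if gt_dir.is_dir():
            ground_truths = sorted(p.name for p in gt_dir.iterdir() if p.is_file())
        # Noisy images sit directly in the scene directory.
        noisy = sorted(p.name for p in scene_dir.iterdir() if p.is_file())
        pairs[scene_dir.name] = {"gt": ground_truths, "noisy": noisy}
    return pairs


def incomplete_scenes(pairs, min_gt=1, min_noisy=3):
    """Return scene names that have too few ground-truth or noisy images."""
    return [
        name
        for name, files in pairs.items()
        if len(files["gt"]) < min_gt or len(files["noisy"]) < min_noisy
    ]
```

Calling `incomplete_scenes(list_scene_pairs("my_dataset"))` (with a hypothetical folder name) lists the scenes that still need more shots.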


This means living bird shots are impossible, outdoor scenes are tricky but possible (best if there are no clouds).

Any such variation will result in learning to blur (movement; the best way to generate something that's between two different things is to average them aka blur) or to generate false colors (from lighting differences).

Other than that the more variety the better. This includes different camera sensors (even if the content is the same), contents, ISO values (ground-truth(s) + at least 3 higher ISOs), aperture (I think it would be best to limit it to f/11 or wider), exposure.

I think it's too early to go that far into categorization (though I would gladly accept and publish such contribution); just getting any variation with different cameras, participants, styles, settings, and subjects goes a long way. If the dataset gets very large and/or one would want to train a specialized network then yes tagging would be very helpful.

It might be desirable to limit ISO to a reasonable level but that's a training time decision. The ISO value itself is pretty meaningless because it depends on the camera and content, so I analyze the dataset as part of pre-processing prior to training and quantify the quality of each image.
In the original dataset overexposure in the ground-truth was also a big concern because we didn't want the model to be trained to generate overexposure, but now it just gets masked out (based on raw white point in any channel). I also try to mask differences between scenes (movement) but this is very limited (because at some point the noise gets masked too so a threshold is set based on how much noise the model will handle and some movement is missed which results in learning to blur).
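
The overexposure masking can be pictured roughly as follows. This is only a sketch of the idea described above, not the training code: the function names, the nested-list image layout and the choice of a masked mean absolute error are assumptions. A pixel is dropped from the loss as soon as any of its ground-truth channels reaches the raw white point.

```python
def valid_mask(ground_truth, white_point):
    """True where no channel of the ground-truth pixel reaches the white point."""
    return [
        [all(channel < white_point for channel in pixel) for pixel in row]
        for row in ground_truth
    ]


def masked_l1(prediction, ground_truth, mask):
    """Mean absolute error computed over the unmasked pixels only."""
    total, count = 0.0, 0
    for pred_row, gt_row, mask_row in zip(prediction, ground_truth, mask):
        for pred_px, gt_px, keep in zip(pred_row, gt_row, mask_row):
            if keep:
                total += sum(abs(p - g) for p, g in zip(pred_px, gt_px))
                count += len(gt_px)
    return total / count if count else 0.0


# One row, two RGB pixels; the second has a clipped green channel.
ground_truth = [[(100, 200, 300), (100, 16383, 50)]]
prediction = [[(110, 200, 300), (0, 0, 0)]]
mask = valid_mask(ground_truth, white_point=16383)
loss = masked_l1(prediction, ground_truth, mask)
```

Because the clipped pixel contributes nothing to the loss, the network is never rewarded for reproducing overexposure.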
