Note |
---|
This page is a work in progress. OIDN, as of right now, is ~7.5x slower than emulated |
Project link: https://github.com/RenderKit/oidn
Motivation
OpenImageDenoise is a dependency of Blender, one that is especially useful on non-workstation machines, as it masks a lot of the raytracing noise that comes with doing fewer render passes, as a lower-powered device would do.
Status
Working, but ~7.5x slower - ARM64 support has to be added from scratch, as the route taken for x64 is not compatible.
The x64 route uses oneDNN, which happily compiles on WoA if you use the upstream repo, after several pull requests were merged. However, the copy of oneDNN that comes with OIDN is modified, and no record was kept of what the modifications were.
Having done some investigatory work, I was only able to enable the reference C implementation for ARM64 within oneDNN, which was a roughly 250x slowdown compared to x64. This was not acceptable, and the only path forward, according to the OIDN maintainer, was to write a custom convolution kernel in ISPC. For more information, see https://github.com/OpenImageDenoise/oidn/pull/172 and https://github.com/OpenImageDenoise/oidn/issues/168
When ARM64 support is added, it will need to be added for all AArch64 devices, Linux included. This means things like DirectML are not an option to do the heavy lifting for us.
There is a branch where I started this work in C++, got it working as a reference implementation, then reproduced the same result via ISPC. This is functional, but needs work on speeding it up.
Example (before and after of the Blender scene “Spring”, using an intentionally low sample count of 10):
...
Status
Working, can be built yourself from scratch from source.
Compiling
To compile the current in-progress ISPC branch, get it from https://github.com/anthony-linaro/oidn/tree/arm-ispc and follow the instructions below.
As the project's submodules are path-based, it may be easier to clone the parent repo with --recurse-submodules, add mine as another remote, and then switch to that branch, rather than faffing with submodule paths, as follows:
Code Block |
---|
git clone --recurse-submodules https://github.com/RenderKit/oidn
cd oidn
git remote add anthony-origin https://github.com/anthony-linaro/oidn
git fetch anthony-origin
git checkout anthony-origin/arm-ispc |
You will also need the native ARM64 ISPC compiler: https://drive.google.com/file/d/1X_qIfjVFeSqXdQqHpDLGkpGwbB4kO-im/view?usp=sharing
...
Full-res: https://drive.google.com/file/d/1lXaGymIcz1uB7mO7bwNp8-H8mMRsfQTu/view?usp=sharing
Lower-res (useful for the old CPP-based solution): https://drive.google.com/file/d/1AKvplud19LKmoOj0culkAj7IMir2OcC4/view?usp=sharing
Note that these images are in little-endian “PFM” format; the only image viewer I have been able to find that opens them is the ImageMagick “IMDisplay” app. You will need to be able to open the files to compare the before and after.
Once you have those files, you can run the denoiser via the following command line:
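If you want to sanity-check a PFM file without an image viewer, the header is easy to read by hand. The sketch below is illustrative only and not part of OIDN: `PfmHeader` and `readPfmHeader` are hypothetical names. It relies on the usual PFM convention: a text header with a magic string (“PF” for RGB, “Pf” for greyscale), the width and height, and a scale factor whose sign encodes the byte order (negative means little-endian), followed by raw 32-bit floats.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper types, not taken from the OIDN source tree.
struct PfmHeader {
    int channels = 0;          // 3 for "PF" (RGB), 1 for "Pf" (greyscale)
    int width = 0;
    int height = 0;
    bool littleEndian = false; // true when the scale factor is negative
};

// Reads the three text fields of a PFM header from `in`.
// Returns false if the header is malformed.
bool readPfmHeader(std::istream& in, PfmHeader& out) {
    std::string magic;
    float scale = 0.0f;
    if (!(in >> magic >> out.width >> out.height >> scale)) return false;

    if (magic == "PF") out.channels = 3;
    else if (magic == "Pf") out.channels = 1;
    else return false;

    if (out.width <= 0 || out.height <= 0 || scale == 0.0f) return false;

    out.littleEndian = scale < 0.0f;
    in.get(); // consume the single whitespace byte that ends the header
    return true;
}
```

After a successful call, the stream is positioned at the first byte of pixel data, which holds `width * height * channels` floats with rows stored bottom to top. Open real files in binary mode.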
...
And observe the result by opening bmwFiltered.pfm in ImageMagick.
Notes
...
There is a reference implementation in C++ a few commits back (to ensure correctness), in the file devices/cpu/ispc/ispc_conv.cpp. The ISPC version came later, and is now the implementation of choice.
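To give a feel for what a reference convolution of this kind looks like, here is a minimal sketch. It is not the code in ispc_conv.cpp: the function name `conv3x3_chw`, the 3x3 kernel, stride 1, zero padding and the [OC][IC][3][3] weight layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: naive 3x3, stride-1, zero-padded convolution on CHW data.
// src:     [IC][H][W]
// weights: [OC][IC][3][3]  (assumed layout)
// bias:    [OC]
// returns: [OC][H][W]
// As in most DNN frameworks, the kernel is not flipped (cross-correlation).
std::vector<float> conv3x3_chw(const std::vector<float>& src,
                               const std::vector<float>& weights,
                               const std::vector<float>& bias,
                               int IC, int OC, int H, int W) {
    assert(src.size() == static_cast<std::size_t>(IC) * H * W);
    assert(weights.size() == static_cast<std::size_t>(OC) * IC * 9);
    assert(bias.size() == static_cast<std::size_t>(OC));

    std::vector<float> dst(static_cast<std::size_t>(OC) * H * W);
    for (int oc = 0; oc < OC; ++oc) {
        for (int h = 0; h < H; ++h) {
            for (int w = 0; w < W; ++w) {
                float acc = bias[oc];
                for (int ic = 0; ic < IC; ++ic) {
                    for (int kh = 0; kh < 3; ++kh) {
                        for (int kw = 0; kw < 3; ++kw) {
                            const int ih = h + kh - 1;
                            const int iw = w + kw - 1;
                            // Taps outside the image contribute zero.
                            if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;
                            const std::size_t s = (static_cast<std::size_t>(ic) * H + ih) * W + iw;
                            const std::size_t k = ((static_cast<std::size_t>(oc) * IC + ic) * 3 + kh) * 3 + kw;
                            acc += src[s] * weights[k];
                        }
                    }
                }
                dst[(static_cast<std::size_t>(oc) * H + h) * W + w] = acc;
            }
        }
    }
    return dst;
}
```

A scalar loop like this is only useful as a correctness baseline to compare a vectorised kernel against.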
...
The input and output tensors use the “CHW” format; more details can be found under “blocked layout” here: https://oneapi-src.github.io/oneDNN/dev_guide_understanding_memory_formats.html#blocked-layout
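As a quick illustration of what plain “CHW” means in memory, the sketch below shows the offset calculation for a dense, unblocked tensor. `TensorCHW` is a hypothetical helper, not an OIDN type, and it assumes no padding or blocking of the channel dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an OIDN type: a dense CHW tensor of floats.
// Element (c, h, w) lives at offset (c * H + h) * W + w, so each channel
// is stored as one contiguous H x W plane.
struct TensorCHW {
    std::size_t C, H, W;
    std::vector<float> data;

    TensorCHW(std::size_t c, std::size_t h, std::size_t w)
        : C(c), H(h), W(w), data(c * h * w, 0.0f) {}

    std::size_t offset(std::size_t c, std::size_t h, std::size_t w) const {
        assert(c < C && h < H && w < W);
        return (c * H + h) * W + w;
    }

    float& at(std::size_t c, std::size_t h, std::size_t w) {
        return data[offset(c, h, w)];
    }
};
```

Walking `w` fastest, then `h`, then `c` touches memory sequentially, which is the access order a convolution kernel generally wants for this layout.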
...
OIDN uses a U-Net architecture. Pooling for CHW tensors has been implemented as part of this work, but upsampling etc. was already in place.
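For reference, pooling over a CHW tensor can be sketched as below. This is an illustration rather than the code in the branch: the name `maxpool2x2_chw`, the 2x2 window with stride 2, and the requirement that H and W are even are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: 2x2 max pooling with stride 2 on CHW data.
// src: [C][H][W] with even H and W, returns [C][H/2][W/2].
std::vector<float> maxpool2x2_chw(const std::vector<float>& src,
                                  int C, int H, int W) {
    assert(H % 2 == 0 && W % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(C) * H * W);

    const int OH = H / 2;
    const int OW = W / 2;
    std::vector<float> dst(static_cast<std::size_t>(C) * OH * OW);

    for (int c = 0; c < C; ++c) {
        for (int oh = 0; oh < OH; ++oh) {
            for (int ow = 0; ow < OW; ++ow) {
                // Top-left corner of the 2x2 window in the source plane.
                const std::size_t base =
                    (static_cast<std::size_t>(c) * H + 2 * oh) * W + 2 * ow;
                const float top = std::max(src[base], src[base + 1]);
                const float bottom = std::max(src[base + W], src[base + W + 1]);
                dst[(static_cast<std::size_t>(c) * OH + oh) * OW + ow] =
                    std::max(top, bottom);
            }
        }
    }
    return dst;
}
```

Each channel plane is pooled independently, so the channel loop is a natural place to parallelise.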
...
You may note that “n” has been omitted in all these tensor formats compared to normal - it is implied that n is never anything but 0 - the OIDN code itself uses the naming convention with no “n”
...
No notes - this works OOB after an implementation by Intel themselves.
Further Reading
https://maxliani.wordpress.com/2023/03/17/dnnd-1-a-deep-neural-network-dive/
...