Note

This page is a work in progress. OIDN, as of right now, is ~7.5x slower than running the x64 build under emulation.

Project link: https://github.com/OpenImageDenoiseRenderKit/oidn

Motivation

OpenImageDenoise is a dependency of Blender, and it is especially useful on non-workstation machines: it masks much of the raytracing noise that comes from doing fewer render passes, as a lower-powered device would.

Status

Working, but ~7.5x slower - ARM64 support has to be added from scratch, as the route taken for x64 is not compatible.

The x64 route uses OneDNN, which compiles happily on WoA if you use their normal repo (after several pull requests were merged). However, the copy of OneDNN bundled with OIDN is modified, and no record was kept of what those modifications were.

Having done some investigatory work, I was only able to enable the reference C implementation for ARM64 within OneDNN, which was roughly a 250x slowdown compared to x64. This was not acceptable, and the only path forward, according to the OIDN maintainer, was to write a custom convolution kernel in ISPC. For more info, see https://github.com/OpenImageDenoise/oidn/pull/172 and https://github.com/OpenImageDenoise/oidn/issues/168

When ARM64 support is added, it will need to cover all AArch64 devices - Linux included - which means platform-specific options like DirectML cannot do the heavy lifting for us.

There is a branch where I started this work in C++, got it working as a reference implementation, then reproduced the same result via ISPC. This is functional, but needs work on speeding it up.

Example (before and after of the Blender scene “Spring”, using an intentionally low sample count of 10):

...

Status

Working, and can be built from source yourself.

Compiling

To compile the current in-progress ISPC branch, get it from https://github.com/anthony-linaro/oidn/tree/arm-ispc and follow the instructions below.

As the project's submodules are path-based, it may be easier to clone the parent repo with --recurse-submodules, add my fork as another remote, then switch to that branch, rather than faffing with submodule paths, as follows:

Code Block
git clone --recurse-submodules https://github.com/OpenImageDenoiseRenderKit/oidn
cd oidn
git remote add anthony-origin https://github.com/anthony-linaro/oidn
git fetch anthony-origin
git checkout anthony-origin/arm-ispc

You will also need the native ARM64 ISPC compiler: https://drive.google.com/file/d/1X_qIfjVFeSqXdQqHpDLGkpGwbB4kO-im/view?usp=sharing

...

Full-res: https://drive.google.com/file/d/1lXaGymIcz1uB7mO7bwNp8-H8mMRsfQTu/view?usp=sharing

Lower-res (useful for the old CPP-based solution): https://drive.google.com/file/d/1AKvplud19LKmoOj0culkAj7IMir2OcC4/view?usp=sharing

Note that these images are in little-endian “PFM” format; the only image viewer I have been able to find that opens them is the ImageMagick “IMDisplay” app. You will need to be able to open the files to compare the before and after.

Once you have those files, you can run the denoiser via the following command line:

...

And observe the result by opening bmwFiltered.pfm in ImageMagick.

Notes

...

There is a reference implementation in C++ a few commits back (to ensure correctness), in the file devices/cpu/ispc/ispc_conv.cpp - the ISPC version came later, and is now the implementation of choice.

...

The input and output tensors use the “CHW” format; more details can be found under “blocked layout” here: https://oneapi-src.github.io/oneDNN/dev_guide_understanding_memory_formats.html#blocked-layout

...

OIDN uses a U-Net architecture - pooling for CHW tensors has been implemented alongside this work, but upsampling, etc. was already in place.

...

You may note that “n” has been omitted in all these tensor formats compared to normal - it is implied that n is never anything but 0, and the OIDN code itself uses the naming convention with no “n”.

...

No notes - this works out of the box, after an implementation by Intel themselves.

Further Reading

https://maxliani.wordpress.com/2023/03/17/dnnd-1-a-deep-neural-network-dive/

...