This page is a work in progress. OIDN, as of right now, is ~7.5x slower running natively than under x64 emulation

Project link:


OpenImageDenoise is a dependency of Blender, and is especially useful on non-workstation machines: it masks much of the raytracing noise that comes with doing fewer render passes, as a lower-powered device must.


Working, but ~7.5x slower - ARM64 support has to be added from scratch, as the route taken for x64 is not compatible.

The x64 route uses oneDNN, which compiles happily on WoA from their upstream repo, after several pull requests were merged. However, the copy of oneDNN bundled with OIDN is modified, and no record was kept of what those modifications were.

Having done some investigatory work, I was only able to enable the reference C implementation for ARM64 within oneDNN, which was roughly a 250x slowdown compared to x64. This was not acceptable, and the only path forward, according to the OIDN maintainer, was to write a custom convolution kernel in ISPC. See for more info: and

When ARM64 support is added, it will need to cover all AArch64 devices - Linux included - which means things like DirectML are not an option to do the heavy lifting for us.

There is a branch where I started this work in C++, got it working as a reference implementation, then reproduced the same result via ISPC. This is functional, but needs work to speed it up.
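For a rough idea of what the C++ reference implementation has to compute, here is a minimal sketch of a direct convolution over a CHW tensor with OIHW weights and zero padding. The function name, signature, and shapes are illustrative assumptions for this page, not OIDN's actual code:

```cpp
#include <vector>
#include <cstddef>

// Minimal direct convolution over CHW tensors (batch dimension omitted,
// as in the OIDN code). Illustrative sketch only - not OIDN's actual API.
// input:  IC x H x W, weights: OC x IC x KH x KW (OIHW), bias: OC
// output: OC x H x W, zero-padded so the spatial size is preserved.
std::vector<float> conv2d_chw(const std::vector<float>& input,
                              const std::vector<float>& weights,
                              const std::vector<float>& bias,
                              int IC, int OC, int H, int W,
                              int KH, int KW)
{
    std::vector<float> output(static_cast<size_t>(OC) * H * W, 0.0f);
    const int padH = KH / 2, padW = KW / 2;
    for (int oc = 0; oc < OC; ++oc)
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                float acc = bias[oc];
                for (int ic = 0; ic < IC; ++ic)
                    for (int ky = 0; ky < KH; ++ky)
                        for (int kx = 0; kx < KW; ++kx) {
                            const int iy = y + ky - padH;
                            const int ix = x + kx - padW;
                            if (iy < 0 || iy >= H || ix < 0 || ix >= W)
                                continue; // zero padding
                            acc += input[(static_cast<size_t>(ic) * H + iy) * W + ix]
                                 * weights[((static_cast<size_t>(oc) * IC + ic) * KH + ky) * KW + kx];
                        }
                output[(static_cast<size_t>(oc) * H + y) * W + x] = acc;
            }
    return output;
}
```

The ISPC version vectorises loops like these across lanes; the naive nesting above is the "correct but slow" baseline the faster kernels are checked against.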


To compile the current ISPC in-progress branch, get it from here:

As the submodules are path-based, it may be easier to clone the parent repo with --recurse-submodules, add mine as another remote, fetch it, and switch to that branch, rather than faffing with submodule paths, as follows:

git clone --recurse-submodules
cd oidn
git remote add anthony-origin
git fetch anthony-origin
git checkout anthony-origin/arm-ispc

You will also need the native ARM64 ISPC compiler:

And a copy of oneTBB (debug and release builds):

Then execute the following (inside the oidn dir) from a native ARM64 vcvarsall:

mkdir build
cd build
cmake -G "Visual Studio 17 2022" .. -DTBB_ROOT=<tbb release or debug folder as appropriate> -DISPC_EXECUTABLE=<path to ispc executable>

You should then be able to build it via the newly generated VS solution, or via

cmake --build .

NOTE: It is not possible to switch easily between debug and release builds in VS - you have to delete the build folder and start over, as you need the corresponding TBB install dir for the config.


Compiling successfully will give you a number of binaries in the build directory. A simple app that just applies denoising is named oidnDenoise.exe; to run it, you need the following file


Lower-res (useful for the old CPP-based solution):

Note that these images are in little-endian “PFM” format; the only image viewer I have found that opens them is the ImageMagick “IMDisplay” app. You will need to be able to open the files to compare the before and after.
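If you need to generate test inputs or inspect outputs programmatically, the PFM container itself is simple: a text header ("PF" for RGB, width/height, then a scale whose negative sign marks little-endian), followed by raw rows of floats, conventionally stored bottom-to-top. Here is a minimal writer as a sketch (the function name is my own, and this assumes the 3-channel little-endian variant the denoiser uses):

```cpp
#include <cstdio>
#include <vector>

// Minimal writer for little-endian binary "PFM" files. Header is
// "PF\n<width> <height>\n-1.0\n" (negative scale => little-endian),
// then rows of RGB floats, bottom row first per PFM convention.
// Illustrative sketch only; assumes a little-endian host.
bool writePFM(const char* path, const std::vector<float>& rgb,
              int width, int height)
{
    if (rgb.size() != static_cast<size_t>(width) * height * 3)
        return false;
    FILE* f = std::fopen(path, "wb");
    if (!f)
        return false;
    std::fprintf(f, "PF\n%d %d\n-1.0\n", width, height);
    for (int y = height - 1; y >= 0; --y) // bottom-to-top
        std::fwrite(&rgb[static_cast<size_t>(y) * width * 3],
                    sizeof(float), static_cast<size_t>(width) * 3, f);
    std::fclose(f);
    return true;
}
```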

Once you have those files, you can run the denoiser via the following command line:

And observe the result by opening bmwFiltered.pfm in ImageMagick.


  • There is a reference implementation in C++ a few commits back (kept to ensure correctness), in the file devices/cpu/ispc/ispc_conv.cpp - the ISPC version came later, and is now the implementation of choice

  • The input and output tensors use the “CHW” format, more details on which can be found here under “blocked layout”:

  • OIDN uses a u-net architecture - pooling for CHW tensors has been implemented alongside this work, but upsampling, etc. was already in place

  • You may note that “n” has been omitted in all these tensor formats compared to the usual convention - it is implied that n is never anything but 0, and the OIDN code itself uses the naming convention with no “n”

  • The weights are in OIHW format
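To make the layout notes above concrete, here is a sketch of flat indexing into a CHW tensor (no batch dimension, matching the note above) and a 2x2/stride-2 max pool, the usual u-net downsampling step. Names are my own and the actual OIDN pooling may differ in detail:

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// Flat index into a CHW tensor (channel-major, then rows, then columns).
inline size_t chwIndex(int c, int h, int w, int H, int W)
{
    return (static_cast<size_t>(c) * H + h) * W + w;
}

// 2x2 max pooling with stride 2 over a CHW tensor, halving each spatial
// dimension. Illustrative sketch of the u-net downsampling step only.
std::vector<float> maxPool2x2(const std::vector<float>& in, int C, int H, int W)
{
    const int OH = H / 2, OW = W / 2;
    std::vector<float> out(static_cast<size_t>(C) * OH * OW);
    for (int c = 0; c < C; ++c)
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox) {
                const int y = oy * 2, x = ox * 2;
                float m = in[chwIndex(c, y, x, H, W)];
                m = std::max(m, in[chwIndex(c, y, x + 1, H, W)]);
                m = std::max(m, in[chwIndex(c, y + 1, x, H, W)]);
                m = std::max(m, in[chwIndex(c, y + 1, x + 1, H, W)]);
                out[(static_cast<size_t>(c) * OH + oy) * OW + ox] = m;
            }
    return out;
}
```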

Further Reading

These are a set of articles that talk about how neural networks work; interestingly, the author uses OIDN as their basis, so many of the diagrams and pseudocode are useful in understanding OIDN