Building TensorFlow on AArch64

Recipes

Please note: these recipes are a work in progress and subject to change, although changes will not be introduced at the expense of functionality.

The Linaro HPC SIG CI loop uses the Ansible role (and the associated Jenkins job), while development work uses a bash script.
The Ansible role is part of the HPC lab's Jenkins infrastructure and lives in the hpc_lab_setup repository: https://github.com/Linaro/hpc_lab_setup/tree/tensorflowci/files/ansible
The bash script can be found here (pending Linaro hosting).

The following sections mostly focus on the Ansible role, as it is the production environment; the bash script should also be functional, if less tidy.

Outline of the structure of the Ansible:

Quick note: to see how our Jenkins executes the full job, see https://github.com/Linaro/hpc_lab_setup/blob/tensorflowci/files/build_tensorflow.yml

install_python3.yml


This playbook is to be run first. It makes sure the environment contains all the "base" Python (2 and 3) dependencies (i.e. python-devel, python-setuptools, python-pip). Python 3 is sourced from EPEL.
It is kept separate for now because the second playbook should use the python3 interpreter on the target/builder machine.
Sadly, the yum module only works when Ansible runs on python2; when Ansible runs on python3, the dnf module has to be used instead. CentOS 7 does not ship what would be "python3(6)-dnf", building dnf from source does not provide it either, and upgrading dnf pulls in many dependencies.
CentOS 8 could make use of it, though, and so could apt-based distros.

build_tensorflow.yml


This playbook, as the name implies, does the rest of the job of building TensorFlow. By default (see roles/tensorflow/defaults/main.yml), it builds:
 - TensorFlow 1.15.0 Release
 - NumPy 1.17.3 Release
 - Bazel 0.24.1-dist Release


This playbook also sets up a "builder" user (see roles/tensorflow/defaults/main.yml for the name), adds it to the wheel group (which is also changed to allow password-less sudo) and changes its bashrc so that Lmod is systematically loaded. That user is then used to do the build.
To that effect, it fetches dependencies first (see roles/tensorflow/tasks/main.yml for the precise task order), including OpenBLAS, HDF5, FFTW, GCC 8.3.0 and Lmod from OpenHPC (see the Further Enhancements section), as well as openjdk8 for Bazel.
On the topic of Lmod, we are still looking for a way to install it and get it working within the same bash script (without having to restart the bash session). BASH_ENV might do the trick.
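
As a rough illustration of the BASH_ENV idea: a non-interactive bash reads the file named by BASH_ENV before running its command, so an init script can be picked up without restarting the session. The sketch below only demonstrates that mechanism from Python; the helper names are hypothetical, and the temporary file stands in for the Lmod init script, whose real location depends on the install. Whether this is enough for Lmod itself is untested here.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_bash_env(command, init_script):
    """Run `command` in a fresh non-interactive bash that sources `init_script` first."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash not found on PATH")
    # BASH_ENV is read by bash itself when it starts non-interactively.
    env = dict(os.environ, BASH_ENV=init_script)
    result = subprocess.run(
        [bash, "-c", command],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


def demo():
    """Write a stand-in init script and show that a new bash picks it up."""
    # Stand-in for the Lmod init script; the real path depends on the install.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write("export DEMO_MODULE_INIT=loaded\n")
        init_script = handle.name
    try:
        return run_with_bash_env('echo "$DEMO_MODULE_INIT"', init_script)
    finally:
        os.unlink(init_script)
```

In a real script the same effect would come from exporting BASH_ENV before spawning the child shells that need the module command.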


Then, it builds the virtual environment (venv) that will contain the pip dependencies and populates it with the first few that do not depend on NumPy (i.e. pip, wheel, Cython, mock, future...) as well as Keras (versions as instructed upstream).
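
A minimal sketch of this step using Python's standard venv module, assuming a POSIX layout (interpreter under bin/). The function names are hypothetical, and the package list only repeats the names given above, unpinned; the actual recipe takes its versions from upstream.

```python
import os
import venv


# Packages named in the text that do not depend on NumPy. Versions are left
# unpinned here; the recipe itself follows upstream's instructions.
BASE_PACKAGES = ["pip", "wheel", "Cython", "mock", "future"]


def create_build_venv(env_dir, with_pip=True):
    """Create the virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # POSIX layout assumed; on Windows the interpreter lives under Scripts/.
    return os.path.join(env_dir, "bin", "python")


def pip_install_command(python, packages):
    """Build the argv that installs or upgrades `packages` inside the venv."""
    return [python, "-m", "pip", "install", "--upgrade"] + list(packages)
```

The returned argv can be handed to subprocess.run; NumPy is deliberately absent from the list, since it is built from source later.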


We then proceed with the Bazel build, which is a straightforward affair.
Here we build Bazel 0.24.1, the minimum version required to build TensorFlow. TensorFlow is quite picky about its Bazel... Thankfully, you can pretty much replace bazel_version (and bazel_url) with any available version above 0.24.1 and it builds just fine.
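
When swapping bazel_version, a small guard against going below the minimum can save a failed build. The sketch below is a hypothetical helper, not part of the recipes; it compares versions numerically, since a plain string comparison would wrongly rank "0.9.0" above "0.24.1".

```python
import re


MINIMUM_BAZEL = "0.24.1"  # lowest version the text reports as working


def parse_version(text):
    """Turn '0.24.1' or '0.24.1-dist' into a comparable tuple such as (0, 24, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognised Bazel version: %r" % text)
    return tuple(int(part) for part in match.groups())


def bazel_version_ok(version, minimum=MINIMUM_BAZEL):
    """Return True when `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)
```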


After Bazel is built, we can go on with the NumPy build. The NumPy build involves applying a workaround for a GCC bug, pending OpenHPC picking up an up-to-date version of GCC 8.x, or fetching and using a GCC 9.x AArch64 build of the toolchain (see Further Enhancements).
Please do note that the aforementioned GCC bug breaks "pip install numpy>=1.15.3", and that the workaround disables all optimizations on one function (it reportedly breaks a testsuite test as well, see the upstream issue).
NumPy also uses a specific mechanism to hook up to BLAS/LAPACK and FFTW (and to the UMFPACK and AMD libraries, the latter having nothing to do with Advanced Micro Devices; see Further Enhancements): the .numpy-site.cfg file, to be found at /home/$USER/.numpy-site.cfg (ugly, yes...).
The "setup.py bdist_wheel" command can be a bit finicky: when in a venv, just run setup.py directly without first calling the interpreter (and make sure wheel is installed).
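
To make the .numpy-site.cfg mechanism more concrete, here is a sketch that renders such a file with the standard configparser module. The [openblas] and [fftw] section names and their keys follow NumPy's site.cfg.example; the function name and the install prefixes are hypothetical, and the real values depend on where the OpenHPC modules put the libraries.

```python
import configparser
import io


def render_numpy_site_cfg(openblas_prefix, fftw_prefix):
    """Return the text of a NumPy site configuration pointing at OpenBLAS and FFTW."""
    config = configparser.ConfigParser()
    config["openblas"] = {
        "libraries": "openblas",
        "library_dirs": openblas_prefix + "/lib",
        "include_dirs": openblas_prefix + "/include",
        "runtime_library_dirs": openblas_prefix + "/lib",
    }
    config["fftw"] = {
        "libraries": "fftw3",
        "library_dirs": fftw_prefix + "/lib",
        "include_dirs": fftw_prefix + "/include",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Writing the returned text to ~/.numpy-site.cfg before running the NumPy build is what lets setup.py find the libraries.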

Once NumPy is built, we can get to the nitty-gritty: building TensorFlow itself. The environment set up to this point, by both the Ansible role and the bash script, can build TensorFlow 1.15.0 and 2.0.0 just fine (but tensorflow-benchmarks has not been made to work with TensorFlow 2, and the split-up of tf.contrib is non-trivial).
The TensorFlow build has to be configured through environment variables, which is done by sourcing a script to maintain some sanity.
The TensorFlow source also requires a patch, due to a missing requirement in Bazel's build configuration (i.e. WORKSPACE); this is a fairly well-known issue.
Then the build itself is where we feed the compiler arguments (see /roles/tensorflow/defaults/main.yml : tensorflow.c_optimizations and tensorflow.c_optimizations)
Once it is built (and the pip package is built as well), the final trick is to install h5py (the HDF5 Python library/API) via the command line (shell module in Ansible), with the HDF5_DIR variable pointing to the base of the HDF5 installation (this might be a problem with Lmod and needs further investigation).
TensorFlow is then built and ready to install!
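
The configuration, build and h5py steps above can be sketched as three small helpers. This is only an outline under assumptions. The variable names are ones read by TensorFlow's configure script, but the selection and values are guesses at a CPU-only build. The compiler flags used in the usage note are placeholders for whatever roles/tensorflow/defaults/main.yml holds, and --no-binary is an assumption to force a source build of h5py against HDF5_DIR. All function names are hypothetical.

```python
import os
import shlex


def configure_exports(python_bin, python_lib, cc_opt_flags):
    """Render `export` lines to source before a non-interactive ./configure."""
    settings = [
        ("PYTHON_BIN_PATH", python_bin),
        ("PYTHON_LIB_PATH", python_lib),
        ("CC_OPT_FLAGS", cc_opt_flags),
        # Assumed CPU-only build: every optional backend is switched off.
        ("TF_NEED_CUDA", "0"),
        ("TF_NEED_ROCM", "0"),
        ("TF_ENABLE_XLA", "0"),
        ("TF_NEED_OPENCL_SYCL", "0"),
        ("TF_NEED_MPI", "0"),
        ("TF_DOWNLOAD_CLANG", "0"),
        ("TF_SET_ANDROID_WORKSPACE", "0"),
    ]
    return "\n".join(
        "export %s=%s" % (name, shlex.quote(value)) for name, value in settings
    )


def bazel_build_command(copts):
    """Build the argv for the pip-package target, one --copt per compiler flag."""
    argv = ["bazel", "build", "--config=opt"]
    argv += ["--copt=" + flag for flag in copts]
    argv.append("//tensorflow/tools/pip_package:build_pip_package")
    return argv


def h5py_install(python, hdf5_dir):
    """Return (argv, env) for installing h5py against a given HDF5 prefix."""
    env = dict(os.environ, HDF5_DIR=hdf5_dir)
    # --no-binary is an assumption: it forces a source build so HDF5_DIR is used.
    argv = [python, "-m", "pip", "install", "--no-binary=h5py", "h5py"]
    return argv, env
```

For example, configure_exports("/venv/bin/python", "/venv/lib/python3.6/site-packages", "-O3 -mcpu=native") yields a script that can be sourced before ./configure; the paths and flags in that call are illustrative only.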

As a final step, the playbook executes a "Hello World" script to make sure the TensorFlow installation is functional. This script is taken from the official beginner quickstart: https://www.tensorflow.org/tutorials/quickstart/beginner

Further Notes


The HPC SIG's scripts to build TensorFlow support only CentOS 7 at the time of writing.
Efforts will be made to ensure compatibility with CentOS 8, and then we will look into Debian environments (probably after adding the OpenBLAS and FFTW builds).

The scripts/Ansible recipe uses Lmod to keep track of the libraries, and OpenHPC's binaries for OpenBLAS, FFTW, GCC 8.3.0 and Lmod.
TensorFlow and NumPy 1.17.3 require Python 3, and the EOL of Python 2 is fast approaching. Sadly, the dnf/yum module in Ansible seems unreliable at the moment, so we still need python2, at least for Ansible.

The Ansible makes use of venv (a.k.a virtualenv) to keep (at least) the python dependencies contained and easily identifiable.

GCC is used to build all necessary pieces (at the moment, 8.3.0 from OpenHPC)

Further Enhancements

Note: the TensorFlow Ansible role and Jenkins job are currently in a PR, to be merged into the master branch of hpc_lab_setup and put into production. Please add any issues encountered or comments here: https://github.com/Linaro/hpc_lab_setup/pull/91

The first area of focus for enhancement is the NumPy build: allowing optimizations, and looking at fetching the UMFPACK/AMD libraries to hook it up to.
A round of clean-up is also due, to make sure that the build of individual components can be turned on and off, with components that are not built being fetched via pip/yum install instead.
Adding benchmark and testsuite runs for the stack's components is also a required step.
Then we will focus on adding more parts of the stack to the build:


                                                                                             

fig 1 - Stack and Tools Diagram


The above diagram (fig 1) lays out the stack and tools: the ones we currently build with the recipes (orange/salmon), the ones that would be interesting to build (green), and the ones that are probably not (grey).
Typically, Keras is pure Python and should not impact performance, as it is only a high-level modelling library that sits on top of TensorFlow. Linux is also outside the scope of this endeavour.
HDF5 is the storage layer used by TensorFlow (strictly a file format and library rather than a filesystem). Since it deals with I/O, it lies outside the core area of optimization and requires additional expertise to fine-tune. Nonetheless, it is certainly something to be scoped (thus the greenish/greyish colour).
OpenBLAS is certainly the next thing to be added to the build. Following that, integrating an FFTW build should also be interesting.
Above that in the stack, SciPy and Python 3 are also interesting domains to investigate.


Concerning GCC and LLVM, the yellow colour denotes that it might be interesting to fetch those from outside of the distributions/OpenHPC, but not so much to build them. This is especially true of GCC, since OpenHPC's 8.3.0 (as well as Red Hat's 4.8.5) contains a bug that breaks the NumPy build.
LLVM is shown there but not used in the recipes; it is the target of work by the HPC SIG and seems to be able to build TensorFlow. It requires further investigation.
Both toolchains could be acquired through Linaro's Toolchain Group, which runs CI on both. Where to get them from remains to be investigated.