...

Quick note: to see how our Jenkins executes the full job, see: https://github.com/Linaro/hpc_lab_setup/blob/tensorflowci/files/build_tensorflow.yml

...

Once NumPy is built, we can get to the nitty-gritty: building TensorFlow itself. The environment set up to this point, by both the Ansible and the bash, can build TensorFlow 1.15.0 and 2.0.0 just fine (but tensorflow-benchmarks hasn't been made to work with TensorFlow 2, and the split-up of tf.contrib is non-trivial).
Configuration of the TensorFlow build has to be done by sourcing environment variables, which is handled through a script to maintain some sanity.
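As a minimal sketch of what such an environment script might look like: the file name and all values below are assumptions for a CPU-only build, not the settings used by the repository. The variable names are ones that TensorFlow's configure script reads, so exporting them pre-answers the corresponding prompts (other prompts may still appear).

```shell
# tf_env.sh (hypothetical name): illustrative values, assuming a CPU-only build.
# Source this file before running ./configure.

PYTHON_BIN_PATH="$(command -v python3 || true)"   # interpreter of the active environment
export PYTHON_BIN_PATH
export TF_NEED_CUDA=0               # no NVIDIA GPU support
export TF_NEED_ROCM=0               # no AMD GPU support
export TF_ENABLE_XLA=0              # skip the XLA JIT compiler
export TF_SET_ANDROID_WORKSPACE=0   # do not configure the Android toolchain
export CC_OPT_FLAGS="-O2"           # placeholder: flags applied when building with --config=opt
```

It would be used as `. ./tf_env.sh && ./configure` from the root of the TensorFlow source tree.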
The TensorFlow source also requires a patch, due to a missing requirement in Bazel's build configuration (i.e. WORKSPACE); this is a fairly well-known issue.
The build itself is where we feed the compiler arguments (see /roles/tensorflow/defaults/main.yml: tensorflow.c_optimizations).
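To illustrate how a list of compiler arguments can end up on the Bazel command line, here is a hedged sketch. The `C_OPTIMIZATIONS` variable and the `bazel_build_cmd` helper are hypothetical stand-ins for the role variable, and the default flags are placeholders. The helper only prints the command, so it can be inspected before anything is built.

```shell
# Hypothetical stand-in for the role variable; the default flags are placeholders.
C_OPTIMIZATIONS="${C_OPTIMIZATIONS:--O3 -funroll-loops}"

# Print the bazel invocation, passing each compiler flag through --copt.
bazel_build_cmd() {
    cmd="bazel build --config=opt"
    for flag in $C_OPTIMIZATIONS; do
        cmd="$cmd --copt=$flag"
    done
    printf '%s\n' "$cmd //tensorflow/tools/pip_package:build_pip_package"
}

bazel_build_cmd   # dry run: shows the command instead of running it
```

Once the real build has finished, the pip package is produced by running `./bazel-bin/tensorflow/tools/pip_package/build_pip_package` with an output directory as its argument.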
Once it is built (and the pip package is built as well), the final trick is to install h5py (the HDF5 Python library/API) from the command line (shell module in Ansible), with the variable HDF5_DIR pointing to the base of the HDF5 installation. (This might be a problem with Lmod and needs further investigation.)
TensorFlow is built and ready to install!
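The h5py step can be sketched as follows. The default prefix `/opt/hdf5` and the `h5py_install_cmd` helper are hypothetical. Adding `--no-binary=h5py` is an assumption on our side: it forces pip to build h5py from source, so that HDF5_DIR is actually taken into account. The helper prints the command rather than running it.

```shell
# Hypothetical prefix: point this at the base of the HDF5 installation.
HDF5_DIR="${HDF5_DIR:-/opt/hdf5}"

# Print the install command; HDF5_DIR is passed through the environment of pip.
h5py_install_cmd() {
    printf 'HDF5_DIR=%s pip install --no-binary=h5py h5py\n' "$HDF5_DIR"
}

h5py_install_cmd   # dry run: shows the command instead of running it
```

In Ansible, the same line would go in a shell task, run inside the virtual environment used for the build.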

After this, as a final step, the playbook executes a "Hello World" script to make sure the TensorFlow installation is functional. This script is taken from the official beginner quickstart: https://www.tensorflow.org/tutorials/quickstart/beginner
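For reference, a smoke test along those lines might look like the sketch below. The `tf_hello_world` wrapper is hypothetical, and the Python body follows the beginner quickstart as it read around TensorFlow 2.0, so it may differ from the script the playbook actually runs. It downloads the MNIST dataset, so it needs network access.

```shell
# Hypothetical wrapper around a quickstart-style script; not invoked automatically.
tf_hello_world() {
    python3 - <<'EOF'
import tensorflow as tf

# Load MNIST and scale the pixel values to the [0, 1] range.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small fully connected classifier, as in the quickstart.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
EOF
}
```

Calling `tf_hello_world` inside the virtual environment where the freshly built wheel is installed trains for a few epochs; a non-zero exit status means the installation is not functional.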

Further Notes


The HPC SIG's scripts to build TensorFlow support only CentOS 7 at the time of writing.
Efforts will be made to ensure compatibility with CentOS 8, and then we will look into Debian environments (probably after adding the OpenBLAS and FFTW builds).

The scripts/Ansible recipe makes use of Lmod to keep track of the libraries, and of OpenHPC's binaries for OpenBLAS, FFTW, GCC 8.3.0 and Lmod.
TensorFlow and NumPy 1.17.3 require Python 3, and the EOL of Python 2 is fast approaching, but sadly the dnf/yum module in Ansible seems very dodgy at the moment, so we still need Python 2, at least for Ansible itself.

The Ansible makes use of venv (a.k.a. virtualenv) to keep (at least) the Python dependencies contained and easily identifiable.

GCC is used to build all necessary pieces (at the moment, 8.3.0 from OpenHPC).

Further Enhancements

Note: the TensorFlow Ansible and Jenkins job are in a PR at the moment, to be merged with master hpc_lab_jenkins and put into production. Please add any issues encountered or comments here: https://github.com/Linaro/hpc_lab_setup/pull/91

The first area of focus for enhancement is the NumPy build: allowing optimizations, and looking at fetching the UMFPACK/AMD libraries to hook it up to.
A round of clean-up is also due, so that the build of individual components can be turned on and off, with the components that are not built being fetched as pre-built packages via pip/yum (from other builds: especially NumPy and Bazel).
Adding benchmark and test-suite runs for the stack's components is also a required step.
Then we will focus on adding more parts of the stack to the build:

...