Though I usually use PyTorch for my deep learning work, I have just been given a piece of code written using TensorFlow, so I needed to install TensorFlow and TensorFlow Addons on my Debian testing system (aka Debian bullseye, which will shortly become Debian 11). Unfortunately, the binaries available on PyPI are only built for Python 3.6-3.8, but Debian bullseye now runs Python 3.9.
This blog post documents how I managed to build these packages for my system (which was far more effort than it probably should have been!).
I have the following key packages/libraries installed. There will certainly be others that I have not listed; please feel free to let me know of anything I've overlooked.
- python3-dev (currently version 3.9.2-2)
- Essential Python packages (according to the TensorFlow installation guide) and the Debian packages providing them, including keras_preprocessing. Note, though, that the version of keras_preprocessing currently in the Debian archive is older than that required by TensorFlow, so it may be wiser to install it with
pip install -U --user keras_preprocessing --no-deps
as explained on the TensorFlow installation page.
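To check whether the copy that pip ends up seeing is new enough, something like the following sketch works; the 1.1.2 minimum is my assumption for what TensorFlow 2.4 wants, and `pip` here may need to be `pip3` on some setups:

```shell
# Compare the installed keras_preprocessing version against a required minimum.
# Assumption: TensorFlow 2.4 wants roughly 1.1.2; adjust as needed.
required=1.1.2
installed=$(pip show keras_preprocessing 2>/dev/null | awk '/^Version:/ {print $2}')
if [ -n "$installed" ] && \
   [ "$(printf '%s\n%s\n' "$installed" "$required" | sort -V | tail -n1)" = "$installed" ]; then
    echo "keras_preprocessing $installed is new enough"
else
    echo "keras_preprocessing needs installing or upgrading"
fi
```

The `sort -V` trick compares version strings numerically component by component, which avoids pulling in any extra tooling just for a version check.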
- Python packages: either install Debian versions of these or let pip install them during the package installation. The relevant packages seem to be python3-wrapt (among others) and their dependencies.
(*): In these cases, the current Debian version is too old for TensorFlow 2.4.1.
- libcudnn8-dev (a cuDNN 8.1 package built for CUDA 11.2; this package was downloaded directly from the NVIDIA website)
- Go compiler: golang-go; this is needed for installing Bazelisk (see below).
Unfortunately, it turns out that the Debian-packaged CUDA packages do not work when building TensorFlow; see this GitHub issue. It is likely that this will not be fixed. On the other hand, the NVIDIA-provided Debian packages don't seem to play nicely with some of the other parts of the system, so I never used them.
To get around this, I downloaded the CUDA 11.2.1 runfile installer from the NVIDIA website. (The current version is 11.2.2, but as my Debian CUDA packages are 11.2.1, I downloaded the older version from the Archive of Previous CUDA Releases.)
I needed write permission on the parent directory of the desired target location, so I created the directory /usr/local/cuda-11.2.1 (as root) and then ran
$ chown jdg:jdg /usr/local/cuda-11.2.1
(I could equally have created this directory in some other location without needing to be root.) I then unpacked the CUDA package into it:
$ sh cuda_11.2.1_460.32.03_linux.run --installpath=/usr/local/cuda-11.2.1/cuda
After the installation, I tidied up:
$ mv /usr/local/cuda-11.2.1/cuda/* /usr/local/cuda-11.2.1/
$ rmdir /usr/local/cuda-11.2.1/cuda/
so that everything is now directly in /usr/local/cuda-11.2.1. (At the end of the build process, it appears that this entire directory can be deleted, as long as the relevant Debian CUDA packages are still installed.)
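As a sanity check on the unpacked toolkit, something like the following confirms that nvcc is where the TensorFlow build will expect it; the path is the one chosen above, and the guard just avoids a confusing error if the unpacking went wrong:

```shell
# Confirm the unpacked CUDA toolkit looks sane before configuring TensorFlow.
CUDA_DIR=/usr/local/cuda-11.2.1
if [ -x "$CUDA_DIR/bin/nvcc" ]; then
    "$CUDA_DIR/bin/nvcc" --version
else
    echo "nvcc not found under $CUDA_DIR - check the unpacking step"
fi
```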
I started by following the guidance on the TensorFlow installing from source webpage.
There will eventually be a Debian bazel package, but unfortunately that is still some way in the future (there is a team working on this, but there are currently technical difficulties). So I installed Bazelisk, following the instructions on the Bazelisk GitHub page:
$ go get github.com/bazelbuild/bazelisk
$ export PATH=$PATH:$(go env GOPATH)/bin
$ (cd $(go env GOPATH)/bin && ln -s bazelisk bazel)
The export may well be extraneous, as PATH is usually already exported.
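One nicety of Bazelisk is that it can pin the Bazel version via the USE_BAZEL_VERSION environment variable (or a .bazelversion file in the source tree); 3.1.0 here matches what the configure step reported on my system:

```shell
# Ask Bazelisk for a specific Bazel version rather than its default "latest".
export USE_BAZEL_VERSION=3.1.0
if command -v bazel >/dev/null 2>&1; then
    bazel version
else
    echo "bazel (bazelisk) is not on PATH yet"
fi
```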
Build directory setup
Since I am building both TensorFlow and TensorFlow Addons, I created a directory ~/packages/tensorflow and cloned the git repositories into it; the intention is to build the wheels in that same directory, so everything is together.
Downloading the TensorFlow source code
I followed the instructions as given; I also checked out the latest release branch, which at the time of writing is r2.4:
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout -t origin/r2.4
Configuring the build
Here begins the fun! Some of these lines have been wrapped to fit better on the screen.
$ ./configure
You have bazel 3.1.0 installed.
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/lib/python3.9/dist-packages
  /usr/local/lib/python3.9/dist-packages
  /home/jdg/lib/python
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]
/home/jdg/.local/lib/python3.9/site-packages
I can’t write to the system directory, and it seems as though this is
where some libraries may be written, so I set it to be my local
repository instead. I don’t know whether leaving this as
/usr/lib/... would work equally well.
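If you are unsure what the per-user path is on your system, Python will tell you directly; this is how to recover the value I typed at the prompt above:

```shell
# Print the per-user site-packages directory for the default python3.
python3 -c 'import site; print(site.USER_SITE)'
```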
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
I accepted the defaults for both of these; I don’t have TensorRT installed.
Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration...
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 11.2
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 8.1
I have CUDA 11.2.1 installed, so I responded 11.2. Perhaps this would be better as just 11, but I'm not sure. Likewise, I have cuDNN 8.1 installed, so I responded 8.1.
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:
I left this empty, as I do not have NCCL installed.
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda-11.2.1
This is the point at which everything goes wrong with the Debian-packaged version of the CUDA libraries. The Debian versions can remain installed on the system during the build, it seems, but they do not work for building TensorFlow itself. So I gave the path to the libraries unpacked from the NVIDIA runfile.
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5
[Default is: 3.5,7.0]:
I set this to 3.5,7.5 based on the information on the webpage referred to. Maybe I don't need the 3.5 part, but I'm not sure, so I left it in.
Do you want to use clang as CUDA compiler? [y/N]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
I left all of these with their default settings.
And the configuration is finished!
Building the pip package
Unfortunately, some of the scripts in the TensorFlow sources call python using the shebang formulation #!/usr/bin/env python, which breaks unless there is a python executable on the PATH; this should presumably be Python 3.x (though I haven't checked). In Debian 10, /usr/bin/python was python2, but in Debian 11 there is no python executable by default (though there is a python-is-python3 package which could be installed, which creates /usr/bin/python as a symlink to /usr/bin/python3). Since I don't want things to break silently, I didn't install that package, but instead set up a local symlink for the purpose of this build. (Note that the bazel environment variable PYTHON_BIN_PATH does not help at all.)
$ mkdir ../bin
$ ln -s /usr/bin/python3 ../bin/python
$ PATH=$(realpath ../bin):$PATH
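The same trick can be rehearsed in a throwaway directory first, to confirm that env-shebang scripts will pick up Python 3; the paths here are scratch paths, not the ones used in the build:

```shell
# Rehearse the local-symlink trick without touching the build tree.
BINDIR=$(mktemp -d)
ln -s "$(command -v python3)" "$BINDIR/python"
PATH="$BINDIR:$PATH"
# A "#!/usr/bin/env python" script would now resolve to Python 3:
python -c 'import sys; print(sys.version_info.major)'
```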
I then ran the bazel command:
$ bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
This compilation step is pretty long.
Building and installing the pip package
Following the instructions, I ran:
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ..
so that the resulting wheel ended up in the parent directory.
I then installed the wheel; in my case, the command line was:
$ pip install --user ../tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl
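A quick smoke test after installation; this only checks that the wheel imports and that a GPU is visible (tf.config.list_physical_devices is the TF 2.x API), and it falls back gracefully if the import fails:

```shell
# Import check plus GPU visibility for the freshly installed wheel.
python3 - <<'PY'
try:
    import tensorflow as tf
    print("TensorFlow", tf.__version__)
    print("GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not importable in this interpreter")
PY
```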
Installing TensorFlow Addons
The instructions for this are on the TensorFlow website here.
I first cloned the repository; again, I started in the ~/packages/tensorflow directory:
$ git clone https://github.com/tensorflow/addons.git
$ cd addons
$ git checkout -t origin/r0.12
The exports, though, are not quite as described. They should instead be the following:
$ export TF_NEED_CUDA=1
$ export CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2.1
(I have just reported this discrepancy, so it may well be fixed very soon.) The rest ran smoothly:
$ python3 ./configure.py
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg ..
$ pip install ../tensorflow_addons-0.12.2-cp39-cp39-linux_x86_64.whl
And after this, I removed (actually just temporarily renamed)
/usr/local/cuda-11.2.1, and everything still works.