Though I usually use PyTorch for my deep learning work, I have just been given a piece of code written using TensorFlow, so I needed to install TensorFlow and TensorFlow Addons on my Debian testing system (aka Debian bullseye, which will shortly become Debian 11). Unfortunately, the binaries available on PyPI are only built for Python 3.6-3.8, but Debian bullseye now runs Python 3.9.
This blog post documents how I managed to build these packages for my system (which was far more effort than it probably should have been!).
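A quick way to see the version mismatch on your own machine is sketched below. It is a minimal sketch only; the 3.6-3.8 range is hard-coded from the paragraph above and applies to the TensorFlow 2.4.1 wheels, so treat it as an assumption for any other release.

```shell
# Minimal sketch: does the local python3 match a Python version for which
# PyPI TensorFlow 2.4.1 wheels exist?  The 3.6-3.8 range is hard-coded
# from the text above and is an assumption for any other release.

# Print the running python3's "major.minor" version, e.g. "3.9".
py_version() {
    python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Succeed if "$1" (a "major.minor" string) has prebuilt wheels.
has_prebuilt_wheel() {
    case "$1" in
        3.6|3.7|3.8) return 0 ;;
        *) return 1 ;;
    esac
}

v=$(py_version 2>/dev/null) || v=unknown
if has_prebuilt_wheel "$v"; then
    echo "Python $v: prebuilt TensorFlow wheels should be on PyPI"
else
    echo "Python $v: no prebuilt wheels, so build from source"
fi
```

On Debian bullseye this reports 3.9, which is why the rest of this post builds from source.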
I have the following key packages/libraries installed. There are certainly others that I have missed, so please let me know of anything I’ve overlooked.
- python3-dev (currently version 3.9.2-2)
- Essential Python packages (according to the TensorFlow installation page), including keras_preprocessing. Debian packages provide these, though the version of keras_preprocessing currently in the Debian archive is older than that required by the TensorFlow installation, so it may be wiser to install it with
$ pip install -U --user keras_preprocessing --no-deps
as explained on the TensorFlow installation page.
- Python packages: either install Debian versions of these or let pip install them during the package installation. The relevant packages seem to include python3-wrapt, together with their dependencies.
(*): For some of these packages, the current Debian version is too old for TensorFlow 2.4.1.
- libcudnn8-dev (version 8.1.x-1+cuda11.2; this package was downloaded directly from the NVIDIA website)
- Go compiler: golang-go; this is needed for installing Bazelisk (see below).
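Before starting, it can save time to check that the tools used later are on the PATH and that the Debian packages listed above are installed. The following is a minimal sketch with hypothetical helper names; the default tool list (python3, gcc, git, go) is my own assumption about what the build calls directly, and check_debs simply skips the check on systems without dpkg-query.

```shell
# Minimal sketch: report build tools that are missing from PATH.
# The default list below is an assumption; pass your own names to override.
check_tools() {
    local tool missing=0
    [ "$#" -gt 0 ] || set -- python3 gcc git go
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:     $tool -> $(command -v "$tool")"
        else
            echo "MISSING:   $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Report Debian packages (e.g. python3-dev, libcudnn8-dev) that are not
# installed, using dpkg-query's ${Status} field.
check_debs() {
    local pkg missing=0
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found; skipping package check" >&2
        return 0
    fi
    for pkg in "$@"; do
        if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
            echo "installed: $pkg"
        else
            echo "MISSING:   $pkg" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, check_tools && check_debs python3-dev golang-go libcudnn8-dev exits non-zero if anything is missing.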
Unfortunately, it turns out that the Debian CUDA packages do not work when building TensorFlow; see this GitHub issue. It seems likely that this will not be fixed. On the other hand, the NVIDIA-provided Debian packages don’t seem to work nicely with some of the other parts of the system, so I never used them.
To get around this, I downloaded the CUDA 11.2.1 runfile (cuda_11.2.1_460.32.03_linux.run) from the NVIDIA website. (The current version is actually 11.2.2, but as my Debian CUDA packages are 11.2.1, I downloaded the older version from the Archive of Previous CUDA Releases.)
I needed write permission on the parent directory of the desired target location, so I created a directory /usr/local/cuda-11.2.1 (as root) and then ran
$ chown jdg:jdg /usr/local/cuda-11.2.1
(I could equally have created this directory in some other location without needing to be root.) I then unpacked the CUDA package into it:
$ sh cuda_11.2.1_460.32.03_linux.run --installpath=/usr/local/cuda-11.2.1/cuda
After the installation, I tidied up:
$ mv /usr/local/cuda-11.2.1/cuda/* /usr/local/cuda-11.2.1/
$ rmdir /usr/local/cuda-11.2.1/cuda/
so that everything is now directly in /usr/local/cuda-11.2.1. At the end of the build process, it appears that this entire directory can be deleted, as long as the relevant Debian CUDA packages are still installed.
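The tidy-up step above can also be written as a small function. This is a minimal sketch (flatten_cuda_dir is a hypothetical name); unlike mv cuda/*, which skips hidden files, it also moves names beginning with a dot, and it assumes the target directory contains nothing except the cuda subdirectory.

```shell
# Move everything from "$1/cuda" up into "$1", including hidden files,
# then remove the now-empty "$1/cuda".  Assumes "$1" holds nothing else.
flatten_cuda_dir() {
    local target="$1" sub="$1/cuda" entry
    [ -d "$sub" ] || { echo "no $sub to flatten" >&2; return 1; }
    # The three globs cover normal names, ".name" and "..name" entries.
    for entry in "$sub"/* "$sub"/.[!.]* "$sub"/..?*; do
        # An unmatched glob stays literal, so skip anything that is not there.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        mv "$entry" "$target"/ || return 1
    done
    rmdir "$sub"
}
```

After running the installer with --installpath=/usr/local/cuda-11.2.1/cuda as above, flatten_cuda_dir /usr/local/cuda-11.2.1 would replace the mv and rmdir commands.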
I started by following the guidance on the TensorFlow installing from source webpage.
There will eventually be a Debian bazel package, but unfortunately that is still some way in the future (a team is working on it, but there are currently technical difficulties). So I installed Bazelisk, following the instructions on the Bazelisk GitHub page:
$ go get github.com/bazelbuild/bazelisk
$ export PATH=$PATH:$(go env GOPATH)/bin
$ (cd $(go env GOPATH)/bin && ln -s bazelisk bazel)
The export may well be extraneous, as PATH is usually already exported.
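If these steps need repeating (for example after a Go upgrade), plain ln -s fails once the bazel link exists. A minimal sketch of an idempotent helper follows; link_bazel is a hypothetical name, and it takes the bin directory as an argument, falling back to $(go env GOPATH)/bin as in the commands above. Note also that on newer Go releases go get no longer installs commands; go install github.com/bazelbuild/bazelisk@latest is the equivalent there.

```shell
# Make "bazel" in the Go bin directory a symlink to bazelisk, creating or
# refreshing it only when needed, and add that directory to PATH once.
link_bazel() {
    local bindir="${1:-$(go env GOPATH)/bin}"
    [ -x "$bindir/bazelisk" ] || { echo "no bazelisk in $bindir" >&2; return 1; }
    if [ "$(readlink "$bindir/bazel" 2>/dev/null)" != bazelisk ]; then
        ln -sfn bazelisk "$bindir/bazel"
    fi
    case ":$PATH:" in
        *":$bindir:"*) ;;                   # already on PATH
        *) PATH="$PATH:$bindir"; export PATH ;;
    esac
}
```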
Build directory setup
Since I am building both TensorFlow and TensorFlow Addons, I created a directory ~/packages/tensorflow and cloned the git repositories into it; the intention is to build the wheels in that same directory, so that everything is together.
Downloading the TensorFlow source code
I followed the instructions as given; I also checked out the latest release branch, which at the time of writing is r2.4:
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout -t origin/r2.4
Configuring the build
Here begins the fun! Some of these lines have been wrapped to fit better on the screen.
$ ./configure
You have bazel 3.1.0 installed.
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/lib/python3.9/dist-packages
  /usr/local/lib/python3.9/dist-packages
  /home/jdg/lib/python
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
/home/jdg/.local/lib/python3.9/site-packages
I can’t write to the system directory, and it seems as though this is where some libraries may be written, so I set it to my local site-packages directory instead. I don’t know whether leaving this as /usr/lib/... would work equally well.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
I accepted the defaults for both of these; I don’t have TensorRT installed.
Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration...
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 11.2
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 8.1
I have CUDA 11.2.1 installed, so I responded 11.2. Perhaps this would be better as just 11, but I’m not sure. Likewise, I have cuDNN 8.1.x installed, so I responded 8.1.
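If you are unsure which cuDNN version is installed, the version macros in its header give the answer. The sketch below assumes cuDNN 8, which keeps CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn_version.h; the default path /usr/include/cudnn_version.h is an assumption about where the NVIDIA package puts that file, so pass the real path if it differs.

```shell
# Print the cuDNN version ("major.minor.patch") recorded in a cuDNN 8
# version header; the default header path is an assumption.
cudnn_version() {
    local header="${1:-/usr/include/cudnn_version.h}"
    [ -r "$header" ] || { echo "cannot read $header" >&2; return 1; }
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { maj = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { min = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pat = $3 }
         END { if (maj == "") exit 1; print maj "." min "." pat }' "$header"
}
```

cudnn_version | cut -d. -f1,2 then gives the major.minor form (such as 8.1) that the configure prompt asks for.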
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:
I left this empty, as I do not have NCCL installed.
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]: /usr/local/cuda-11.2.1
This is the point at which everything goes wrong with the Debian-packaged version of the CUDA libraries. The Debian packages can remain installed alongside the NVIDIA files during the build, it seems, but they cannot themselves be used to build TensorFlow. So I gave the path to the libraries unpacked from the NVIDIA runfile.
Please specify a list of comma-separated CUDA compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code. Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]:
I set this to 3.5,7.5 based on the information on the webpage referred to. Maybe I don’t need the 3.5 part, but I’m not sure, so I left it in.
Do you want to use clang as CUDA compiler? [y/N]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
I left all of these with their default settings.
And the configuration is finished!
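If the build has to be reconfigured several times, typing these answers each time gets tedious. TensorFlow’s configure.py takes many of its answers from environment variables when they are set, so the answers above can be recorded once. The sketch below uses the variable names I believe TensorFlow 2.4’s configure.py checks; treat the list as an assumption to verify against configure.py, and expect ./configure still to prompt for anything not covered (the NCCL question, for example). The function name tf_configure_env is hypothetical.

```shell
# Export the answers given above so that ./configure can pick them up.
# Variable names are assumed from TensorFlow 2.4's configure.py.
tf_configure_env() {
    local cuda_root="${1:-/usr/local/cuda-11.2.1}"
    export PYTHON_BIN_PATH=/usr/bin/python3
    export PYTHON_LIB_PATH="$HOME/.local/lib/python3.9/site-packages"
    export TF_NEED_ROCM=0
    export TF_NEED_TENSORRT=0
    export TF_NEED_CUDA=1
    export TF_CUDA_VERSION=11.2
    export TF_CUDNN_VERSION=8.1
    export TF_CUDA_PATHS="$cuda_root"          # base paths for CUDA libs/headers
    export TF_CUDA_COMPUTE_CAPABILITIES=3.5,7.5
    export TF_CUDA_CLANG=0                     # use gcc, not clang, for CUDA
    export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
    export CC_OPT_FLAGS=-Wno-sign-compare
    export TF_SET_ANDROID_WORKSPACE=0
}
```

Running tf_configure_env in the shell before ./configure should then pre-answer those questions.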
Building the pip package
Unfortunately, some of the scripts in the TensorFlow sources call python using the shebang formulation #!/usr/bin/env python, which breaks unless there is a python executable on the PATH; this should presumably be Python 3.x (though I haven’t checked). In Debian 10, python is python2, but in Debian 11 there is no python executable by default (though the python-is-python3 package can be installed, which creates /usr/bin/python as a symlink to /usr/bin/python3). Since I don’t want things to break silently, I didn’t install that package, but instead set up a local symlink for the purpose of this build. (Note that the bazel environment variable PYTHON_BIN_PATH does not help at all.)
$ mkdir ../bin
$ ln -s /usr/bin/python3 ../bin/python
$ PATH=$(realpath ../bin):$PATH
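To see which scripts are affected, and to make the symlink trick above slightly more robust, the two helpers below may be useful. Both are minimal sketches with hypothetical names. find_env_python_scripts lists files whose first line is exactly #!/usr/bin/env python (skipping .git directories). use_python3_shim does the same job as the mkdir/ln/PATH commands above, but links to whichever python3 is first on PATH rather than assuming /usr/bin/python3.

```shell
# List files under "$1" (default ".") whose first line is exactly
# "#!/usr/bin/env python"; these need a "python" command on PATH.
find_env_python_scripts() {
    local root="${1:-.}" f
    find "$root" -name .git -prune -o -type f -print |
        while IFS= read -r f; do
            if [ "$(head -n 1 "$f" 2>/dev/null)" = '#!/usr/bin/env python' ]; then
                printf '%s\n' "$f"
            fi
        done
}

# Create "$1/python" as a symlink to python3 and put "$1" first on PATH,
# leaving /usr/bin untouched.
use_python3_shim() {
    local dir="$1" py3
    py3=$(command -v python3) || { echo "python3 not found" >&2; return 1; }
    mkdir -p "$dir" || return 1
    ln -sfn "$py3" "$dir/python" || return 1
    dir=$(cd "$dir" && pwd) || return 1   # make the directory absolute
    PATH="$dir:$PATH"
    export PATH
}
```

From the tensorflow directory, find_env_python_scripts . | wc -l gives a rough idea of how many scripts rely on a python command.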
I then ran the bazel command:
$ bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
This compilation step is pretty long.
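On machines with limited memory, a fully parallel build can run out of RAM. Bazel’s standard --jobs and --local_ram_resources flags can cap this. The helper below is a minimal sketch: the name tf_bazel_build and the JOBS and DRY_RUN variables are my own, and the defaults of half the CPUs and half the RAM are assumptions, not settings I tested on this build. With DRY_RUN set it only prints the command.

```shell
# Run the TensorFlow pip-package build with capped parallelism and RAM.
# JOBS overrides the job count; DRY_RUN=1 prints the command instead.
tf_bazel_build() {
    local ncpu jobs
    if [ -n "$JOBS" ]; then
        jobs=$JOBS
    else
        ncpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
        jobs=$((ncpu / 2))
    fi
    [ "$jobs" -ge 1 ] || jobs=1
    set -- bazel build --config=cuda \
        --jobs="$jobs" \
        --local_ram_resources='HOST_RAM*.5' \
        //tensorflow/tools/pip_package:build_pip_package
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```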
Building and installing the pip package
Following the instructions, I ran:
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ..
so that the resulting wheel ended up in the parent directory.
I then installed the wheel; in my case, the command line was:
$ pip install --user ../tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl
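The wheel’s file name encodes the TensorFlow version and the Python tag (cp39 here), so it changes between builds. Here is a minimal sketch of a helper (hypothetical name find_wheel) that picks the most recently modified matching wheel in a directory, so the install command does not need the exact name.

```shell
# Print the newest "$2-*.whl" file in directory "$1"; fail if none exists.
# The "-" after the name keeps "tensorflow" from matching
# "tensorflow_addons-...whl".
find_wheel() {
    local dir="$1" name="$2" w newest=""
    for w in "$dir"/"$name"-*.whl; do
        [ -e "$w" ] || continue              # unmatched glob stays literal
        if [ -z "$newest" ] || [ "$w" -nt "$newest" ]; then
            newest="$w"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, pip install --user "$(find_wheel .. tensorflow)" from the tensorflow directory, and likewise with tensorflow_addons for the Addons wheel.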
Installing TensorFlow Addons
The instructions for this are on the TensorFlow website here.
I first cloned the repository; again, I started in the directory ~/packages/tensorflow:
$ git clone https://github.com/tensorflow/addons.git
$ cd addons
$ git checkout -t origin/r0.12
The exports, though, are not quite as described. They should instead be the following:
$ export TF_NEED_CUDA=1
$ export CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2.1
I have just reported this discrepancy, so it may well be fixed very soon. The rest ran smoothly:
$ python3 ./configure.py
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg ..
$ pip install ../tensorflow_addons-0.12.2-cp39-cp39-linux_x86_64.whl
After this, I removed (actually just temporarily renamed) /usr/local/cuda-11.2.1, and everything still works.
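The renaming test just described can be wrapped in a function that always restores the directory afterwards, even if the test command fails. This is a minimal sketch with a hypothetical name (without_dir). Renaming a directory under /usr/local needs write access to /usr/local itself, so the helper lets you set MV="sudo mv" while still running the test command as your own user; that matters here, since the wheels were installed with --user.

```shell
# Temporarily rename directory "$1", run the remaining arguments as a
# command, then rename it back; returns the command's exit status.
# Set MV (e.g. MV="sudo mv") if renaming needs extra privileges.
without_dir() {
    local dir="$1" mv="${MV:-mv}" status
    shift
    [ -d "$dir" ] || { echo "$dir is not a directory" >&2; return 2; }
    [ ! -e "$dir.hidden" ] || { echo "$dir.hidden already exists" >&2; return 2; }
    $mv "$dir" "$dir.hidden" || return 2
    "$@"
    status=$?
    $mv "$dir.hidden" "$dir" || echo "WARNING: could not restore $dir" >&2
    return "$status"
}
```

For example: MV="sudo mv" without_dir /usr/local/cuda-11.2.1 python3 -c 'import tensorflow as tf, tensorflow_addons; print(tf.config.list_physical_devices("GPU"))'. A non-empty GPU list would suggest the installed wheels no longer need the NVIDIA files.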