Compiling TensorFlow 2

A descent into madness

I decided to embark on a journey this week, and I've gone down so many rabbit holes that it's left me a bit bleary-eyed at the end of the day a few times. I've been really excited playing with two pieces of software lately: Frigate and PhotoPrism. The latest Frigate beta has come out with a lot of new features, and one I'm very excited about is the GPU detectors. That started my plunge into madness with TensorFlow, TensorRT, CUDA, etc., etc. I had no clue what kind of strict dependency-management hell I was getting myself into until I tried to build the latest (as of writing this) TensorFlow, 2.11.

I try to keep most things short and sweet, so if there's anything wrong here, send me a message on Mastodon or leave a comment and I'll see to updating it.

Building TensorFlow 2 (tensorflow-2.11.0)

To compile TensorFlow 2 you will still need GCC 9, which is the default on Ubuntu 20.04; Ubuntu 22.04 ships GCC 11.2.0.
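
A quick sanity check before going further (assuming Ubuntu's stock toolchain):

gcc --version   # should report 9.x on Ubuntu 20.04
# if something newer is the default, GCC 9 can be installed alongside it:
# sudo apt install gcc-9 g++-9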

Prerequisites

In parentheses is the version I used when building TensorFlow 2.

  • Ubuntu 20.04 (20.04.5) & GCC 9 (9.4.0)
  • Python 3.7-3.10 (3.8)
  • CUDA 11.2 (11.2.2) & Nvidia Driver 460.32.03 (one package, blacklist nouveau)
  • CUDNN 8.1 (8.1.1.33)
  • Bazel 5.3 (5.3.2)

Python

sudo apt install python3-dev python3-pip python3-venv
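
Worth confirming the interpreter is in the supported range:

python3 --version   # anything from 3.7 to 3.10 should work; I used 3.8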

CUDA 11.2 & Nvidia driver 460.32.03

Docs: https://docs.nvidia.com/cuda/archive/11.2.2/

Blacklist nouveau:
https://linuxconfig.org/how-to-disable-blacklist-nouveau-nvidia-driver-on-ubuntu-20-04-focal-fossa-linux

wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run

export PATH="$PATH:/usr/local/cuda-11.2/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64"
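
Those exports only last for the current shell. To make them stick, and to check the driver and toolkit actually work, something like this does the job (adjust the path if you installed a different CUDA version):

echo 'export PATH="$PATH:/usr/local/cuda-11.2/bin"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc

nvidia-smi      # driver loaded?
nvcc --version  # should report release 11.2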

CUDNN 8.1.1.33

tar -xzvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
# Tip: using cp -a when copying the cuDNN libs preserves the symlinks instead of duplicating the libraries. The exact commands:

sudo cp -av cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -av cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
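
To double-check the copy, cuDNN 8 keeps its version macros in cudnn_version.h:

grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h
# expect CUDNN_MAJOR 8, CUDNN_MINOR 1, CUDNN_PATCHLEVEL 1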

Bazel

sudo apt-get install build-essential openjdk-11-jdk python zip unzip
wget https://github.com/bazelbuild/bazel/releases/download/5.3.2/bazel-5.3.2-dist.zip
mkdir bazel-5.3.2
unzip -d ./bazel-5.3.2 bazel-5.3.2-dist.zip
cd bazel-5.3.2

env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
export PATH="$PATH:$(pwd)/output"
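
With the output directory on PATH, make sure the freshly built Bazel is the one being picked up:

bazel --version   # should print something like: bazel 5.3.2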

Tensorflow

git clone https://github.com/tensorflow/tensorflow
cd tensorflow && git checkout v2.11.0

python3 -m venv ~/.virtualenvs/tf_dev
source ~/.virtualenvs/tf_dev/bin/activate
pip install -U pip six numpy wheel setuptools mock future packaging requests
pip install -U keras_applications --no-deps
pip install -U keras_preprocessing --no-deps
# @ 2/6/2023
# six==1.16.0, numpy==1.24.2, mock==5.0.1, future==0.18.3, packaging==23.0,
# requests==2.28.2, requests-oauthlib==1.3.1

./configure
# No to ROCm support
# CUDA yes
# TensorRT No 
# Select the GPU compute capabilities you want to build for from here: https://developer.nvidia.com/cuda-gpus#compute
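
If you'd rather skip the interactive prompts, configure also reads its answers from environment variables. A sketch of the ones that matter here (variable names are read by configure.py; it will still prompt for anything you don't set, and 8.6 is just an example compute capability):

export TF_NEED_CUDA=1
export TF_NEED_ROCM=0
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=8.6   # pick yours from the link above
export PYTHON_BIN_PATH="$(which python3)"
./configure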

# For python package
bazel build --config=opt -c opt //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# for binary 
bazel build --jobs 2 --config=opt //tensorflow:libtensorflow.so
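
The build eats RAM, so on a smaller machine it helps to cap Bazel's appetite with its standard resource flags (the numbers here are just examples):

bazel build --jobs 4 --local_ram_resources=HOST_RAM*.5 \
    --config=opt //tensorflow/tools/pip_package:build_pip_package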

Install Python Package

# run `deactivate` if still in python venv
pip install /tmp/tensorflow_pkg/tensorflow-2.11.0-cp38-cp38-linux_x86_64.whl

TensorRT (Optional)

# Makes the TensorRT headers and libs visible next to CUDA (needed if you answer yes to TensorRT during ./configure)
sudo cp -av /usr/local/TensorRT/include/*.h /usr/local/cuda/include
sudo cp -av /usr/local/TensorRT/lib/* /usr/local/cuda/lib64

Test it out!

python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"