Building PyTorch for systems without AVX2 instructions

Nov. 26, 2023, 11:20 p.m.   wschaub  

I recently got very interested in open source AI and wanted to build out a server with a modern GPU to do more experiments on. The problem was that I had the money for a modern GPU but not enough to build out a new system to put it in.

I got my hands on a decommissioned workstation from about 2012 for the low price of free and stuck the GPU in that as a temporary solution. I named it GLaDOS since I'm stuffing a very new and very expensive RTX 3090 into a system that's so old it might as well be a potato. It's from the Sandy Bridge era and does not support AVX2 instructions. (It's otherwise very powerful for its age: a dual 8-core Xeon with 64GB (soon to be 128GB) of memory.)

PyTorch has some components that assume you have AVX2, and this blog post describes how I recompiled PyTorch to work with the potato that I put my new GPU into.

The problem

I had been going through the Hugging Face tutorials for the transformers library just fine until I decided to try out a project called glados-tts, which uses a text-to-speech model trained to talk like GLaDOS from Portal 2. It would initialize the NNPACK library and give me a warning about unsupported hardware. Immediately after that I would get an illegal instruction exception. It turns out this was from one of the compiled libraries in PyTorch trying to use AVX2 instructions.
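Before rebuilding anything it is worth confirming that the CPU really lacks AVX2. A minimal sketch, assuming a Linux system where /proc/cpuinfo reports CPU features on a "flags" line; the helper names are made up for illustration:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores report the same flags, so the first match is enough.
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text):
    """True when the 'flags' line lists avx2."""
    return "avx2" in cpu_flags(cpuinfo_text)


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
```

On a Sandy Bridge machine like the one described here this should report that avx is present but avx2 is not.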

The Solution

I'm no stranger to working with C compilers and build systems of all descriptions, so let's kick it old school like it's the early 2000s and we're tweaking compiler flags in Gentoo.

I initially tried to change the NNPACK code to use a different backend, which I still do in this article, but the real culprit turned out to be FBGEMM (Facebook General Matrix Multiplication), which requires AVX2 to work at all. I found this out by running python under gdb and doing a backtrace. It also turns out that you can opt not to use FBGEMM when building PyTorch, and this, along with setting NNPACK_BACKEND=psimd, was the solution.
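The gdb session can be reproduced non-interactively. This is only a sketch of one way to do it, not the exact session from this article: it builds a gdb command line using gdb's -batch, -ex and --args options, and crash_repro.py is a hypothetical script that triggers the illegal instruction.

```python
import sys


def gdb_backtrace_command(script, python=sys.executable):
    """Build a gdb command line that runs `script` under the given
    interpreter and prints a backtrace once the process stops
    (for example on SIGILL)."""
    return [
        "gdb", "-batch",   # run the -ex commands, then exit gdb
        "-ex", "run",      # start the program
        "-ex", "bt",       # print the backtrace after it stops
        "--args", python, script,
    ]


# crash_repro.py is a placeholder name, not a file from this article.
print(" ".join(gdb_backtrace_command("crash_repro.py")))
```

The returned list can be passed to subprocess.run. The innermost frames of the backtrace should name the shared library that executed the unsupported instruction.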

Install pre-requisites

You're going to need the following things on your system before attempting to build PyTorch. I'm building on Ubuntu 22.04. This is not an exhaustive list, but I will mention that all my NVIDIA software comes from the NVIDIA developer package repo.

  • latest NVIDIA driver
  • latest CUDA to match that driver (I used 12.3)
  • cuDNN libraries
  • libmkl-full-dev
  • cmake
  • ccmake (on Ubuntu it ships in the cmake-curses-gui package)
  • git-lfs
  • git
  • build-essential
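A quick check that the command-line tools from this list are on the PATH can save a failed build later. A minimal sketch; the mapping from the list to executable names (nvcc for CUDA, gcc, g++ and make for build-essential) is an assumption, and libraries such as cuDNN and MKL are not covered:

```python
import shutil

# Executables assumed to correspond to the prerequisites; adjust as needed.
REQUIRED_TOOLS = ["nvcc", "cmake", "ccmake", "git", "git-lfs", "gcc", "g++", "make"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
print("missing:", ", ".join(missing) if missing else "nothing")
```

Note that nvcc can be installed without being on the PATH (CUDA commonly lives under /usr/local/cuda/bin), so a miss there is not always a missing package.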

Set up a virtual environment to build pytorch

I set up a virtual environment with the system python for my build, but it's probably even easier to use conda. I'm still learning how to use conda, so I will describe my own method instead. I think the general build steps will work just as well in a conda environment, if not better. I'm used to python -mvenv envnamehere from working on Django projects, so it has become my default choice.

python3 -mvenv blog_test
source blog_test/bin/activate
pip install -U pip setuptools wheel
pip install cmake ninja
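To confirm that the activated environment is really the one in use, so that the later install lands in blog_test rather than in the system python, something like the following can help. A minimal sketch using only the standard library; the function name is illustrative:

```python
import sys


def in_virtualenv():
    """True when running inside an environment created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```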

Get pytorch sources

mkdir src
cd src
git lfs install
git clone --recursive --branch v2.1.1
cd pytorch
git submodule sync
git submodule update --init --recursive

Set up the build

pip install -r requirements.txt
export USE_FBGEMM=0
export MAX_JOBS=32
python --cmake-only install

At this point we want to use ccmake to change some options.

ccmake build

  • set DISABLE_AVX2=ON
  • set NNPACK_BACKEND=psimd
  • make sure USE_FBGEMM is set to OFF

Select configure from the menu, select exit screen and then quit ccmake.
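To double-check that the options stuck, the values can be read back from the CMake cache. A minimal sketch, assuming the cache is at build/CMakeCache.txt relative to the pytorch checkout and uses the usual KEY:TYPE=VALUE lines; the helper names are illustrative:

```python
from pathlib import Path

# The three options this article sets.
EXPECTED = {"USE_FBGEMM": "OFF", "NNPACK_BACKEND": "psimd", "DISABLE_AVX2": "ON"}


def parse_cmake_cache(text):
    """Parse CMakeCache.txt content (KEY:TYPE=VALUE lines) into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue  # skip blanks, comments and help text
        name, sep, value = line.partition("=")
        if sep:
            entries[name.split(":", 1)[0]] = value
    return entries


def normalise(value):
    """Map common CMake boolean spellings onto ON/OFF; leave others alone."""
    upper = value.upper()
    if upper in {"1", "ON", "YES", "TRUE", "Y"}:
        return "ON"
    if upper in {"0", "OFF", "NO", "FALSE", "N"}:
        return "OFF"
    return value


def mismatches(entries, expected=EXPECTED):
    """Return {key: actual_value} for every expected key that differs."""
    wrong = {}
    for key, want in expected.items():
        got = entries.get(key)
        if got is None or normalise(got) != want:
            wrong[key] = got
    return wrong


cache = Path("build/CMakeCache.txt")
if cache.exists():
    print(mismatches(parse_cmake_cache(cache.read_text())) or "all options as expected")
```

The normalisation step is there because a value exported as USE_FBGEMM=0 may be stored as 0 rather than OFF.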

Build/install into your virtual environment

This step will take a long time. On my dual Xeon E5-2670 with 64GB of RAM it takes under an hour. I'm using the MAX_JOBS environment variable to limit the build to 32 compilers running at the same time. That many compilers can take up quite a bit of memory, and without increasing my swap file to 32GB it would hang the machine in the later stages of the build.

You will need to tune MAX_JOBS and the size of your swap for your system.
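One rough way to pick a starting value is to cap the job count by memory as well as by CPU count. This is a sketch under a loud assumption: the 3 GB per job figure is not measured, it is simply (64 GB RAM + 32 GB swap) / 32 jobs from the machine described above, and the function names are illustrative.

```python
import os


def suggest_max_jobs(cpu_count, memory_gb, gb_per_job=3.0):
    """Cap parallel compile jobs by both CPU count and available memory.

    gb_per_job is an assumption, not a measurement (see the text above).
    """
    by_memory = int(memory_gb // gb_per_job)
    return max(1, min(cpu_count, by_memory))


def total_memory_gb(meminfo_text):
    """Sum MemTotal and SwapTotal (reported in kB) from /proc/meminfo text."""
    total_kb = 0
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "SwapTotal"):
            total_kb += int(rest.split()[0])
    return total_kb / (1024 * 1024)


if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        memory = total_memory_gb(fh.read())
    print("export MAX_JOBS=%d" % suggest_max_jobs(os.cpu_count() or 1, memory))
```

Treat the result as a first guess and lower it if the machine starts swapping heavily late in the build.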

python install

When this is finished pytorch should now be installed into your current virtual environment.
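A quick smoke test can confirm that the fresh build imports and can do basic math before moving on to a real project. A minimal sketch; the function name is illustrative, and it returns None when torch is not installed in the current environment:

```python
import importlib.util


def torch_smoke_test():
    """Run a tiny matrix multiply on the CPU (and on the GPU if present).

    Returns None when torch is not installed in this environment.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    a = torch.randn(64, 64)
    result = {
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "cpu_ok": bool(torch.isfinite(a @ a).all()),
    }
    if result["cuda"]:
        b = a.to("cuda")
        result["gpu_ok"] = bool(torch.isfinite(b @ b).all())
    return result


print(torch_smoke_test())
```

This only shows that the import and some basic kernels work. It does not necessarily exercise the code paths that crashed, so a real workload is still the better test.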

Test the build

I was trying to use the glados-tts project when I ran into this particular problem, so let's test that out. If it runs on your machine that lacks AVX2 instructions, you've built PyTorch successfully.

Using the same virtual environment you've installed PyTorch into, clone glados-tts and try to run it:

cd $HOME/src
git clone
cd glados-tts
pip install phonemizer inflect unidecode scipy

Now run it. Type in a sentence and it should generate output.wav and try to play it with aplay. If you don't have local sound, you can scp the file to another machine to check that it produced audio correctly. It should look something like this:

(blog_test) wschaub@GLaDOS:~/src/glados-tts$ python 
Initializing TTS Engine...
Input: Would you like some cake?
Forward Tacotron took 151.5219211578369ms
HiFiGAN took 212.4173641204834ms
Playing WAVE './output.wav' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono

Package into a binary wheel distribution so you can easily pull it into your projects

Now that we know it works, let's package it up so we can easily install it into all our other virtual environments. We should still be inside the blog_test virtual environment at this point and still have the same environment variables set as when we did the build.

cd $HOME/src/pytorch
python bdist_wheel

Inside $HOME/src/pytorch/dist you should find a file that looks something like torch-2.1.1-cp310-cp310-linux_x86_64.whl
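The parts of that filename say which interpreter the wheel was built for, which matters when installing it into other environments. A minimal sketch that splits a name of the form name-version-pythontag-abitag-platformtag.whl (the standard wheel naming scheme, without the optional build tag) and compares it with an interpreter version; the function names are illustrative:

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its five components (no build tag)."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=sys.version_info):
    """True when the wheel's CPython tag matches the given interpreter version."""
    expected = "cp%d%d" % (version_info[0], version_info[1])
    return parse_wheel_name(filename)["python"] == expected


wheel = "torch-2.1.1-cp310-cp310-linux_x86_64.whl"
print(parse_wheel_name(wheel))
print("matches this interpreter:", matches_interpreter(wheel))
```

A cp310 wheel like this one only installs into Python 3.10 environments, so a virtual environment created with a different python version will need its own build.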

Copy it to your home directory, then try installing it into a fresh python virtualenv and re-test with glados-tts:

cp dist/torch-2.1.1-cp310-cp310-linux_x86_64.whl $HOME/
cd $HOME
python3 -m venv new-venv
source new-venv/bin/activate
pip install -U pip setuptools wheel
pip install torch torchvision torchaudio $HOME/torch-2.1.1-cp310-cp310-linux_x86_64.whl
pip install phonemizer inflect unidecode scipy
cd $HOME/src/glados-tts/

If all goes well you should be set for a while: just install torch from your .whl file whenever you create a new environment.

avx2 pytorch AI python