PyTorch training on M1-Air GPU

Abhishek Bose
3 min read · May 22, 2022


PyTorch recently announced that its new release would utilise the GPU on M1 ARM-chipset Macs. This was a delight for deep learning enthusiasts who own an M1 Mac. To check out the performance of this new release, I went ahead and ran some tests. The device I ran this test on was an M1 MacBook Air.

Fig 1: PyTorch news on GPU-accelerated PyTorch training on M1 Ultra. Source: https://media-exp1.licdn.com/dms/image/C5622AQE2cVf4CSnsEQ/feedshare-shrink_800/0/1652886803211?e=1655942400&v=beta&t=LXDxwxypvrlRYUCb-2Ykjyjh_ioXFjceH1c32-JiIEQ

I took some inspiration from Sebastian Raschka’s blog post on benchmarking this release across different chipsets.

On the official PyTorch page, the nightly build is not listed directly, as shown below in Fig 2:

Fig 2: PyTorch installation options

In order to download the nightly build with GPU support, the following steps have to be followed:

1. Create a conda environment for Apple silicon builds

CONDA_SUBDIR=osx-arm64 conda create -n py39_native python=3.9 -c conda-forge --override-channels

2. Once the environment is created, set the env var so that conda always downloads native Apple silicon builds

conda config --env --set subdir osx-arm64

3. Install the PyTorch nightly build with GPU support

conda activate py39_native
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

4. Run benchmarks on MNIST on the GPU using the notebook given below (a quick way to confirm the MPS backend is actually present is sketched just after these steps):
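Since the nightly wheels above come from the CPU index, it is worth verifying that the MPS backend really shipped with the install. A minimal check using the torch.backends.mps flags introduced with this release:

import torch

# Both flags come with the MPS backend in the PyTorch nightlies.
print(torch.backends.mps.is_built())      # was this build compiled with MPS support?
print(torch.backends.mps.is_available())  # is the MPS device usable on this machine?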

The results were actually a bit shocking for me. Setting the device to “cpu” performed better than setting it to “mps”, which is the CUDA equivalent for M1 devices.

I took a sample of 1000 data points from the MNIST training dataset and timed a single epoch over all the batches for this sample set.

The models used were vgg16, alexnet, resnet18, mobilenetv2 and efficientnetb0. A rough sketch of the timing loop is shown below.
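The original notebook embed does not reproduce well here, so the following is only a minimal sketch of what such a timing run could look like, assuming torchvision’s MNIST dataset, a 1000-sample subset, and resnet18 standing in for the other models; the transforms, batch size and optimiser are illustrative assumptions rather than the exact settings behind the numbers below.

import time
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import transforms

# "mps" is the M1 GPU backend; fall back to "cpu" to compare the two runs
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# 1000-sample subset of MNIST, converted to 3 channels for the torchvision models
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize(64),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.MNIST(root="data", train=True, download=True, transform=transform)
loader = DataLoader(Subset(train_set, range(1000)), batch_size=32, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# time a single epoch over all batches of the 1000-sample set
model.train()
start = time.time()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
print(f"{device}: one epoch took {time.time() - start:.2f} s")

Swapping resnet18 for the other torchvision models and flipping device between "cpu" and "mps" should give the same kind of comparison reported in the figures below.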

CPU results: Fig 3

Fig 3: Time in seconds for training 1 epoch on CPU

GPU results: Fig 4

Fig 4: Time in seconds for training 1 epoch on GPU

I am actually a bit surprised here. I think a lot of work is left to be done on optimising the data handoff from RAM to the GPU on the M1s. Sebastian Raschka’s blog also shows that CUDA-based GPUs are far better for training at the moment.

I would also like the community to come forward and run more tests across different models in order to understand how this feature can be improved in the future.


Abhishek Bose

Machine Learning Engineer III at Swiggy. On a quest for technology.