AlexNet — The Deep-Learning Era Begins

Name: AlexNet — The Deep-Learning Era Begins
Start: 2012-09-30

The ImageNet Large Scale Visual Recognition Challenge 2012 closed to submissions on 30 September 2012; the results were presented in October at a workshop attached to ECCV in Florence.

The leader was the University of Toronto — top-5 error rate of 15.3%. Second place — a team from the University of Tokyo — 26.2%.

A gap of nearly eleven points on an image-recognition benchmark was without precedent. The 2011 winner had finished at 25.8%; two graduate students and their advisor had cut the error by about 40% in a single year.
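The headline numbers are easier to weigh with the metric spelled out. Under top-5 error, a prediction counts as correct if the true label is anywhere among the model's five highest-scoring classes. The sketch below is a minimal illustration, not the challenge's evaluation code: the function names and the toy scores are hypothetical, and only the percentages 15.3, 25.8 and 26.2 come from the results quoted above.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest score to lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


def relative_reduction(old_error, new_error):
    """Share of the old error that was eliminated."""
    return (old_error - new_error) / old_error


# Hypothetical toy data: 2 images, 3 classes, both truly of class 0.
toy_scores = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]]
toy_labels = [0, 0]
print(top_k_error(toy_scores, toy_labels, k=1))   # 0.5

# Figures quoted in the text, in percent.
print(round(26.2 - 15.3, 1))                      # 10.9 points ahead of second place
print(round(relative_reduction(25.8, 15.3), 3))   # 0.407 relative to the 2011 winner
```

The gap to second place is an absolute difference in percentage points, while the year-on-year improvement is a relative one, which is why the two figures look so different.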

And they had done it with neural networks—the technique that most computer-vision researchers had, by then, written off as no longer practical.

A Team of Three

The paper had three authors.

Alex Krizhevsky — then a PhD student. He wrote the architecture, the GPU optimisations, the CUDA code—everything.
Ilya Sutskever — also a PhD student. Later co-founder and Chief Scientist of OpenAI.
Geoffrey Hinton — their advisor. Thirty years into neural-network research, and one of the very few who had stayed on that side through the second AI winter.

The paper was not titled "AlexNet". It was "ImageNet Classification with Deep Convolutional Neural Networks." The community came to call it AlexNet in citation chains, after the first author, because the name was easier to say than the title.

What Shifted

Before AlexNet, image recognition was the art of feature engineering. SIFT, HOG, SURF — algorithms that extracted 'corners', 'gradients', 'textures', designed by hand and fed into a classifier such as an SVM. The longest part of writing a paper was usually the tinkering with features.
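To give a flavour of what "designed by hand" meant, here is a deliberately simplified sketch in the spirit of HOG: a histogram of gradient orientations, weighted by gradient magnitude. It is an illustration only. Real HOG also divides the image into cells and normalises over blocks, and the function name, bin count and toy images below are assumptions made for the example.

```python
import math


def orientation_histogram(image, bins=9):
    """Normalised histogram of unsigned gradient orientations (0-180 degrees).

    `image` is a list of rows of grey-level numbers. Every rule here
    (central differences, 9 bins, magnitude weighting) is a human design
    choice, which is the point of the example.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [value / total for value in hist] if total else hist


# Toy image with a vertical edge: all gradient energy lands in bin 0.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
print(orientation_histogram(vertical_edge))

# Toy image with a horizontal edge: all gradient energy lands in bin 4.
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
print(orientation_histogram(horizontal_edge))
```

A vector like this, computed per image region, is what would then be handed to a classifier such as an SVM.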

What AlexNet demonstrated was the alternative: let the network learn the features itself. Five convolutional layers, three fully-connected layers, roughly 60 million parameters in all, trained end-to-end on 1.3 million images from ImageNet. Not a single hand-designed feature.
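The "roughly 60 million" figure can be checked with simple arithmetic. The sketch below assumes the layer shapes given in the original paper, which this article does not list: kernel sizes, channel counts, 4096-unit fully-connected layers, a 1000-way output, and a 6×6×256 feature map entering the first fully-connected layer. The 48 and 192 input-channel figures assume the paper's arrangement in which some layers see only half of the previous layer's channels. The function and constant names are hypothetical.

```python
# (name, kernel_height, kernel_width, input_channels_per_kernel, output_channels)
# Shapes assumed from the original paper, not stated in this article.
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),
    ("conv5", 3, 3, 192, 256),
]

# (name, inputs, outputs); fc6 takes the flattened 6x6x256 feature map.
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]


def conv_params(kh, kw, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def count_parameters():
    counts = {}
    for name, kh, kw, c_in, c_out in CONV_LAYERS:
        counts[name] = conv_params(kh, kw, c_in, c_out)
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = fc_params(n_in, n_out)
    return counts


counts = count_parameters()
for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {sum(counts.values()):,}")  # total: 60,965,224
```

Under these assumptions the three fully-connected layers account for about 58.6 million of the total, so the five convolutional layers that learn the visual features hold only a small share of the parameters.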

Two GPUs

The other crucial fact in the paper was that this had now become possible on practical hardware.

A network of AlexNet's size could not be trained in any reasonable time on a CPU. Krizhevsky used two NVIDIA GTX 580 GPUs—high-end consumer gaming cards—and wrote his own CUDA kernels to split the network across them and train in parallel.

Five to six days of training, end to end.

As the most prominent early demonstration of GPU-driven deep learning at scale, AlexNet would also reshape NVIDIA. From 2012 onward, NVIDIA gradually shifted from being a maker of gaming GPUs to being the central infrastructure company of AI computation. By the end of 2024, NVIDIA was the second most valuable company in the world by market capitalisation, having held the top spot for parts of that year.

The Third Summer Opens

AI's history has had two 'winters', framing two 'summers' — the first from 1956 to the mid-1970s, the second the expert-systems boom of the 1980s.

What AlexNet began was the third summer: the deep-learning era. Unlike the previous two, it is still going in 2026, more than a decade in.
