September 2012 · T1
AlexNet — The Deep-Learning Era Begins
At ImageNet 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton of the University of Toronto reached a top-5 error rate of 15.3%, more than ten points ahead of the runner-up's 26.2%, which was obtained by conventional methods. Their convolutional neural network, 'AlexNet', trained on two NVIDIA GTX 580 GPUs, proved the practical viability of deep learning overnight. From that point forward, computer vision shifted almost completely from hand-engineered features to deep learning.
Metadata
- Date: September 2012
- Decade: 2010s
- Tier: T1
- Timelines: A General History of Information Technology · A History of Artificial Intelligence · A History of Semiconductors and Hardware
- Sources: 02
- Connections: 01
AlexNet — The Deep-Learning Era Begins
On 30 September 2012, the ImageNet Large Scale Visual Recognition Challenge 2012 closed; its results were presented that October at a workshop attached to ECCV in Florence.
The University of Toronto led with a top-5 error rate of 15.3%. Second place, a team from the University of Tokyo, stood at 26.2%.
A gap of more than ten points on an image-recognition benchmark was without precedent. The 2011 winner had finished at 25.8%; two graduate students and their advisor had cut the error by roughly two-fifths in a single year.
And they had done it with neural networks—the technique that most computer-vision researchers had, by then, written off as no longer practical.
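The two comparisons above are different kinds of measurement: the gap to the runner-up is in absolute percentage points, while the year-on-year improvement is a relative reduction. The following is a small sketch of that arithmetic, using only the error rates quoted in this article; the constant and function names are illustrative.

```python
# Top-5 error rates in percent, as quoted in the article text.
TORONTO_2012 = 15.3
RUNNER_UP_2012 = 26.2
WINNER_2011 = 25.8


def absolute_gap(better: float, worse: float) -> float:
    """Difference between two error rates, in percentage points."""
    return worse - better


def relative_reduction(old: float, new: float) -> float:
    """Fraction of the old error rate that was eliminated."""
    return (old - new) / old


if __name__ == "__main__":
    gap = absolute_gap(TORONTO_2012, RUNNER_UP_2012)
    cut = relative_reduction(WINNER_2011, TORONTO_2012)
    print(f"gap to 2012 runner-up: {gap:.1f} points")
    print(f"reduction vs 2011 winner: {cut:.1%}")
```

On these figures the gap to the runner-up is 10.9 points, and the reduction against the 2011 winner is about 41%.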
A Team of Three
The paper had three authors.
- Alex Krizhevsky — then a PhD student. He wrote the architecture, the GPU optimisations, the CUDA code—everything.
- Ilya Sutskever — also a PhD student. Later co-founder and Chief Scientist of OpenAI.
- Geoffrey Hinton — their advisor. Thirty years into neural-network research, and one of the very few who had stayed on that side through the second AI winter.
The paper was not titled 'AlexNet'. It was "ImageNet Classification with Deep Convolutional Neural Networks." The community came to call the network AlexNet, after the paper's first author.
What Shifted
Before AlexNet, image recognition was the art of feature engineering. SIFT, HOG, SURF — algorithms that extracted 'corners', 'gradients', 'textures', designed by hand and fed into a classifier such as an SVM. The longest part of writing a paper was usually the tinkering with features.
What AlexNet demonstrated was the alternative: let the network learn the features itself. Five convolutional layers, three fully-connected layers, roughly 60 million parameters in all, trained end-to-end on 1.3 million images from ImageNet. Not a single hand-designed feature.
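The "roughly 60 million parameters" figure can be reproduced with a short tally. The sketch below assumes the layer shapes given in the published paper (kernel counts, kernel sizes, and the two-GPU split that lets some layers see only half of the previous layer's feature maps); none of those shapes appear in this article, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    name: str
    kernels: int      # number of output feature maps
    size: int         # side length of the square kernel
    in_channels: int  # input channels each kernel actually sees

    def params(self) -> int:
        # weights plus one bias per kernel
        return self.kernels * self.size * self.size * self.in_channels + self.kernels


@dataclass(frozen=True)
class Dense:
    name: str
    inputs: int
    outputs: int

    def params(self) -> int:
        # weights plus one bias per output unit
        return self.inputs * self.outputs + self.outputs


# Assumed shapes, taken from the paper rather than from this article.
# conv2, conv4 and conv5 see only the feature maps held on their own GPU,
# so their in_channels is half the previous layer's kernel count.
LAYERS = [
    Conv("conv1", kernels=96, size=11, in_channels=3),
    Conv("conv2", kernels=256, size=5, in_channels=48),
    Conv("conv3", kernels=384, size=3, in_channels=256),
    Conv("conv4", kernels=384, size=3, in_channels=192),
    Conv("conv5", kernels=256, size=3, in_channels=192),
    Dense("fc6", inputs=6 * 6 * 256, outputs=4096),
    Dense("fc7", inputs=4096, outputs=4096),
    Dense("fc8", inputs=4096, outputs=1000),
]


def total_params(layers=LAYERS) -> int:
    return sum(layer.params() for layer in layers)


if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer.name:>5}: {layer.params():>12,}")
    print(f"total: {total_params():>12,}")
```

Under these assumptions the total comes to 60,965,224, and the three fully-connected layers hold well over 90% of the weights; the five convolutional layers together account for only about 2.3 million.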
Two GPUs
The other crucial fact in the paper was that this had now become possible on practical hardware.
A network of AlexNet's size could not be trained in any reasonable time on a CPU. Krizhevsky used two NVIDIA GTX 580 GPUs—high-end consumer gaming cards—and wrote his own CUDA kernels to split the network across them and train in parallel.
Training took five to six days, end to end.
As the first large-scale demonstration of GPU-driven deep learning, AlexNet would also reshape NVIDIA. From 2012 onward, NVIDIA gradually shifted from being a maker of gaming GPUs to being the central infrastructure company of AI computation. By 2024, NVIDIA was the second most valuable company in the world by market capitalisation.
The Third Summer Opens
AI's history has had two 'summers', each ended by a 'winter': the first ran from 1956 to the mid-1970s, and the second was the expert-systems boom of the 1980s.
What AlexNet began was the third summer, the deep-learning era. Unlike the previous two, it is still going in 2026, more than a decade in.
Sources
Secondary: AlexNet (Wikipedia)