ML Fundamentals
What is ResNet (Residual Network) and what did residual connections solve?
· 4 min read · By Jon Jovinsson
ResNet, introduced by Microsoft Research in 2015, solved a fundamental problem: very deep neural networks trained worse than shallower ones because gradients vanished before reaching early layers. ResNet's solution was the residual connection, also called a skip connection: each block adds its input directly to its output (x + F(x)) before passing to the next layer. This lets gradients flow backward through the shortcut path, enabling networks 100+ layers deep to train reliably.
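The residual computation is simple enough to sketch in a few lines of NumPy. The two-layer `F` below is a hypothetical stand-in for a real convolutional block; the point is only the `x + F(x)` shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x), where F is a small two-layer transformation."""
    fx = w2 @ relu(w1 @ x)  # F(x): the residual the block learns
    return x + fx           # skip connection: add the input back in

d = 4
x = rng.standard_normal(d)
w1 = rng.standard_normal((d, d)) * 0.1
w2 = rng.standard_normal((d, d)) * 0.1
y = residual_block(x, w1, w2)
```

Because the shortcut adds `x` directly, the derivative of the output with respect to the input is the identity plus the Jacobian of `F`, so the gradient reaching earlier layers never has to pass entirely through the weights.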
Why skip connections work
A residual block learns the residual F(x) = H(x) − x, the difference between the desired output H(x) and the input, rather than the full mapping H(x). If the optimal transformation is close to an identity function (do nothing), the block simply learns weights near zero and passes the input through unchanged. This makes the optimisation problem easier and means that, in principle, adding more layers should not hurt: at worst the extra layers learn nothing and the network behaves like its shallower counterpart.
ResNet variants and when to use them
- ResNet-18/34: small, fast, good for embedded or edge inference
- ResNet-50: the standard workhorse for most image classification tasks
- ResNet-101/152: higher accuracy but heavier; use when compute is not a constraint
- ResNeXt: a grouped convolution variant with a better accuracy/compute trade-off
ResNet in production today
ResNet-50 is still one of the most deployed image models in production globally, including in Australian industries like mining (equipment classification, site monitoring) and retail (product image classification, shelf analysis). Fine-tuning a pretrained ResNet-50 on domain-specific images typically takes hours on a single GPU and produces strong results with relatively small labelled datasets.
Residual connections beyond vision
The residual connection principle spread well beyond computer vision. Transformer architectures include skip connections around both the attention and feed-forward layers, which is a direct inheritance from ResNet. The insight that depth is trainable if you give gradients a shortcut path turned out to be one of the most important ideas in modern deep learning.
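As a rough sketch (not a faithful Transformer implementation), here is a single block in NumPy with the two residual additions marked. The single-head attention without learned projections and the pre-norm placement are simplifying assumptions for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x):
    # Simplified: queries, keys, and values are all x (no learned projections).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x

def feed_forward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

def transformer_block(x, w1, w2):
    x = x + self_attention(layer_norm(x))         # residual around attention
    x = x + feed_forward(layer_norm(x), w1, w2)   # residual around feed-forward
    return x

rng = np.random.default_rng(0)
seq, d = 5, 8
x = rng.standard_normal((seq, d))
out = transformer_block(x,
                        rng.standard_normal((d, 4 * d)) * 0.1,
                        rng.standard_normal((4 * d, d)) * 0.1)
```

Both sublayers follow the same pattern as a ResNet block: compute a transformation, then add the input back, so every layer's gradient has a direct path to the bottom of the stack.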