ML Fundamentals
What is ResNet (Residual Network) and what did residual connections solve?
· 4 min read · By Jon Jovinsson
ResNet, introduced by Microsoft Research in 2015, solved a fundamental problem: very deep neural networks trained worse than shallower ones because gradients vanished before reaching early layers. ResNet's solution was the residual connection, also called a skip connection: each block adds its input directly to its output (x + F(x)) before passing to the next layer. This lets gradients flow backward through the shortcut path, enabling networks 100+ layers deep to train reliably.
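The residual computation is simple enough to sketch in a few lines of NumPy. The two-layer `F` below is a hypothetical stand-in for a real convolutional block; the point is only the `x + F(x)` shortcut:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x), where F is a small two-layer transformation."""
    fx = w2 @ relu(w1 @ x)  # F(x): the residual the block learns
    return x + fx           # skip connection: add the input back in

d = 4
x = rng.standard_normal(d)
w1 = rng.standard_normal((d, d)) * 0.1
w2 = rng.standard_normal((d, d)) * 0.1
y = residual_block(x, w1, w2)
```

Because the shortcut adds `x` directly, the derivative of the output with respect to the input is the identity plus the Jacobian of `F`, so the gradient reaching earlier layers never has to pass entirely through the weights.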
Why skip connections work
A residual block learns the residual F(x) = H(x) − x, the difference between the desired output H(x) and the input, rather than the full mapping H(x). If the optimal transformation is close to an identity function (do nothing), the block simply learns weights near zero and passes the input through unchanged. This makes the optimisation problem easier and means that, in principle, adding more layers should not hurt: at worst the extra layers learn nothing and the network behaves like its shallower counterpart.
ResNet variants and when to use them
- ResNet-18/34: small, fast, good for embedded or edge inference
- ResNet-50: the standard workhorse for most image classification tasks
- ResNet-101/152: higher accuracy but heavier; use when compute is not a constraint
- ResNeXt: a grouped convolution variant with a better accuracy/compute trade-off
ResNet in production today
ResNet-50 is still one of the most deployed image models in production globally, including in Australian industries like mining (equipment classification, site monitoring) and retail (product image classification, shelf analysis). Fine-tuning a pretrained ResNet-50 on domain-specific images typically takes hours on a single GPU and produces strong results with relatively small labelled datasets.
Residual connections beyond vision
The residual connection principle spread well beyond computer vision. Transformer architectures include skip connections around both the attention and feed-forward layers, which is a direct inheritance from ResNet. The insight that depth is trainable if you give gradients a shortcut path turned out to be one of the most important ideas in modern deep learning.
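As a rough sketch (not a faithful Transformer implementation), here is a single block in NumPy with the two residual additions marked. The single-head attention without learned projections and the pre-norm placement are simplifying assumptions for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(x):
    # Simplified: queries, keys, and values are all x (no learned projections).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x

def feed_forward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

def transformer_block(x, w1, w2):
    x = x + self_attention(layer_norm(x))         # residual around attention
    x = x + feed_forward(layer_norm(x), w1, w2)   # residual around feed-forward
    return x

rng = np.random.default_rng(0)
seq, d = 5, 8
x = rng.standard_normal((seq, d))
out = transformer_block(x,
                        rng.standard_normal((d, 4 * d)) * 0.1,
                        rng.standard_normal((4 * d, d)) * 0.1)
```

Both sublayers follow the same pattern as a ResNet block: compute a transformation, then add the input back, so every layer's gradient has a direct path to the bottom of the stack.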