ML Fundamentals
What is a CNN (Convolutional Neural Network)? How convolutions learn features
· 4 min read · By Jon Jovinsson
A Convolutional Neural Network (CNN) learns to detect patterns in grid-structured data by sliding small filters over the input and computing a dot product at each position. During training, these filters learn to detect edges and textures in early layers and higher-level features like shapes in deeper layers. The key insight is weight sharing: the same filter is applied at every position in the image, which makes CNNs parameter-efficient and translation-equivariant — a pattern is detected wherever it appears, and pooling then adds a degree of translation invariance.
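The sliding dot product can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the vertical-edge filter values and the toy image are assumptions chosen to make the effect visible:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation (the 'convolution' used in CNNs):
    slide the kernel over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same kernel weights are reused at every position:
            # this is the weight sharing described above.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: dark (0) left half, bright (1) right half.
image = np.concatenate([np.zeros((5, 3)), np.ones((5, 3))], axis=1)

# A classic vertical-edge filter (Sobel-style, illustrative values).
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

response = conv2d(image, kernel)
# The response is large only at the columns where the dark/bright edge sits
# and zero in the flat regions — the filter has "detected" the edge.
```

In a real CNN the kernel values are not hand-written like this; they start random and are learned by backpropagation.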
Key components of a CNN
- Convolutional layers: apply learned filters to detect local patterns
- Pooling layers: downsample spatial dimensions to reduce computation
- Activation functions: typically ReLU after each convolution, to introduce non-linearity
- Fully connected layers: classify based on the learned feature maps
- Batch normalisation: stabilises training and accelerates convergence
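The components above chain into a forward pass. Here is a minimal NumPy sketch of that pipeline — single filter, no padding, random weights standing in for learned ones; all shapes and values are illustrative assumptions (batch normalisation is omitted for brevity):

```python
import numpy as np

def conv2d(image, kernel):
    # Convolutional layer (one filter, 'valid' padding): local pattern detector.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    # Activation: zero out negative responses.
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Pooling: keep the strongest response in each size x size window.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))        # toy 8x8 input
kernel = rng.standard_normal((3, 3))       # would be learned in practice
fc_weights = rng.standard_normal((9, 2))   # fully connected layer, 2 classes

features = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
logits = features.reshape(-1) @ fc_weights        # flatten, then classify
```

Tracing the shapes makes the design visible: the 3×3 convolution shrinks 8×8 to 6×6, the 2×2 pool halves that to 3×3, and the fully connected layer maps the 9 flattened features to 2 class scores.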
Where CNNs are used in practice
Image classification, object detection, and image segmentation are the obvious domains. Beyond computer vision, CNNs are used for audio processing (spectrograms are treated as images), time-series data with local patterns, and document layout analysis. In the Australian context, we've used CNN-based models for satellite imagery analysis relevant to mining site monitoring, property inspection image processing, and document classification in legal and financial workflows.
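To make the audio case concrete: a spectrogram is just a 2D time-frequency grid, so once computed it can be fed to a CNN like any image. A minimal NumPy-only sketch (window length, hop size, and the 440 Hz test tone are all illustrative assumptions; in practice a library such as librosa or torchaudio would do this):

```python
import numpy as np

def spectrogram(signal, win=64, hop=32):
    """Magnitude spectrogram via a windowed FFT: rows are time steps,
    columns are frequency bins — a 2D 'image' a CNN can consume."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

# Illustrative input: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
# spec is a 2D (time x frequency) array with energy concentrated in the
# bins nearest 440 Hz — ready to slide convolutional filters over.
```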
CNNs versus Vision Transformers
Vision Transformers (ViTs) now match or exceed CNNs on large-scale image benchmarks, but CNNs remain more efficient on smaller datasets and when compute is limited. For most practical image classification tasks at business scale, a fine-tuned CNN (EfficientNet, ResNet) beats a ViT on training cost and inference speed. Reach for a ViT when you have very large datasets and can afford the training compute.