JDML

ML Fundamentals

What is an MLP (Multi-Layer Perceptron)? The foundational neural network explained

· 4 min read

A Multi-Layer Perceptron (MLP) is a neural network made of stacked layers of neurons. Each neuron takes weighted inputs, applies an activation function (like ReLU or sigmoid), and passes the result to the next layer. During training, the weights are adjusted using backpropagation and gradient descent to minimise prediction error. It's the foundation that almost all modern deep learning builds on.
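The "weighted inputs plus activation" step can be sketched in a few lines of NumPy. This is a minimal illustration, not trained code: the weight and bias values below are made up for the example.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z) element-wise."""
    return np.maximum(0.0, z)

# One hidden layer: 3 input features -> 2 hidden neurons.
# Weights and biases are illustrative values, not trained.
x = np.array([1.0, 2.0, -1.0])           # input features
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])        # one row of weights per neuron
b = np.array([0.1, -0.3])                # one bias per neuron

z = W @ x + b        # weighted sum of inputs, plus bias
h = relu(z)          # activation, passed on to the next layer
print(h)             # → [0.1 2.1]
```

Training repeats this forward pass, measures the error at the output, and uses backpropagation to nudge `W` and `b` in the direction that reduces it.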

Architecture overview

  • Input layer: one neuron per feature in your dataset
  • Hidden layers: one or more layers that learn intermediate representations
  • Output layer: one neuron per class (classification) or a single value (regression)
  • Activation functions: ReLU is standard for hidden layers, softmax for multi-class output
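The layer roles above map directly onto scikit-learn's `MLPClassifier` (used here as one common implementation; the dataset is synthetic). The input layer size comes from the data, `hidden_layer_sizes` sets the hidden layers, and for multi-class problems the output layer applies softmax automatically.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic dataset: 20 features -> an input layer of 20 neurons.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=10, n_classes=3,
                           random_state=0)

# Two hidden layers (64 and 32 neurons) with ReLU activations;
# the 3-class output layer uses softmax under the hood.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy
```

`clf.n_layers_` will report 4: the input layer, two hidden layers, and the output layer.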

When to use an MLP

MLPs work well on tabular data where features are pre-engineered and the relationships between them are not strongly spatial or sequential. Common use cases include customer churn prediction, fraud detection, lead scoring, and any classification or regression problem with structured features. For Australian businesses with CRM or transactional data, MLPs are often a good starting point before moving to gradient boosting or more complex architectures.
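A churn-style setup might look like the sketch below. The data here is a synthetic stand-in for pre-engineered CRM features (tenure, spend, support contacts, and so on); the pipeline shape is the point. MLPs are sensitive to feature scale, so standardising inputs first matters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for tabular CRM data; weights=[0.8, 0.2] mimics
# the class imbalance typical of churn problems.
X, y = make_classification(n_samples=1000, n_features=15, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features, then fit a small single-hidden-layer MLP.
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                    random_state=42))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
```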

MLPs versus gradient boosting (e.g. CatBoost)

On small-to-moderate tabular datasets, gradient boosting methods like XGBoost, LightGBM, and CatBoost typically outperform MLPs. MLPs shine when you have very large datasets, when you need differentiable representations for downstream models, or when you're embedding MLP layers inside a larger neural architecture. For most practical business prediction problems, try CatBoost first.

MLPs in JDML systems

We use MLP layers regularly as components inside larger models: as prediction heads on top of transformer encoders, as embedding layers in recommendation systems, and in hybrid architectures that combine tabular and sequential features. As a standalone model for business prediction tasks, we generally prefer gradient boosting, but the MLP foundation underpins almost everything.
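The "prediction head" pattern is simple to sketch. Below, a hypothetical two-layer MLP head maps a batch of encoder embeddings to class probabilities; the dimensions and random weights are illustrative only (in a real system the head is trained end to end with the encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row max for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical setup: 4 encoder embeddings of dimension 16,
# mapped to 3 classes by a two-layer MLP head. Weights are
# random here purely to show the shapes and data flow.
embeddings = rng.normal(size=(4, 16))
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

hidden = relu(embeddings @ W1 + b1)   # intermediate representation
probs = softmax(hidden @ W2 + b2)     # one probability row per input
```

Each row of `probs` sums to 1, one probability distribution over classes per embedding.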

Building something in this space? Let's talk.

We spend a lot of time with these tools. If you're trying to figure out which model fits your workload, we're happy to share what we've learned.

Get in touch