ML Fundamentals
What is an LSTM (Long Short-Term Memory)? Sequential modelling explained
· 4 min read · By Jon Jovinsson
An LSTM is a recurrent neural network architecture that can learn dependencies across long sequences by using a gating mechanism to control what information is remembered, updated, or forgotten at each time step. Unlike vanilla RNNs, which struggle to retain information across many steps due to vanishing gradients, LSTMs maintain a cell state that carries relevant information across the full sequence.
The three gates
- Forget gate: decides what to discard from the previous cell state
- Input gate: decides what new information to write to the cell state
- Output gate: decides what to pass to the next hidden state and the output
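The three gates can be sketched as a single forward step in NumPy. This is a minimal illustration of the standard LSTM cell equations, not a production implementation: `W` and `b` are a single stacked weight matrix and bias covering all four pre-activations, an assumption made here to keep the code short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenated [x; h_prev] to the four stacked gate
    pre-activations; b is the matching bias vector.
    """
    z = np.concatenate([x, h_prev]) @ W + b  # shape (4 * hidden,)
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])        # forget gate: what to discard from c_prev
    i = sigmoid(z[H:2*H])      # input gate: what new information to write
    g = np.tanh(z[2*H:3*H])    # candidate cell update
    o = sigmoid(z[3*H:4*H])    # output gate: what to expose as h
    c = f * c_prev + i * g     # new cell state
    h = o * np.tanh(c)         # new hidden state
    return h, c
```

Because the cell state `c` is updated additively (scaled by the forget gate rather than repeatedly squashed), gradients can flow across many time steps, which is what lets LSTMs avoid the vanishing-gradient problem that plagues vanilla RNNs.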
Where LSTMs are still useful
Transformers have overtaken LSTMs on most natural language tasks, but LSTMs remain useful for time-series forecasting where the sequence length is short, compute is constrained, or sequential structure is critical. For Australian businesses running demand forecasting, energy usage prediction, or financial time-series models on edge hardware or with limited budgets, LSTMs often outperform transformers on a cost-per-accuracy basis.
LSTM versus Transformer for time-series
On shorter sequences (under a few hundred steps), LSTMs are competitive with transformer-based forecasting models and much cheaper to train and serve. On longer sequences with complex long-range dependencies, temporal fusion transformers or Informer-style models tend to win. For most practical time-series forecasting problems in Australian retail, logistics, and resources, LSTMs are a reasonable baseline before investing in more complex architectures.