JDML

ML Fundamentals

What is an LSTM (Long Short-Term Memory)? Sequential modelling explained

· 4 min read

An LSTM is a recurrent neural network architecture that can learn dependencies across long sequences by using a gating mechanism to control what information is remembered, updated, or forgotten at each time step. Unlike vanilla RNNs, which struggle to retain information across many steps due to vanishing gradients, LSTMs maintain a cell state that carries relevant information across the full sequence.

The three gates

  • Forget gate: decides what to discard from the previous cell state
  • Input gate: decides what new information to write to the cell state
  • Output gate: decides what to pass to the next hidden state and the output
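The three gates above can be sketched in a few lines of NumPy. This is an illustrative single time step, not a production implementation: the fused weight layout, variable names, and shapes here are assumptions for clarity, and libraries like PyTorch or Keras handle all of this internally.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative sketch).

    x: input vector, shape (d_in,)
    h_prev, c_prev: previous hidden and cell state, shape (d_h,)
    W: fused weights mapping [x; h_prev] to the four gate
       pre-activations, shape (d_in + d_h, 4 * d_h)
    b: bias, shape (4 * d_h,)
    """
    d_h = h_prev.shape[0]
    z = np.concatenate([x, h_prev]) @ W + b
    f = sigmoid(z[:d_h])           # forget gate: what to keep from c_prev
    i = sigmoid(z[d_h:2 * d_h])    # input gate: how much new info to write
    g = np.tanh(z[2 * d_h:3 * d_h])  # candidate values to write
    o = sigmoid(z[3 * d_h:])       # output gate: what to expose as h
    c = f * c_prev + i * g         # updated cell state (the "memory")
    h = o * np.tanh(c)             # new hidden state passed to the next step
    return h, c
```

Note that the cell state `c` is updated additively (`f * c_prev + i * g`), which is exactly what lets gradients flow across many steps without vanishing the way they do in a vanilla RNN.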

Where LSTMs are still useful

Transformers have overtaken LSTMs on most natural language tasks, but LSTMs remain useful for time-series forecasting where the sequence length is short, compute is constrained, or sequential structure is critical. For Australian businesses running demand forecasting, energy usage prediction, or financial time-series models on edge hardware or with limited budgets, LSTMs often outperform transformers on a cost-per-accuracy basis.

LSTM versus Transformer for time-series

On shorter sequences (under a few hundred steps), LSTMs are competitive with transformer-based forecasting models and much cheaper to train and serve. On longer sequences with complex long-range dependencies, temporal fusion transformers or Informer-style models tend to win. For most practical time-series forecasting problems in Australian retail, logistics, and resources, LSTMs are a reasonable baseline before investing in more complex architectures.
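Whichever architecture you end up with, the first practical step for forecasting is the same: framing the series as supervised (window, target) pairs. A minimal sketch, assuming a univariate 1-D series (the function name and parameters are ours, not from any particular library):

```python
import numpy as np

def make_windows(series, lookback, horizon=1):
    """Slice a 1-D series into (input window, target) pairs.

    Each input is `lookback` consecutive observations; each target is
    the next `horizon` observations. Suitable as input to an LSTM (or
    any sequence model) for one-step or multi-step forecasting.
    """
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X), np.array(y)
```

With `lookback` in the tens to low hundreds, which is the regime where LSTMs are most competitive, this windowed dataset feeds directly into a small recurrent model and gives you a cheap baseline to beat before reaching for a transformer.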

Building something in this space? Let's talk.

We spend a lot of time with these tools. If you're trying to figure out which model fits your workload, we're happy to share what we've learned.

Get in touch