What we're thinking about.
Short, honest notes on software delivery, AI systems, local models, data pipelines, and the choices that actually matter once something has to work in production.
Local LLMs
10 April 2026 · 4 min read
Gemma's second act: not ready for agents, but worth watching
We spent a week running the new Gemma models through the same agent workloads we push to Claude every day. Short version: if you were hoping to pull your Anthropic bill to zero and run everything locally, you're not there yet. But the trajectory is interesting.
Read the post →
AI Tooling
11 April 2026 · 3 min read
Block's Goose: an open-source coding agent worth running locally
Block's open-source AI coding agent Goose has been picking up serious momentum. It runs as a local agent on your machine, connects to your tools via MCP, and handles multi-step engineering tasks with less hand-holding than most alternatives. Here's what we've found running it.
Read the post →
AI Research
12 April 2026 · 4 min read
Karpathy's AutoResearch and what it could mean for neuroscience data
Andrej Karpathy's AutoResearch project points toward a future where AI agents run the grunt work of scientific investigation autonomously. I've been thinking about what that looks like applied to OpenNeuro, one of the largest publicly available collections of human neuroscience data.
Read the post →
LLMs
8 April 2026 · 5 min read
What is a Large Language Model (LLM)? A practical guide for Australian businesses
A Large Language Model (LLM) is an AI system trained on massive amounts of text to predict, generate, and reason with language. LLMs power tools like ChatGPT, Claude, and Gemini. Here's what Australian businesses actually need to understand about how they work and where they fit.
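As a quick taste of the core idea (not from the post itself): at heart, a language model predicts the next token from what came before. This toy sketch does it with bigram counts, where a real LLM uses billions of learned parameters, but the objective is the same.

```python
from collections import Counter, defaultdict

# Count, for each word, which words follow it in a tiny corpus.
corpus = "the cat sat on the mat the cat slept".split()
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def predict_next(word):
    # Predict the most frequent follower — next-token prediction by counting.
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in the corpus
```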
Read the post →
AI Agents
7 April 2026 · 5 min read
How AI agents work: a plain-English guide for Australian businesses
An AI agent is a system that uses a language model as its reasoning engine, gives it access to tools (APIs, databases, browsers), and lets it plan and act across multiple steps to complete a goal. Here's what that means in practice for Australian businesses looking to automate complex workflows.
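A minimal sketch of that plan-act-observe loop, with the LLM call stubbed out (`plan_step` and the `lookup` tool are illustrative names, not a real framework):

```python
def plan_step(goal, history):
    # Stand-in for an LLM call: a real agent would prompt the model here
    # with the goal and the tool results gathered so far.
    if not history:
        return ("lookup", goal)      # no information yet: use a tool
    return ("finish", history[-1])   # enough information: report back

TOOLS = {"lookup": lambda q: f"result for {q!r}"}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):        # cap steps so the agent can't loop forever
        action, arg = plan_step(goal, history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))  # execute the tool, feed result back
    return None

print(run_agent("quarterly revenue"))
```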
Read the post →
AI Agents
6 April 2026 · 5 min read
What is RAG (Retrieval-Augmented Generation) and when should you use it?
RAG (Retrieval-Augmented Generation) is a technique that gives an LLM access to a knowledge base at query time, so it can answer questions using your data, not just what it learned during training. It's the standard architecture for internal AI search, document Q&A, and knowledge management systems.
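The retrieve-then-generate pattern in miniature: score documents against the query and prepend the best match to the prompt. This sketch uses word overlap as the score; production RAG swaps that for embedding similarity, but the shape is the same.

```python
DOCS = [
    "Leave requests are approved by your direct manager.",
    "Invoices over $10,000 require two signatures.",
]

def retrieve(query):
    # Pick the document sharing the most words with the query.
    words = set(query.lower().split())
    return max(DOCS, key=lambda d: len(words & set(d.lower().split())))

def build_prompt(query):
    # The LLM answers from the retrieved context, not just its training data.
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"

print(build_prompt("who approves leave requests"))
```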
Read the post →
ML Fundamentals
5 April 2026 · 4 min read
What is an MLP (Multi-Layer Perceptron)? The foundational neural network explained
A Multi-Layer Perceptron (MLP) is the simplest form of neural network: layers of neurons connected by weights, trained to map inputs to outputs. Despite being one of the oldest neural architectures, MLPs remain useful for tabular data, classification, and regression tasks where more complex architectures are overkill.
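The whole forward pass fits in a few lines of NumPy: weighted sum, nonlinearity, weighted sum. The weights below are set by hand (rather than trained) so the two-layer network computes XOR, something a single layer famously cannot.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Hand-picked weights: hidden layer, then output layer.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [-2.0]])

def mlp(x):
    h = relu(x @ W1 + b1)    # hidden layer: weighted sum + nonlinearity
    return (h @ W2).ravel()  # output layer: weighted sum of hidden units

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(mlp(X))  # XOR of each input pair: [0. 1. 1. 0.]
```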
Read the post →
ML Fundamentals
4 April 2026 · 4 min read
What is a CNN (Convolutional Neural Network)? How convolutions learn features
A Convolutional Neural Network (CNN) is a neural network architecture designed to process grid-structured data like images by learning local spatial features using convolutional filters. CNNs are the reason modern computer vision works, from satellite imagery analysis to document classification.
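To make "learning local spatial features" concrete, here is a single 3×3 convolution applied by hand. The filter is a fixed Sobel kernel rather than a learned one, but it shows what a CNN filter does: respond where a local pattern (here, a vertical edge) appears.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image; each output is a local weighted sum.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                    # right half bright: a vertical edge
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
# Nonzero only where the window straddles the edge.
print(conv2d(image, sobel_x))
```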
Read the post →
ML Fundamentals
3 April 2026 · 5 min read
Transformer architecture explained: the model behind every modern LLM
The transformer is the neural network architecture that powers GPT, Claude, Gemini, and every major LLM. Introduced in the 2017 paper 'Attention Is All You Need', it replaced recurrent networks with self-attention, enabling parallel training and scaling to unprecedented sizes.
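The core operation, scaled dot-product self-attention, is small enough to sketch in NumPy (single head, random weights for illustration): every position attends to every other in one matrix multiply, which is what made parallel training possible.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                 # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```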
Read the post →
ML Fundamentals
2 April 2026 · 4 min read
What is an LSTM (Long Short-Term Memory)? Sequential modelling explained
An LSTM (Long Short-Term Memory) is a type of recurrent neural network designed to learn patterns in sequential data over long time spans. Before transformers dominated the field, LSTMs were the architecture of choice for time-series forecasting, language modelling, and any task where order and history matter.
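One LSTM cell step in NumPy (random weights, for illustration): the forget, input, and output gates decide what the cell state keeps, adds, and exposes, which is how the architecture carries information across long sequences.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    z = np.concatenate([x, h]) @ W + b  # all four gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old, add new
    h_new = sigmoid(o) * np.tanh(c_new)               # expose part of the cell
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(n_in + n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):   # run the cell over a 10-step sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (4,): the final hidden state summarises the sequence
```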
Read the post →
ML Fundamentals
1 April 2026 · 3 min read
What is a BiLSTM (Bidirectional LSTM) and when does bidirectionality matter?
A BiLSTM (Bidirectional LSTM) runs two LSTM layers over the same sequence: one forward (left to right) and one backward (right to left). The outputs are concatenated, giving the model context from both directions. It's the standard choice when full sequence context matters and you don't need to generate text left to right.
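Bidirectionality is easiest to see with a simplified recurrent cell standing in for the full LSTM: run one pass left to right, another right to left, re-align, and concatenate per position.

```python
import numpy as np

def rnn_pass(xs, W, U):
    # A plain tanh RNN stands in for an LSTM here; the wiring is what matters.
    h = np.zeros(U.shape[0])
    outs = []
    for x in xs:
        h = np.tanh(x @ W + h @ U)   # hidden state carries history forward
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))                 # 6 timesteps, 3 features
W, U = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))

fwd = rnn_pass(xs, W, U)                     # left-to-right context
bwd = rnn_pass(xs[::-1], W, U)[::-1]         # right-to-left, re-aligned
bi = np.concatenate([fwd, bwd], axis=-1)     # both directions per timestep
print(bi.shape)  # (6, 8): each position sees its past and its future
```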
Read the post →
ML Fundamentals
31 March 2026 · 4 min read
What is ResNet (Residual Network) and what did residual connections solve?
ResNet (Residual Network) introduced skip connections that let gradients flow directly through deep networks, solving the degradation problem that made very deep neural networks harder to train, not better, before 2015. ResNet-50 and its variants remain widely used in production computer vision systems today.
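A residual block in NumPy makes the trick visible: the output is f(x) + x, so even when the learned transform f contributes nothing, the identity path carries the signal (and, during training, the gradient) straight through.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, W1, W2):
    out = relu(x @ W1)       # learned transform, first layer
    out = out @ W2           # second layer, no activation yet
    return relu(out + x)     # skip connection: add the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(8,))
W1 = np.zeros((8, 8))        # a "useless" learned transform: all zeros
W2 = rng.normal(size=(8, 8))

# With a zeroed transform the block reduces to relu(x): the identity
# path alone carries the signal, so depth can't make things worse.
print(np.allclose(residual_block(x, W1, W2), relu(x)))  # True
```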
Read the post →
ML Fundamentals
30 March 2026 · 4 min read
What is CatBoost and why is it still the go-to for tabular data in production?
CatBoost is a gradient boosting library developed by Yandex that handles categorical features natively, trains quickly, and often outperforms neural networks on structured tabular data. It's one of the most reliable models for business prediction tasks in finance, retail, and resource industries.
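To show what gradient boosting actually does (this is the generic technique, not CatBoost's API — CatBoost adds ordered boosting and categorical handling on top), here is a toy one-dimensional version: start from the mean, then repeatedly fit a depth-1 stump to the residuals and add a damped copy of it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200)
y = np.where(X < 5, 1.0, 3.0) + rng.normal(0, 0.1, X.size)  # noisy step

def fit_stump(X, resid):
    # Best single-threshold split of the residuals (a depth-1 tree).
    best = None
    for t in np.unique(X):
        left, right = resid[X < t], resid[X >= t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(X < t, left.mean(), right.mean())
        sse = ((resid - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

pred = np.full_like(y, y.mean())   # round 0: predict the mean everywhere
lr = 0.5                           # learning rate damps each correction
for _ in range(20):
    t, lv, rv = fit_stump(X, y - pred)        # fit the current residuals
    pred += lr * np.where(X < t, lv, rv)      # add the damped correction

print(round(float(np.mean((y - pred) ** 2)), 3))  # MSE approaches the noise floor
```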
Read the post →