Prompt Playbook: AI Fundamentals PART 2
Hey Prompt Entrepreneur,
A common comment I get on my TikTok videos is about the "strawberry problem" – where ChatGPT can’t consistently tell you how many Rs are in the word "strawberry."
“If it’s so smart, how many Rs in strawberry?” “It can’t even count the Rs in strawberry!”
It’s a common criticism thrown at ChatGPT and other modern (LLM-based) AIs.
The criticism actually reveals a LOT about people’s misunderstandings about AI.
We're mocking an AI language model for struggling with counting... when we already have perfect tools for counting – they're called computers. Computing is literally what they do. That's like criticising a hammer because it's not good at cutting wood.
LLMs are instead Large Language Models. The hint is in the name: language. They aren't built for maths; they're built for language.
Understanding this distinction is crucial for entrepreneurs – if you don't understand what LLMs fundamentally are, you'll either expect too much from them or miss their true potential entirely.
In this Part we'll demystify how large language models actually work under the hood – without requiring a PhD in computer science.
Let’s get started:
Summary
Large Language Models
The basic idea: predicting the next word
Learning from vast amounts of text
The training process and parameters
Turning words into numbers: embeddings
Understanding context: transformers and attention
The Prediction Game
At their core, large language models do one thing remarkably well: they predict what word should come next in a piece of text.
That's kinda it. Really.
It sounds almost disappointingly simple, but this seemingly basic capability leads to all the "magic" we see.
Here's a concrete example of how this prediction works:
Imagine I start typing: "The Eiffel Tower is located in..."
You, as a human, can probably predict the next word: "Paris."
LLMs do exactly this, but at enormous scale and with extraordinary sophistication. This next-word prediction, when done billions of times with the right training, does something special…
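Before we get to that, here's the prediction game in miniature. This is a toy sketch, nothing like a real LLM's internals – a "bigram" model that just counts which word follows which – but it's the same game at its absolute simplest:

```python
from collections import Counter, defaultdict

# Toy corpus -- a real model trains on hundreds of billions of words.
corpus = (
    "the eiffel tower is located in paris . "
    "the eiffel tower is in france . "
    "the louvre is located in paris ."
).split()

# Count which word follows which.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

# Predict: given the last word, rank candidates by how often they followed it.
print(next_word_counts["in"].most_common())
# [('paris', 2), ('france', 1)]
```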
From Prediction to Understanding
Here's where it gets interesting. Humans intuitively understand that language prediction requires knowledge. To predict that "The Eiffel Tower is located in Paris," you need to know facts about world geography, famous landmarks, French culture etc. etc.
LLMs learn this knowledge through exposure – massive exposure. They're trained on hundreds of billions of words from books, articles, websites, and other texts. It would take a human thousands of years to read this much text.
This process is similar to how a child learns language by being constantly exposed to it, but at a vastly accelerated pace and scale. Through this exposure, the model learns patterns and relationships between words, phrases, and concepts.
Unlike humans who learn through a mix of experiences, explicit instruction, and text, LLMs learn purely through text. This is why they can sound remarkably human-like in some contexts but fail at seemingly simple tasks in others. They haven't experienced the world; they've only read about it. They have a lot of information but not necessarily the context to place it in.
The Prediction Engine: How It Works
So how does this prediction engine actually work? It's all about pattern recognition at a massive scale.
During training, the LLM learns by repeatedly trying to predict the next word in a given text. Here's how it works:
The model is given a piece of text with the last word hidden
It tries to guess what that word should be
It compares its guess with the actual word
It slightly adjusts its internal settings (called parameters or weights)
This process repeats billions of times
Like a student working through practice questions, checking the answers, and using that feedback to answer better next time. But at a tremendous scale.
These parameters/weights are values that determine how the model makes predictions – think of it like tuning millions of dials on a complex machine. Modern LLMs have hundreds of billions of these parameters.
This adjustment process uses something called "gradient descent" – a fancy term for a simple idea. The model figures out which direction to turn each of those billions of dials to get a slightly better prediction next time. It's like playing the hot-and-cold game, where the model gets feedback on whether it's getting "warmer" or "colder" and adjusts accordingly.
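Here's that hot-and-cold game as a minimal sketch with a single dial. The numbers are made up for illustration; real training tunes billions of dials at once:

```python
# A minimal sketch of gradient descent with one "dial" (parameter).
w = 0.0                  # our one dial, starting cold
learning_rate = 0.1
x, target = 2.0, 6.0     # we want w * x to equal 6, so the ideal w is 3

for step in range(20):
    prediction = w * x
    error = prediction - target      # how wrong are we?
    gradient = 2 * error * x         # which way is "warmer"?
    w -= learning_rate * gradient    # nudge the dial in that direction

print(round(w, 3))  # ~3.0 -- the dial has been tuned to fit the data
```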
Prediction with Uncertainty
Human language prediction isn't deterministic – we consider multiple possibilities with different likelihoods. We’d be very boring otherwise!
LLMs do the same thing.
When an LLM predicts the next word, it doesn't just pick one word with certainty. Instead, it assigns a probability to all possible next words in its vocabulary.
For example, after "The Eiffel Tower is located in..." the model might assign:
"Paris" → 95% probability
"the" → 1% probability
"France" → 3% probability
And so on for thousands of other words
When generating text, the model typically doesn't just pick the highest probability word every time (which would make outputs very predictable). Instead, it uses controlled randomness to occasionally select less likely words, making the text more diverse and human-like.
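That "controlled randomness" is usually governed by a setting called temperature. Here's a rough sketch of the idea – the probabilities are made up, and real models weigh thousands of candidates, not four:

```python
import random

candidates = {"Paris": 0.95, "France": 0.03, "the": 0.01, "Europe": 0.01}

def sample(probs, temperature=1.0):
    # Low temperature sharpens the distribution (more predictable);
    # high temperature flattens it (more surprising).
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    words = list(weights)
    return random.choices(words, [weights[w] / total for w in words])[0]

print(sample(candidates, temperature=0.7))  # almost always "Paris"
print(sample(candidates, temperature=1.5))  # rarer words show up more often
```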
Note: this is also why ChatGPT isn't "just" autocomplete.
This probabilistic nature explains why you get different responses from ChatGPT when asking the same question multiple times.
Most computer programmes are deterministic: the same input leads to the same output. Not so with LLMs. They are instead probabilistic: the same input can lead to different outputs.
This simple fact is both a tremendous strength and weakness of LLMs. It makes them "creative" but also fickle. Like, well, humans.
The Building Blocks of Prediction
To make this prediction game work computationally, we need to break down language into manageable pieces and represent them in a way computers can process.
From Words to Tokens
I've been talking about "words" so far, but that's not exactly right. Let’s clear this up a bit and introduce the concept of tokens.
Tokens are the fundamental units in the prediction game. A token can be a whole word, part of a word, a character, or even punctuation. For English text, a token is roughly 3/4 of a word on average.
For example, "Let's understand how AI works" might become: ["Let", "'s", " understand", " how", " AI", " works"]
This tokenisation is crucial because it defines what exactly the model is predicting. It's not always predicting whole words – sometimes it's predicting word fragments or even individual characters.
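You can see real tokenisation for yourself with OpenAI's open-source tiktoken library. A quick sketch, assuming you've run pip install tiktoken:

```python
import tiktoken

# "cl100k_base" is the encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Let's understand how AI works")
print(ids)                             # a list of integer token IDs
print([enc.decode([i]) for i in ids])  # the text piece behind each ID
# e.g. ["Let", "'s", " understand", " how", " AI", " works"]
```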
The Language Map: Embeddings
The next crucial step is creating a "map" of language that captures relationships between words and concepts. This is done through "embeddings."
Imagine a massive multi-dimensional space where every word in the language has a specific location. Words with similar meanings or that are used in similar contexts are positioned near each other in this space. "King" would be near "queen" and "ruler," but far from "bicycle." Continue on in this way for ALL words.
This embedding space is what allows LLMs to understand that:
"Happy" is more similar to "joyful" than to "melancholy"
"Capital" near "country" likely refers to cities, not money
"Bank" near "river" means something different than "bank" near "money"
When the model predicts the next token, it's essentially navigating this semantic space, looking for tokens that make sense in the current context.
Importantly, we don't actually have to teach the model all of these relationships by hand. That was the fool's errand that the symbolic AI practitioners pursued. Imagine trying to formally declare the relationship between "bed" and "Arkansas" – and then writing down similar relationships between every pair of words. Yeah… nah…
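To make the "map" idea concrete, here's a sketch with tiny, hand-picked vectors. Real embeddings have hundreds or thousands of dimensions and are learned during training, never hand-written like this:

```python
import math

# Toy 3-dimensional embeddings with made-up values.
embeddings = {
    "happy":      [0.9, 0.8, 0.1],
    "joyful":     [0.85, 0.75, 0.15],
    "melancholy": [-0.7, -0.6, 0.2],
}

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (similar meaning); -1.0 = opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["happy"], embeddings["joyful"]))      # ~0.999
print(cosine_similarity(embeddings["happy"], embeddings["melancholy"]))  # negative
```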
Beyond Individual Words: The Attention Mechanism
Our modern LLMs actually go a step further still. Good prediction requires more than just understanding individual words – it requires understanding how all the words in a sentence relate to each other. This is where the breakthrough "attention mechanism" comes in.
Earlier approaches processed text sequentially, one word at a time. The attention mechanism instead lets the model see relationships between all words simultaneously.
It's like the difference between:
Reading a sentence word by word, trying to remember what came before
Reading the whole sentence at once and seeing how all the words relate to each other
When processing "The athlete picked up her trophy because she won the championship," the attention mechanism helps determine that "she" refers to the athlete by measuring the strength of relationships between all pairs of words.
This ability to juggle complex relationships between words is what makes modern LLMs so much better at maintaining coherence and understanding context than earlier models.
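For the curious, here's a minimal numeric sketch of "scaled dot-product attention", the core calculation behind this mechanism. The vectors are made up; real models learn them, and stack many attention layers:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# One query, key and value vector per word, for a toy 3-word input.
Q = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # queries
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # keys
V = np.array([[0.5, 0.1], [0.2, 0.9], [0.7, 0.7]])  # values

scores = Q @ K.T / np.sqrt(Q.shape[-1])  # every word scored against every other word
weights = softmax(scores)                # each row sums to 1: a word's attention budget
output = weights @ V                     # context-aware representation of each word

print(weights.round(2))  # the all-pairs relationship strengths
```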
This breakthrough only came in 2017 and was fundamental to the rise of our modern LLMs. Here's the Wikipedia page on the critical paper, "Attention Is All You Need".
When Prediction Becomes Intelligence
The most fascinating aspect of this prediction system is what happens when we scale it up massively – both in terms of the amount of training data and the size of the model itself.
When we reach hundreds of billions of parameters trained on vast datasets, something remarkable happens. The model doesn't just learn simple word associations – it starts to capture complex patterns related to:
Facts about the world
Reasoning chains
Cultural references
Logical inference
Common sense knowledge
And much more
None of these capabilities were explicitly programmed. This is so important.
They instead emerged naturally from the prediction task when done at sufficient scale. This phenomenon, where new abilities suddenly appear as models grow larger, is called "emergent behaviour."
Prediction, when scaled to this level, begins to simulate many aspects of what we might call understanding or (gulp) intelligence.
What This Means for Your Business
Understanding language models as sophisticated prediction engines helps you make better decisions about implementing AI in your business:
1. Play to Their Strengths: LLMs excel at tasks that are fundamentally about pattern recognition in language – content creation, summarisation, translation, etc.
2. Compensate for Weaknesses: For tasks requiring factual precision, mathematics or logical reasoning, supplement LLMs with other tools or human oversight.
3. Design Better Prompts: When you understand that you're guiding a prediction engine, you can craft prompts that lead to better predictions.
4. Set Realistic Expectations: Knowing that even the most impressive AI capabilities are built on prediction helps set appropriate boundaries for what these systems can and cannot do.
Perhaps most importantly, understanding LLMs as prediction engines removes some of the mystique and lets you approach them pragmatically – as powerful tools with specific capabilities and limitations.
Try This With Your AI Tutor
Want to explore these concepts further? Try this prompt with your AI tutor:
I want to understand how viewing LLMs as prediction engines affects how I should use them in my business. Can you:
1. Identify 3 tasks in [my industry] that would be well-suited for LLMs because they fundamentally involve pattern prediction
2. Identify 3 tasks that would be poorly suited because they go beyond pattern prediction
3. Suggest how I might combine LLMs with other tools to overcome these limitations
Additional Resources
If you're interested in learning more about how LLMs work, these resources provide more depth while remaining accessible:
"What Is ChatGPT Doing... and Why Does It Work?" by Stephen Wolfram (Link): A detailed but accessible explanation of the mechanics behind large language models.
"But what is a neural network?" by 3Blue1Brown (Link): An excellent visual explanation of neural networks with stunning animations. 7 minutes.
"Neural Networks: Zero to Hero" by Andrej Karpathy (Link): A comprehensive 3-hour video course on neural networks by one of the field's leading experts.
What's Next?
Next we'll explore the difference between training and inference in AI systems. We'll unpack why creating these models is so expensive but using them is relatively affordable, and what this means for your AI strategy. We'll also tackle the thorny issue of "hallucinations"—why these models sometimes generate convincing but incorrect information, and how you can safeguard against this in your applications.
Keep Prompting,
Kyle

