<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link><description>Digital workplace, artificial intelligence, cloud, security, automation, and enterprise technology notes by Ajay Walia.</description><language>en-au</language><managingEditor>Ajay Walia</managingEditor><webMaster>Ajay Walia</webMaster><copyright>Copyright 2026 Ajay Walia</copyright><lastBuildDate>Sun, 21 Jun 2026 05:46:10 +0000</lastBuildDate><atom:link href="https://curiousbit.netlify.app/tags/machine-learning/index.xml" rel="self" type="application/rss+xml"/><image><url>https://curiousbit.netlify.app/images/og-default.png</url><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link></image><item><title>LLMs Are Probability Engines, Not "Thinkers"</title><link>https://curiousbit.netlify.app/llms-are-probability-engines-not-ai/</link><guid isPermaLink="true">https://curiousbit.netlify.app/llms-are-probability-engines-not-ai/</guid><pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate><dc:creator>Ajay Walia</dc:creator><description>&lt;style&gt;
@import url('https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&amp;family=JetBrains+Mono:wght@400;500&amp;display=swap');
.pe-article {
--bg: #070b14;
--bg2: #0d1423;
--bg3: #111827;
--cyan: #00e5ff;
--purple: #a855f7;
--gold: #fbbf24;
--text: #e2e8f0;
--muted: #94a3b8;
--border: #1e293b;
--danger: #f87171;
font-family: 'Space Grotesk', system-ui, sans-serif;
font-size: 1.08rem;
line-height: 1.85;
color: var(--text);
}
/* TOC */
.pe-toc {
background: var(--bg2);
border: 1px solid var(--border);
border-left: 3px solid var(--cyan);
border-radius: 10px;
padding: 1.25rem 1.75rem;
margin: 2rem 0;
}
.pe-toc h3 {
font-size: 0.95rem;
letter-spacing: 0.18em;
text-transform: uppercase;
color: var(--cyan);
margin: 0 0 1rem;
}
.pe-toc ol { padding-left: 1.3rem; margin: 0; }
.pe-toc li { margin-bottom: 0.6rem; }
.pe-toc a { color: var(--muted); text-decoration: none; font-size: 1.15rem; font-weight: 600; transition: color 0.2s; }
.pe-toc a:hover { color: var(--cyan); }
/* Video */
.pe-video { margin: 2rem 0; border-radius: 12px; overflow: hidden; border: 1px solid var(--border); background: #000; }
.pe-video video { width: 100%; display: block; }
.pe-video-header {
background: var(--bg2);
padding: 1rem 1.4rem;
font-size: 1.15rem;
font-weight: 600;
color: var(--cyan);
border-bottom: 1px solid var(--border);
line-height: 1.5;
}
/* Typography */
.pe-article h2 {
font-size: 1.75rem;
font-weight: 700;
color: #fff;
margin: 3rem 0 0.9rem;
padding-bottom: 0.45rem;
border-bottom: 1px solid var(--border);
}
.pe-sec-num { color: var(--cyan); font-size: 1rem; font-weight: 600; display: block; margin-bottom: 0.2rem; letter-spacing: 0.1em; }
.pe-article p { margin-bottom: 1.1rem; }
.pe-article strong { color: #fff; }
.pe-em { color: var(--gold); }
/* Callouts */
.pe-callout { background: var(--bg2); border-left: 4px solid var(--purple); border-radius: 0 8px 8px 0; padding: 1.4rem 1.8rem; margin: 1.5rem 0; font-size: 1.4rem; color: var(--muted); line-height: 1.75; }
.pe-callout.cy { border-color: var(--cyan); }
.pe-callout.gd { border-color: var(--gold); }
.pe-callout strong { color: var(--text); }
/* Compare table */
.pe-table { width: 100%; border-collapse: collapse; font-size: 1rem; margin: 1.25rem 0; }
.pe-table th { text-align: left; padding: 0.7rem 1rem; background: var(--bg2); color: var(--cyan); font-size: 0.85rem; letter-spacing: 0.08em; text-transform: uppercase; border-bottom: 1px solid var(--border); }
.pe-table td { padding: 0.85rem 1rem; border-bottom: 1px solid var(--border); color: var(--muted); vertical-align: top; line-height: 1.6; }
.pe-table td:first-child { color: var(--text); font-weight: 500; }
.pe-table tr:hover td { background: var(--bg2); }
/* Formula boxes */
.pe-box { background: var(--bg2); border: 1px solid var(--border); border-radius: 12px; padding: 1.75rem; margin: 1.75rem 0; }
.pe-box-title { font-size: 0.95rem; letter-spacing: 0.15em; text-transform: uppercase; color: var(--purple); margin-bottom: 1rem; }
/* Anim 1 — token prediction */
.pe-sentence { font-size: 1.3rem; font-family: 'JetBrains Mono', monospace; color: var(--text); min-height: 2rem; margin-bottom: 1.1rem; }
.pe-cursor { display: inline-block; width: 2px; height: 1em; background: var(--cyan); animation: pe-blink 0.8s infinite; vertical-align: middle; margin-left: 2px; }
@keyframes pe-blink { 0%,100%{opacity:1} 50%{opacity:0} }
.pe-prob-bars { display: flex; flex-direction: column; gap: 0.55rem; }
.pe-prob-row { display: flex; align-items: center; gap: 0.8rem; font-size: 1.1rem; }
.pe-prob-lbl { width: 80px; text-align: right; color: var(--muted); font-family: 'JetBrains Mono', monospace; flex-shrink: 0; }
.pe-prob-track { flex: 1; height: 26px; background: var(--bg3); border-radius: 5px; overflow: hidden; }
.pe-prob-fill { height: 100%; background: var(--cyan); border-radius: 5px; transition: width 0.55s cubic-bezier(0.4,0,0.2,1); width: 0; }
.pe-prob-fill.win { background: var(--gold); }
.pe-prob-pct { width: 50px; font-family: 'JetBrains Mono', monospace; font-size: 1rem; color: var(--muted); }
/* Math display */
.pe-math { font-family: 'Georgia', serif; font-size: 1.45rem; color: var(--gold); text-align: center; padding: 1.3rem; background: var(--bg3); border-radius: 8px; margin-bottom: 0.9rem; }
.pe-term { display: inline; opacity: 0; transition: opacity 0.4s; cursor: help; position: relative; }
.pe-term.on { opacity: 1; }
.pe-term:hover::after { content: attr(data-tip); position: absolute; bottom: 115%; left: 50%; transform: translateX(-50%); background: var(--bg); border: 1px solid var(--purple); color: var(--text); padding: 0.4rem 0.85rem; border-radius: 6px; font-family: 'Space Grotesk', sans-serif; font-size: 0.88rem; white-space: nowrap; z-index: 20; }
.pe-anns { display: grid; grid-template-columns: 1fr 1fr; gap: 0.6rem; margin-top: 0.8rem; }
.pe-ann { background: var(--bg3); border-radius: 6px; padding: 0.55rem 0.85rem; font-size: 0.93rem; opacity: 0; transition: opacity 0.5s; }
.pe-ann.on { opacity: 1; }
.pe-ann-sym { color: var(--gold); font-family: 'JetBrains Mono', monospace; font-weight: bold; }
.pe-ann-desc { color: var(--muted); }
/* Softmax */
.pe-sm-demo { display: flex; gap: 1rem; align-items: flex-start; flex-wrap: wrap; }
.pe-sm-col { flex: 1; min-width: 160px; }
.pe-col-lbl { font-size: 0.82rem; letter-spacing: 0.1em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.75rem; }
.pe-logit-row { display: flex; align-items: center; gap: 0.6rem; margin-bottom: 0.55rem; font-size: 0.97rem; font-family: 'JetBrains Mono', monospace; }
.pe-logit-w { width: 60px; color: var(--text); }
.pe-logit-v { padding: 0.22rem 0.6rem; border-radius: 4px; font-size: 0.92rem; }
.pe-logit-v.neg { background: rgba(248,113,113,0.15); color: var(--danger); }
.pe-logit-v.pos { background: rgba(0,229,255,0.1); color: var(--cyan); }
.pe-sm-bar { height: 22px; border-radius: 4px; background: var(--purple); transition: width 0.75s cubic-bezier(0.4,0,0.2,1); width: 0; display: flex; align-items: center; padding-left: 7px; font-size: 0.86rem; color: #fff; overflow: hidden; white-space: nowrap; }
.pe-arrow { display: flex; align-items: center; justify-content: center; padding-top: 1.4rem; font-size: 1.6rem; color: var(--cyan); }
/* Loss */
.pe-loss-wrap { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; align-items: start; }
@media(max-width:500px) { .pe-loss-wrap { grid-template-columns: 1fr; } }
.pe-loss-num { font-size: 2.8rem; font-weight: 800; font-family: 'JetBrains Mono', monospace; color: var(--danger); transition: color 0.5s; line-height: 1; }
.pe-loss-num.good { color: #4ade80; }
.pe-loss-lbl { font-size: 0.88rem; color: var(--muted); margin-top: 0.3rem; }
.pe-loss-slider label { font-size: 0.92rem; color: var(--muted); display: block; margin: 0.8rem 0 0.3rem; }
input[type=range] { width: 100%; accent-color: var(--cyan); }
.pe-loss-formula { background: var(--bg3); border-radius: 8px; padding: 1.1rem; font-family: 'JetBrains Mono', monospace; font-size: 1rem; color: var(--text); line-height: 2.1; }
.pe-lf-hl { color: var(--gold); }
.pe-lf-res { color: var(--cyan); font-weight: bold; }
/* Attention */
.pe-attn-words { display: flex; gap: 0.5rem; flex-wrap: wrap; margin-bottom: 0.9rem; }
.pe-attn-word { padding: 0.4rem 0.8rem; border-radius: 6px; background: var(--bg3); border: 1px solid var(--border); font-size: 1rem; cursor: pointer; transition: all 0.2s; user-select: none; }
.pe-attn-word:hover { border-color: var(--cyan); }
.pe-attn-word.sel { background: rgba(0,229,255,0.12); border-color: var(--cyan); color: var(--cyan); }
/* Attention formula cards */
.pe-qkv { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 0.6rem; margin-top: 0.8rem; }
@media(max-width:480px) { .pe-qkv { grid-template-columns: 1fr; } }
.pe-qkv-card { border-radius: 6px; padding: 0.75rem 0.9rem; font-size: 0.93rem; }
/* Temperature */
.pe-temp-row-ctrl { display: flex; align-items: center; gap: 1rem; }
.pe-temp-big { font-size: 1.6rem; font-family: 'JetBrains Mono', monospace; font-weight: 700; color: var(--cyan); width: 56px; flex-shrink: 0; }
.pe-temp-lbl { font-size: 0.92rem; color: var(--muted); margin-top: 0.2rem; }
.pe-tbars { display: flex; flex-direction: column; gap: 0.45rem; margin-top: 0.9rem; }
.pe-trow { display: flex; align-items: center; gap: 0.65rem; font-size: 0.95rem; font-family: 'JetBrains Mono', monospace; }
.pe-tlbl { width: 64px; color: var(--muted); text-align: right; flex-shrink: 0; }
.pe-ttrack { flex: 1; height: 19px; background: var(--bg3); border-radius: 4px; overflow: hidden; }
.pe-tfill { height: 100%; border-radius: 4px; background: var(--purple); transition: width 0.5s cubic-bezier(0.4,0,0.2,1); }
.pe-tpct { width: 48px; text-align: right; color: var(--muted); }
/* Limits */
.pe-limits { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1rem; margin: 1.5rem 0; }
@media(max-width:640px) { .pe-limits { grid-template-columns: 1fr; } }
.pe-limit-card { background: var(--bg2); border: 1px solid var(--border); border-radius: 12px; padding: 1.5rem 1.6rem; transition: border-color 0.2s; display: flex; flex-direction: column; gap: 0.4rem; }
.pe-limit-card:hover { border-color: var(--purple); }
.pe-limit-icon { font-size: 2.2rem; line-height: 1; }
.pe-limit-title { font-weight: 700; color: var(--text); font-size: 1.35rem; margin: 0; }
.pe-limit-desc { font-size: 1.15rem; color: var(--muted); line-height: 1.65; margin: 0; }
/* Dual meters */
.pe-meters { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1.25rem 0; }
.pe-meter { border-radius: 8px; padding: 1.1rem; text-align: center; }
.pe-meter-lbl { font-size: 0.82rem; letter-spacing: 0.1em; text-transform: uppercase; margin-bottom: 0.4rem; }
.pe-meter-val { font-size: 2.2rem; font-weight: 800; }
/* Buttons */
.pe-btn {
margin-top: 0.9rem;
background: var(--bg3);
border: 1px solid var(--cyan);
color: var(--cyan);
padding: 0.4rem 1.1rem;
border-radius: 6px;
cursor: pointer;
font-size: 0.82rem;
font-family: 'Space Grotesk', sans-serif;
transition: background 0.2s;
}
.pe-btn:hover { background: rgba(0,229,255,0.1); }
.pe-btn-pur { border-color: var(--purple); color: var(--purple); }
.pe-btn-pur:hover { background: rgba(168,85,247,0.1); }
/* Interactive badge */
.pe-interactive-header {
display: flex;
align-items: center;
justify-content: space-between;
margin-bottom: 1rem;
}
.pe-interactive-header .pe-box-title { margin-bottom: 0; }
.pe-interactive-badge {
display: inline-flex;
align-items: center;
gap: 0.4rem;
background: rgba(0,229,255,0.08);
border: 1px solid var(--cyan);
color: var(--cyan);
font-size: 0.72rem;
font-weight: 700;
letter-spacing: 0.12em;
text-transform: uppercase;
padding: 0.3rem 0.75rem;
border-radius: 999px;
flex-shrink: 0;
}
.pe-interactive-badge::before {
content: '';
width: 7px;
height: 7px;
border-radius: 50%;
background: var(--cyan);
animation: pe-pulse 1.6s ease-in-out infinite;
flex-shrink: 0;
}
@keyframes pe-pulse {
0%, 100% { opacity: 1; transform: scale(1); }
50% { opacity: 0.4; transform: scale(0.7); }
}
.pe-interact-hint {
display: flex;
align-items: center;
gap: 0.6rem;
margin-top: 0.9rem;
padding: 0.85rem 1.1rem;
background: rgba(0,229,255,0.05);
border: 1px dashed rgba(0,229,255,0.25);
border-radius: 8px;
font-size: 1.05rem;
color: var(--muted);
}
.pe-interact-hint span { font-size: 1.25rem; }
.pe-divider { border: none; border-top: 1px solid var(--border); margin: 2.5rem 0; }
@media(max-width:520px) {
.pe-anns { grid-template-columns: 1fr; }
.pe-sm-demo { flex-direction: column; }
.pe-arrow { transform: rotate(90deg); }
}
&lt;/style&gt;
&lt;div class="pe-article"&gt;
&lt;div class="pe-video"&gt;
&lt;div class="pe-video-header"&gt;▶ Full Video Explainer — covering how LLMs work, from next-token prediction to attention, training, and why hallucinations are inevitable&lt;/div&gt;
&lt;video controls poster="/images/llms-are-pe/hero.jpg"&gt;
&lt;source src="https://curiousbit.netlify.app/images/llms-are-pe/explainer.mp4" type="video/mp4" /&gt;
Your browser doesn't support HTML5 video.
&lt;/video&gt;
&lt;/div&gt;
&lt;nav class="pe-toc"&gt;
&lt;h3&gt;In this article&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="#pe-s1"&gt;What ChatGPT and Claude actually are&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s2"&gt;The one job every LLM does&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s3"&gt;The probability formula (interactive)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s4"&gt;Softmax: turning scores into probabilities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s5"&gt;How it learns: cross-entropy loss&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s6"&gt;The Transformer architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s7"&gt;Self-attention: every word watches every word&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s8"&gt;How text is actually generated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s9"&gt;Temperature: controlling randomness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s10"&gt;Why it sometimes lies (hallucinations)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s11"&gt;Key limitations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#pe-s12"&gt;What's next&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/nav&gt;
&lt;p&gt;You've used ChatGPT. You've heard the word "AI" a thousand times this year. But here's something almost nobody explains clearly: the thing powering these tools is &lt;strong&gt;not intelligent in any human sense&lt;/strong&gt;. It doesn't think. It doesn't understand. It doesn't have goals.&lt;/p&gt;</description><content:encoded>&lt;![CDATA[<img src="https://curiousbit.netlify.app/images/llms-are-pe/hero.jpg" alt="Machine-Learning" style="max-width:100%;height:auto;margin-bottom:1.5em;"/><style>
@import url('https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap');
.pe-article {
--bg: #070b14;
--bg2: #0d1423;
--bg3: #111827;
--cyan: #00e5ff;
--purple: #a855f7;
--gold: #fbbf24;
--text: #e2e8f0;
--muted: #94a3b8;
--border: #1e293b;
--danger: #f87171;
font-family: 'Space Grotesk', system-ui, sans-serif;
font-size: 1.08rem;
line-height: 1.85;
color: var(--text);
}
/* TOC */
.pe-toc {
background: var(--bg2);
border: 1px solid var(--border);
border-left: 3px solid var(--cyan);
border-radius: 10px;
padding: 1.25rem 1.75rem;
margin: 2rem 0;
}
.pe-toc h3 {
font-size: 0.95rem;
letter-spacing: 0.18em;
text-transform: uppercase;
color: var(--cyan);
margin: 0 0 1rem;
}
.pe-toc ol { padding-left: 1.3rem; margin: 0; }
.pe-toc li { margin-bottom: 0.6rem; }
.pe-toc a { color: var(--muted); text-decoration: none; font-size: 1.15rem; font-weight: 600; transition: color 0.2s; }
.pe-toc a:hover { color: var(--cyan); }
/* Video */
.pe-video { margin: 2rem 0; border-radius: 12px; overflow: hidden; border: 1px solid var(--border); background: #000; }
.pe-video video { width: 100%; display: block; }
.pe-video-header {
background: var(--bg2);
padding: 1rem 1.4rem;
font-size: 1.15rem;
font-weight: 600;
color: var(--cyan);
border-bottom: 1px solid var(--border);
line-height: 1.5;
}
/* Typography */
.pe-article h2 {
font-size: 1.75rem;
font-weight: 700;
color: #fff;
margin: 3rem 0 0.9rem;
padding-bottom: 0.45rem;
border-bottom: 1px solid var(--border);
}
.pe-sec-num { color: var(--cyan); font-size: 1rem; font-weight: 600; display: block; margin-bottom: 0.2rem; letter-spacing: 0.1em; }
.pe-article p { margin-bottom: 1.1rem; }
.pe-article strong { color: #fff; }
.pe-em { color: var(--gold); }
/* Callouts */
.pe-callout { background: var(--bg2); border-left: 4px solid var(--purple); border-radius: 0 8px 8px 0; padding: 1.4rem 1.8rem; margin: 1.5rem 0; font-size: 1.4rem; color: var(--muted); line-height: 1.75; }
.pe-callout.cy { border-color: var(--cyan); }
.pe-callout.gd { border-color: var(--gold); }
.pe-callout strong { color: var(--text); }
/* Compare table */
.pe-table { width: 100%; border-collapse: collapse; font-size: 1rem; margin: 1.25rem 0; }
.pe-table th { text-align: left; padding: 0.7rem 1rem; background: var(--bg2); color: var(--cyan); font-size: 0.85rem; letter-spacing: 0.08em; text-transform: uppercase; border-bottom: 1px solid var(--border); }
.pe-table td { padding: 0.85rem 1rem; border-bottom: 1px solid var(--border); color: var(--muted); vertical-align: top; line-height: 1.6; }
.pe-table td:first-child { color: var(--text); font-weight: 500; }
.pe-table tr:hover td { background: var(--bg2); }
/* Formula boxes */
.pe-box { background: var(--bg2); border: 1px solid var(--border); border-radius: 12px; padding: 1.75rem; margin: 1.75rem 0; }
.pe-box-title { font-size: 0.95rem; letter-spacing: 0.15em; text-transform: uppercase; color: var(--purple); margin-bottom: 1rem; }
/* Anim 1 — token prediction */
.pe-sentence { font-size: 1.3rem; font-family: 'JetBrains Mono', monospace; color: var(--text); min-height: 2rem; margin-bottom: 1.1rem; }
.pe-cursor { display: inline-block; width: 2px; height: 1em; background: var(--cyan); animation: pe-blink 0.8s infinite; vertical-align: middle; margin-left: 2px; }
@keyframes pe-blink { 0%,100%{opacity:1} 50%{opacity:0} }
.pe-prob-bars { display: flex; flex-direction: column; gap: 0.55rem; }
.pe-prob-row { display: flex; align-items: center; gap: 0.8rem; font-size: 1.1rem; }
.pe-prob-lbl { width: 80px; text-align: right; color: var(--muted); font-family: 'JetBrains Mono', monospace; flex-shrink: 0; }
.pe-prob-track { flex: 1; height: 26px; background: var(--bg3); border-radius: 5px; overflow: hidden; }
.pe-prob-fill { height: 100%; background: var(--cyan); border-radius: 5px; transition: width 0.55s cubic-bezier(0.4,0,0.2,1); width: 0; }
.pe-prob-fill.win { background: var(--gold); }
.pe-prob-pct { width: 50px; font-family: 'JetBrains Mono', monospace; font-size: 1rem; color: var(--muted); }
/* Math display */
.pe-math { font-family: 'Georgia', serif; font-size: 1.45rem; color: var(--gold); text-align: center; padding: 1.3rem; background: var(--bg3); border-radius: 8px; margin-bottom: 0.9rem; }
.pe-term { display: inline; opacity: 0; transition: opacity 0.4s; cursor: help; position: relative; }
.pe-term.on { opacity: 1; }
.pe-term:hover::after { content: attr(data-tip); position: absolute; bottom: 115%; left: 50%; transform: translateX(-50%); background: var(--bg); border: 1px solid var(--purple); color: var(--text); padding: 0.4rem 0.85rem; border-radius: 6px; font-family: 'Space Grotesk', sans-serif; font-size: 0.88rem; white-space: nowrap; z-index: 20; }
.pe-anns { display: grid; grid-template-columns: 1fr 1fr; gap: 0.6rem; margin-top: 0.8rem; }
.pe-ann { background: var(--bg3); border-radius: 6px; padding: 0.55rem 0.85rem; font-size: 0.93rem; opacity: 0; transition: opacity 0.5s; }
.pe-ann.on { opacity: 1; }
.pe-ann-sym { color: var(--gold); font-family: 'JetBrains Mono', monospace; font-weight: bold; }
.pe-ann-desc { color: var(--muted); }
/* Softmax */
.pe-sm-demo { display: flex; gap: 1rem; align-items: flex-start; flex-wrap: wrap; }
.pe-sm-col { flex: 1; min-width: 160px; }
.pe-col-lbl { font-size: 0.82rem; letter-spacing: 0.1em; text-transform: uppercase; color: var(--muted); margin-bottom: 0.75rem; }
.pe-logit-row { display: flex; align-items: center; gap: 0.6rem; margin-bottom: 0.55rem; font-size: 0.97rem; font-family: 'JetBrains Mono', monospace; }
.pe-logit-w { width: 60px; color: var(--text); }
.pe-logit-v { padding: 0.22rem 0.6rem; border-radius: 4px; font-size: 0.92rem; }
.pe-logit-v.neg { background: rgba(248,113,113,0.15); color: var(--danger); }
.pe-logit-v.pos { background: rgba(0,229,255,0.1); color: var(--cyan); }
.pe-sm-bar { height: 22px; border-radius: 4px; background: var(--purple); transition: width 0.75s cubic-bezier(0.4,0,0.2,1); width: 0; display: flex; align-items: center; padding-left: 7px; font-size: 0.86rem; color: #fff; overflow: hidden; white-space: nowrap; }
.pe-arrow { display: flex; align-items: center; justify-content: center; padding-top: 1.4rem; font-size: 1.6rem; color: var(--cyan); }
/* Loss */
.pe-loss-wrap { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; align-items: start; }
@media(max-width:500px) { .pe-loss-wrap { grid-template-columns: 1fr; } }
.pe-loss-num { font-size: 2.8rem; font-weight: 800; font-family: 'JetBrains Mono', monospace; color: var(--danger); transition: color 0.5s; line-height: 1; }
.pe-loss-num.good { color: #4ade80; }
.pe-loss-lbl { font-size: 0.88rem; color: var(--muted); margin-top: 0.3rem; }
.pe-loss-slider label { font-size: 0.92rem; color: var(--muted); display: block; margin: 0.8rem 0 0.3rem; }
input[type=range] { width: 100%; accent-color: var(--cyan); }
.pe-loss-formula { background: var(--bg3); border-radius: 8px; padding: 1.1rem; font-family: 'JetBrains Mono', monospace; font-size: 1rem; color: var(--text); line-height: 2.1; }
.pe-lf-hl { color: var(--gold); }
.pe-lf-res { color: var(--cyan); font-weight: bold; }
/* Attention */
.pe-attn-words { display: flex; gap: 0.5rem; flex-wrap: wrap; margin-bottom: 0.9rem; }
.pe-attn-word { padding: 0.4rem 0.8rem; border-radius: 6px; background: var(--bg3); border: 1px solid var(--border); font-size: 1rem; cursor: pointer; transition: all 0.2s; user-select: none; }
.pe-attn-word:hover { border-color: var(--cyan); }
.pe-attn-word.sel { background: rgba(0,229,255,0.12); border-color: var(--cyan); color: var(--cyan); }
/* Attention formula cards */
.pe-qkv { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 0.6rem; margin-top: 0.8rem; }
@media(max-width:480px) { .pe-qkv { grid-template-columns: 1fr; } }
.pe-qkv-card { border-radius: 6px; padding: 0.75rem 0.9rem; font-size: 0.93rem; }
/* Temperature */
.pe-temp-row-ctrl { display: flex; align-items: center; gap: 1rem; }
.pe-temp-big { font-size: 1.6rem; font-family: 'JetBrains Mono', monospace; font-weight: 700; color: var(--cyan); width: 56px; flex-shrink: 0; }
.pe-temp-lbl { font-size: 0.92rem; color: var(--muted); margin-top: 0.2rem; }
.pe-tbars { display: flex; flex-direction: column; gap: 0.45rem; margin-top: 0.9rem; }
.pe-trow { display: flex; align-items: center; gap: 0.65rem; font-size: 0.95rem; font-family: 'JetBrains Mono', monospace; }
.pe-tlbl { width: 64px; color: var(--muted); text-align: right; flex-shrink: 0; }
.pe-ttrack { flex: 1; height: 19px; background: var(--bg3); border-radius: 4px; overflow: hidden; }
.pe-tfill { height: 100%; border-radius: 4px; background: var(--purple); transition: width 0.5s cubic-bezier(0.4,0,0.2,1); }
.pe-tpct { width: 48px; text-align: right; color: var(--muted); }
/* Limits */
.pe-limits { display: grid; grid-template-columns: repeat(3, 1fr); gap: 1rem; margin: 1.5rem 0; }
@media(max-width:640px) { .pe-limits { grid-template-columns: 1fr; } }
.pe-limit-card { background: var(--bg2); border: 1px solid var(--border); border-radius: 12px; padding: 1.5rem 1.6rem; transition: border-color 0.2s; display: flex; flex-direction: column; gap: 0.4rem; }
.pe-limit-card:hover { border-color: var(--purple); }
.pe-limit-icon { font-size: 2.2rem; line-height: 1; }
.pe-limit-title { font-weight: 700; color: var(--text); font-size: 1.35rem; margin: 0; }
.pe-limit-desc { font-size: 1.15rem; color: var(--muted); line-height: 1.65; margin: 0; }
/* Dual meters */
.pe-meters { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1.25rem 0; }
.pe-meter { border-radius: 8px; padding: 1.1rem; text-align: center; }
.pe-meter-lbl { font-size: 0.82rem; letter-spacing: 0.1em; text-transform: uppercase; margin-bottom: 0.4rem; }
.pe-meter-val { font-size: 2.2rem; font-weight: 800; }
/* Buttons */
.pe-btn {
margin-top: 0.9rem;
background: var(--bg3);
border: 1px solid var(--cyan);
color: var(--cyan);
padding: 0.4rem 1.1rem;
border-radius: 6px;
cursor: pointer;
font-size: 0.82rem;
font-family: 'Space Grotesk', sans-serif;
transition: background 0.2s;
}
.pe-btn:hover { background: rgba(0,229,255,0.1); }
.pe-btn-pur { border-color: var(--purple); color: var(--purple); }
.pe-btn-pur:hover { background: rgba(168,85,247,0.1); }
/* Interactive badge */
.pe-interactive-header {
display: flex;
align-items: center;
justify-content: space-between;
margin-bottom: 1rem;
}
.pe-interactive-header .pe-box-title { margin-bottom: 0; }
.pe-interactive-badge {
display: inline-flex;
align-items: center;
gap: 0.4rem;
background: rgba(0,229,255,0.08);
border: 1px solid var(--cyan);
color: var(--cyan);
font-size: 0.72rem;
font-weight: 700;
letter-spacing: 0.12em;
text-transform: uppercase;
padding: 0.3rem 0.75rem;
border-radius: 999px;
flex-shrink: 0;
}
.pe-interactive-badge::before {
content: '';
width: 7px;
height: 7px;
border-radius: 50%;
background: var(--cyan);
animation: pe-pulse 1.6s ease-in-out infinite;
flex-shrink: 0;
}
@keyframes pe-pulse {
0%, 100% { opacity: 1; transform: scale(1); }
50% { opacity: 0.4; transform: scale(0.7); }
}
.pe-interact-hint {
display: flex;
align-items: center;
gap: 0.6rem;
margin-top: 0.9rem;
padding: 0.85rem 1.1rem;
background: rgba(0,229,255,0.05);
border: 1px dashed rgba(0,229,255,0.25);
border-radius: 8px;
font-size: 1.05rem;
color: var(--muted);
}
.pe-interact-hint span { font-size: 1.25rem; }
.pe-divider { border: none; border-top: 1px solid var(--border); margin: 2.5rem 0; }
@media(max-width:520px) {
.pe-anns { grid-template-columns: 1fr; }
.pe-sm-demo { flex-direction: column; }
.pe-arrow { transform: rotate(90deg); }
}</style><div class="pe-article"><div class="pe-video"><div class="pe-video-header">▶ Full Video Explainer — covering how LLMs work, from next-token prediction to attention, training, and why hallucinations are inevitable</div><video controls poster="/images/llms-are-pe/hero.jpg"><source src="/images/llms-are-pe/explainer.mp4" type="video/mp4"/>
Your browser doesn't support HTML5 video.</video></div><nav class="pe-toc"><h3>In this article</h3><ol><li><a href="#pe-s1">What ChatGPT and Claude actually are</a></li><li><a href="#pe-s2">The one job every LLM does</a></li><li><a href="#pe-s3">The probability formula (interactive)</a></li><li><a href="#pe-s4">Softmax: turning scores into probabilities</a></li><li><a href="#pe-s5">How it learns: cross-entropy loss</a></li><li><a href="#pe-s6">The Transformer architecture</a></li><li><a href="#pe-s7">Self-attention: every word watches every word</a></li><li><a href="#pe-s8">How text is actually generated</a></li><li><a href="#pe-s9">Temperature: controlling randomness</a></li><li><a href="#pe-s10">Why it sometimes lies (hallucinations)</a></li><li><a href="#pe-s11">Key limitations</a></li><li><a href="#pe-s12">What's next</a></li></ol></nav><p>You've used ChatGPT. You've heard the word "AI" a thousand times this year. But here's something almost nobody explains clearly: the thing powering these tools is <strong>not intelligent in any human sense</strong>. It doesn't think. It doesn't understand. It doesn't have goals.</p><p>It is, at its core, a <span class="pe-em">very sophisticated next-word predictor</span>: a probability engine trained on an enormous body of text, much of it drawn from the internet. Once you understand this, everything else — its strengths, its failures, its weirdness — clicks into place.</p><div class="pe-callout cy"><strong>Interactive animations ahead:</strong> Press buttons and move sliders as you go — seeing the math move makes it stick.</div><h2 id="pe-s1"><span class="pe-sec-num">01 —</span>What ChatGPT and Claude actually are</h2><p>The term "Artificial Intelligence" conjures images of something that thinks, reasons, and understands — a mind in a machine. 
That framing is compelling, but misleading when applied to today's large language models (LLMs).</p><p>What you're actually talking to is an <strong>autoregressive probabilistic model</strong>. Every word it generates is the result of asking one question, over and over again:</p><div class="pe-callout gd"><strong>"Given everything written so far, what word is most likely to come next?"</strong></div><p>That's it. Do that billions of times on internet-scale text, and you get something that looks uncannily like reasoning. But it is, fundamentally, pattern matching at extraordinary scale — not understanding, consciousness, or genuine intelligence.</p><table class="pe-table"><thead><tr><th>What you see</th><th>What's actually happening</th><th>The catch</th></tr></thead><tbody><tr><td>It "reasons"</td><td>Pattern-matches reasoning traces from training data</td><td>Breaks on genuinely novel problems</td></tr><tr><td>It "knows facts"</td><td>Recalls high-frequency statistical associations</td><td>Hallucinates on rare edge cases</td></tr><tr><td>It's "creative"</td><td>Samples from learned creative pattern spaces</td><td>Derivative — remixes, doesn't invent</td></tr><tr><td>It has "opinions"</td><td>Outputs tokens shaped by training + alignment</td><td>No actual beliefs internally</td></tr></tbody></table><h2 id="pe-s2"><span class="pe-sec-num">02 —</span>The one job every LLM does</h2><p>Let's make this concrete. Below is a live simulation of next-token prediction. 
Press <strong>"Predict next token"</strong> and watch the model pick the next word based on probability scores.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">🎯 Next-Token Prediction</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-sentence" id="pe-sentence">The cat sat on the<span class="pe-cursor"></span></div><div class="pe-prob-bars" id="pe-prob-bars"></div><div class="pe-interact-hint"><span>👇</span> Press the button to watch the model predict — one token at a time.</div><button class="pe-btn" onclick="pePredict()">Predict next token →</button></div><p>Notice the bars: each candidate word gets a probability score. The model doesn't "decide" in any human sense — it samples from this distribution. The highest-probability word is chosen most often, but not always. That's where both creativity and errors come from.</p><h2 id="pe-s3"><span class="pe-sec-num">03 —</span>The probability formula</h2><p>Here's the mathematical heart of it. <strong>Hover each term</strong> for a plain-English tooltip, then press the button to reveal the full breakdown piece by piece.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">📐 Probability Formula</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-interact-hint" style="margin-top:0;margin-bottom:0.9rem;"><span>🖱️</span> Hover any term for a plain-English tooltip. 
Press the button to reveal the formula step by step.</div><div class="pe-math"><span class="pe-term" id="pet0" data-tip="P = Probability of">P</span><span class="pe-term" id="pet1" data-tip="wₜ = the specific token we're predicting">(w<sub>t</sub></span><span class="pe-term" id="pet2" data-tip="| = 'given all of this before it'"> |</span><span class="pe-term" id="pet3" data-tip="w&lt;t = every token that came before in the context"> w<sub>&lt;t</sub></span><span class="pe-term" id="pet4" data-tip="; θ = the model's billions of learned parameters"> ; θ)</span><span class="pe-term" id="pet5" data-tip="= the output we calculate"> =</span><span class="pe-term" id="pet6" data-tip="softmax converts raw scores into a proper probability distribution summing to 1"> softmax(logits<sub>t</sub>)</span></div><div class="pe-anns" id="pe-anns"><div class="pe-ann" id="pea0"><span class="pe-ann-sym">wₜ</span> — <span class="pe-ann-desc">The next token to predict</span></div><div class="pe-ann" id="pea1"><span class="pe-ann-sym">w&lt;t</span> — <span class="pe-ann-desc">All previous tokens (the context)</span></div><div class="pe-ann" id="pea2"><span class="pe-ann-sym">θ</span> — <span class="pe-ann-desc">Billions of learned parameters</span></div><div class="pe-ann" id="pea3"><span class="pe-ann-sym">softmax</span> — <span class="pe-ann-desc">Converts scores → probabilities (sum = 1)</span></div></div><button class="pe-btn" onclick="peRevealFormula()">Reveal formula step by step →</button></div><p>Plain English: <span class="pe-em">"Given everything typed so far, and everything the model learned during training, what is the probability of each possible next word?"</span> The model scores every token in its vocabulary, typically 50,000 or more, and softmax turns those raw scores into probabilities that add up to 1.</p><h2 id="pe-s4"><span class="pe-sec-num">04 —</span>Softmax: raw scores → probabilities</h2><p>The model internally produces a raw score (called a <strong>logit</strong>) for 
every possible next word. Logits can be any number — positive, negative, large, small. They're not probabilities yet. The <strong>softmax</strong> function converts them into a clean distribution. Press the button to watch the transformation.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">⚡ Softmax Transform</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-interact-hint" style="margin-top:0;margin-bottom:0.9rem;"><span>👇</span> Press the button to watch raw scores transform into probabilities.</div><div class="pe-sm-demo"><div class="pe-sm-col"><div class="pe-col-lbl">Raw Logits (scores)</div><div id="pe-logits"></div></div><div class="pe-arrow" id="pe-sm-arrow" style="opacity:0.3">→</div><div class="pe-sm-col"><div class="pe-col-lbl">After Softmax (probabilities)</div><div id="pe-softmax"></div></div></div><button class="pe-btn pe-btn-pur" onclick="peSoftmax()">Run softmax →</button></div><p>Notice: even the most negative logit still gets a small non-zero probability after softmax. The model never completely rules anything out — it just makes some words astronomically unlikely. This is partly why LLMs occasionally produce bizarre outputs: a 0.001% token still gets picked sometimes.</p><h2 id="pe-s5"><span class="pe-sec-num">05 —</span> How it learns: cross-entropy loss</h2><p>During training, the model sees a sentence with the last word hidden and makes a prediction. The training algorithm asks: <span class="pe-em">"How wrong were you?"</span> The measure of wrongness is <strong>cross-entropy loss</strong>.</p><p>The formula: <code style="color:var(--gold);background:var(--bg3);padding:2px 8px;border-radius:4px;font-family:'JetBrains Mono',monospace;">ℒ = −log P(correct word)</code>. If the model assigns 100% probability to the right word, loss = 0. 
If it assigns 1%, the loss is about 4.6 (using the natural logarithm). <strong>Drag the slider</strong> to see this in action.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">📉 Cross-Entropy Loss</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-interact-hint" style="margin-top:0;margin-bottom:0.9rem;"><span>🎚️</span> Drag the slider to change the model's confidence and watch the loss recalculate live.</div><div class="pe-loss-wrap"><div><div class="pe-loss-num" id="pe-loss-num">1.47</div><div class="pe-loss-lbl">Loss ℒ = −log(p)</div><div class="pe-loss-slider"><label>Model's confidence in correct word: <strong id="pe-conf-lbl">23%</strong></label><input type="range" id="pe-conf-slider" min="1" max="99" value="23" oninput="peLoss(this.value)"/></div></div><div class="pe-loss-formula">
Correct word: <span class="pe-lf-hl">"lazy"</span><br>
P("lazy"): <span class="pe-lf-hl" id="pe-lf-p">0.23</span><br><br>
ℒ = −log(<span class="pe-lf-hl" id="pe-lf-p2">0.23</span>)<br>
ℒ = <span class="pe-lf-res" id="pe-lf-res">1.47</span><br><br><span style="color:var(--muted);font-size:0.76rem;" id="pe-lf-verdict">High loss → big update</span></div></div></div><h2 id="pe-s6"><span class="pe-sec-num">06 —</span> The Transformer: the machine inside</h2><p>The specific architecture that makes modern LLMs work is called the <strong>Transformer</strong>, introduced in the landmark 2017 Google paper "Attention Is All You Need". Every major LLM today — GPT-4, Claude, Gemini, Llama — is built on this design.</p><p>A Transformer processes your text through many stacked layers. Each layer has two main components:</p><div class="pe-callout"><strong>Multi-Head Self-Attention</strong> — Every word simultaneously looks at every other word, learning which relationships matter. This is the core insight.<br><br><strong>Feed-Forward Network</strong> — A dense neural network that processes each token's information independently, after attention has been applied.</div><p>A large model like GPT-3 stacks 96 of these layers. With enough layers, parameters, and training data, emergent abilities appear — code generation, translation, basic reasoning — that nobody explicitly programmed. They fall out of the math at scale.</p><h2 id="pe-s7"><span class="pe-sec-num">07 —</span> Self-attention: every word watches every word</h2><p>Before the Transformer, AI models processed text word by word in sequence, making it hard to connect things far apart in a sentence. Self-attention solves this by letting every word simultaneously evaluate its relationship to every other word. <strong>Click a word</strong> to see its attention weights.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">🔍 Self-Attention Weights</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-attn-words" id="pe-attn-words"></div><div id="pe-attn-grid"></div><div class="pe-interact-hint"><span>👆</span> Click any word above to see how it attends to every other word. 
Brighter = stronger attention — notice "it" lights up "animal."</div></div><div class="pe-box"><div class="pe-box-title">📐 The Attention Equation</div><div class="pe-math" style="font-size:1.05rem;">
Attention(Q, K, V) = softmax(<span style="color:var(--cyan);">QKᵀ</span> /<span style="color:var(--gold);">√d<sub>k</sub></span> ) ·<span style="color:var(--purple);">V</span></div><div class="pe-qkv"><div class="pe-qkv-card" style="background:rgba(0,229,255,0.07);border:1px solid rgba(0,229,255,0.2);"><div style="color:var(--cyan);font-weight:700;margin-bottom:0.3rem;font-size:1rem;">Q — Query</div><div style="color:var(--muted);font-size:0.92rem;">"What am I looking for?"</div></div><div class="pe-qkv-card" style="background:rgba(251,191,36,0.07);border:1px solid rgba(251,191,36,0.2);"><div style="color:var(--gold);font-weight:700;margin-bottom:0.3rem;font-size:1rem;">K — Key</div><div style="color:var(--muted);font-size:0.92rem;">"What does each word offer?"</div></div><div class="pe-qkv-card" style="background:rgba(168,85,247,0.07);border:1px solid rgba(168,85,247,0.2);"><div style="color:var(--purple);font-weight:700;margin-bottom:0.3rem;font-size:1rem;">V — Value</div><div style="color:var(--muted);font-size:0.92rem;">"What info do I retrieve?"</div></div></div></div><h2 id="pe-s8"><span class="pe-sec-num">08 —</span>How text is actually generated</h2><p>When you press Send in any AI chat app, here is exactly what happens:</p><ol style="padding-left:1.3rem;margin-bottom:1.1rem;"><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Tokenization</strong> — Your message splits into tokens (subwords). "unbelievable" → ["un","believ","able"].</li><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Embedding</strong> — Each token becomes a high-dimensional vector capturing meaning and position.</li><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Forward pass</strong> — Vectors flow through all Transformer layers. 
Attention and feed-forward happen, repeatedly.</li><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Logits → Probabilities</strong> — The final layer scores every vocabulary token. Softmax converts those scores to probabilities.</li><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Sampling</strong> — One token is chosen based on those probabilities.</li><li style="margin-bottom:0.55rem;color:var(--muted);"><strong style="color:var(--text);">Repeat</strong> — That token is appended and the whole process runs again until the response is done.</li></ol><div class="pe-callout"><strong>KV Caching:</strong> The model caches Key and Value matrices from previous steps so it doesn't recompute attention from scratch for every token, which makes long responses computationally feasible.</div><h2 id="pe-s9"><span class="pe-sec-num">09 —</span> Temperature: controlling randomness</h2><p>When sampling the next word, you can control how random the selection is with a parameter called <strong>temperature</strong>. Drag the slider to see how it reshapes the probability distribution in real time.</p><div class="pe-box"><div class="pe-interactive-header"><div class="pe-box-title">🌡️ Temperature Sampling</div><span class="pe-interactive-badge">Live · Interactive</span></div><div class="pe-interact-hint" style="margin-top:0;margin-bottom:0.9rem;"><span>🎚️</span> Drag the slider left for predictable outputs, right for creative (or chaotic) ones.</div><div class="pe-temp-row-ctrl"><div><div class="pe-temp-big" id="pe-temp-val">1.0</div><div class="pe-temp-lbl" id="pe-temp-lbl">Balanced</div></div><input type="range" id="pe-temp-slider" min="1" max="30" value="10" style="flex:1;accent-color:var(--cyan);" oninput="peTemp(this.value)"/></div><div class="pe-tbars" id="pe-tbars"></div><div style="font-size:0.92rem;color:var(--muted);margin-top:0.7rem;">Formula: p'ᵢ = pᵢ<sup>1/T</sup> / Σ(pⱼ<sup>1/T</sup>)</div></div><p>Low temperature (e.g. 
0.2) makes the model nearly deterministic: it almost always picks the top word. High temperature (e.g. 2.0) flattens the distribution, giving unusual words a real chance. Many chat deployments default to somewhere between 0.7 and 1.0.</p><h2 id="pe-s10"><span class="pe-sec-num">10 —</span> Why it sometimes lies (hallucinations)</h2><p>One of the most misunderstood LLM behaviors is <strong>hallucination</strong> — when the model confidently states something false. This isn't a bug to be patched away. It's a direct consequence of the architecture.</p><p>The model has no internal truth checker. No access to the real world. It only knows: <span class="pe-em">what sequence of words tends to follow this sequence of words?</span> When asked something rare or obscure, the model fills the gap with statistically plausible text — which may be completely wrong.</p><div class="pe-callout gd"><strong>Analogy:</strong> Imagine someone who has read every book in a library but never left the building. Ask what the weather is like outside — they'll give a confident, well-reasoned answer based on weather descriptions they've read. It might be completely wrong.</div><div class="pe-meters"><div class="pe-meter" style="background:rgba(248,113,113,0.08);border:1px solid var(--danger);"><div class="pe-meter-lbl" style="color:var(--danger);">Ground Truth Access</div><div class="pe-meter-val" style="color:var(--danger);">NONE</div></div><div class="pe-meter" style="background:rgba(74,222,128,0.08);border:1px solid #4ade80;"><div class="pe-meter-lbl" style="color:#4ade80;">Statistical Plausibility</div><div class="pe-meter-val" style="color:#4ade80;">HIGH</div></div></div><h2 id="pe-s11"><span class="pe-sec-num">11 —</span> Key limitations to know</h2><p>Understanding these isn't pessimism — it's how you use these tools well.</p><div class="pe-limits"><div class="pe-limit-card"><div class="pe-limit-icon">📏</div><div class="pe-limit-title">Context Window</div><div class="pe-limit-desc">Fixed memory. 
Older models: ~4K tokens. Newer: up to 1M+. Anything beyond the window is completely invisible to the model.</div></div><div class="pe-limit-card"><div class="pe-limit-icon">🌀</div><div class="pe-limit-title">No Persistent Memory</div><div class="pe-limit-desc">Every conversation starts completely fresh. The model has no memory of past sessions unless you explicitly provide them.</div></div><div class="pe-limit-card"><div class="pe-limit-icon">🎲</div><div class="pe-limit-title">Stochasticity</div><div class="pe-limit-desc">Same prompt, potentially different outputs. The sampling process is inherently random, even at low temperatures.</div></div><div class="pe-limit-card"><div class="pe-limit-icon">🔓</div><div class="pe-limit-title">Jailbreaks</div><div class="pe-limit-desc">Safety training is pattern-based. Clever prompting can sometimes bypass it because the model is still a pattern matcher at heart.</div></div><div class="pe-limit-card"><div class="pe-limit-icon">💭</div><div class="pe-limit-title">Hallucinations</div><div class="pe-limit-desc">Inevitable on low-frequency knowledge. No fact-checker means confident errors are always possible. Verify important claims.</div></div><div class="pe-limit-card"><div class="pe-limit-icon">⚡</div><div class="pe-limit-title">Quadratic Cost</div><div class="pe-limit-desc">Attention cost grows quadratically with context length. 
Techniques like FlashAttention mitigate this, but it's a fundamental constraint.</div></div></div><h2 id="pe-s12"><span class="pe-sec-num">12 —</span> What's next</h2><p>The probability-engine core remains — but researchers are building powerful layers on top. <strong>RAG (Retrieval-Augmented Generation)</strong> gives the model access to real documents at query time, dramatically reducing hallucination on factual tasks. <strong>Agentic systems</strong> let LLMs use tools, execute code, and iterate on their outputs. <strong>Reasoning models</strong> generate long internal chains of thought before answering, improving performance on math and logic. And <strong>multimodal models</strong> extend the same probabilistic core to images and audio.</p><p>None of these change the fundamental nature of what an LLM is. They all sit on top of the same next-token prediction engine. Understanding that foundation is what makes you a sharper thinker about where this technology is — and isn't — going.</p><div class="pe-callout cy"><strong>The bottom line:</strong> LLMs are extraordinary pattern-recognition engines that have scaled statistical prediction to the point of producing genuinely useful, sometimes astonishing outputs. They are not intelligent in any human sense. Knowing this — really knowing it — is what separates clear thinking about AI from hype.</div><hr class="pe-divider"/><p style="color:var(--muted);font-size:0.95rem;">Video generated with Grok Imagine. Animations built with vanilla JavaScript.</p></div><script>
(function() {
// ── Token Prediction ──
const peSeqs = [
{ prefix: "The cat sat on the", cands: [
{w:"mat",p:62,win:true},{w:"floor",p:18},{w:"rug",p:11},{w:"roof",p:5},{w:"couch",p:4}
]},
{ prefix: "The cat sat on the mat and", cands: [
{w:"looked",p:41,win:true},{w:"waited",p:28},{w:"purred",p:17},{w:"slept",p:9},{w:"yawned",p:5}
]},
{ prefix: "The cat sat on the mat and looked", cands: [
{w:"up",p:54,win:true},{w:"around",p:22},{w:"out",p:14},{w:"away",p:7},{w:"back",p:3}
]},
{ prefix: "The cat sat on the mat and looked up at", cands: [
{w:"the",p:58,win:true},{w:"me",p:22},{w:"nothing",p:11},{w:"her",p:6},{w:"him",p:3}
]},
];
var peIdx = 0;
function peRenderBars(cands) {
var c = document.getElementById('pe-prob-bars');
if (!c) return;
c.innerHTML = '';
cands.forEach(function(cd, i) {
var row = document.createElement('div');
row.className = 'pe-prob-row';
row.innerHTML = '<div class="pe-prob-lbl">'+cd.w+'</div><div class="pe-prob-track"><div class="pe-prob-fill'+(cd.win?' win':'')+'" id="pepf'+i+'"></div></div><div class="pe-prob-pct">'+cd.p+'%</div>';
c.appendChild(row);
});
setTimeout(function() {
cands.forEach(function(cd, i) {
var el = document.getElementById('pepf'+i);
if (el) el.style.width = cd.p+'%';
});
}, 60);
}
window.pePredict = function() {
var seq = peSeqs[peIdx % peSeqs.length];
peRenderBars(seq.cands);
setTimeout(function() {
var winner = seq.cands.find(function(c){return c.win;});
var el = document.getElementById('pe-sentence');
if (el) el.innerHTML = seq.prefix+' <span style="color:var(--gold);font-weight:bold;">'+winner.w+'</span><span class="pe-cursor"></span>';
}, 800);
peIdx++;
};
peRenderBars(peSeqs[0].cands);
// ── Formula Reveal ──
var peTerms = ['pet0','pet1','pet2','pet3','pet4','pet5','pet6'];
var peAnns = ['pea0','pea1','pea2','pea3'];
window.peRevealFormula = function() {
peTerms.forEach(function(id){var el=document.getElementById(id);if(el)el.classList.remove('on');});
peAnns.forEach(function(id){var el=document.getElementById(id);if(el)el.classList.remove('on');});
peTerms.forEach(function(id, i){ setTimeout(function(){ var el=document.getElementById(id);if(el)el.classList.add('on'); }, i*280); });
peAnns.forEach(function(id, i){ setTimeout(function(){ var el=document.getElementById(id);if(el)el.classList.add('on'); }, 2100+i*230); });
};
// ── Softmax ──
var peSMData = [
{w:'mat',l:4.2},{w:'floor',l:1.8},{w:'rug',l:0.9},{w:'table',l:-0.4},{w:'sky',l:-2.1}
];
(function initSM() {
var li = document.getElementById('pe-logits');
var si = document.getElementById('pe-softmax');
if (!li||!si) return;
li.innerHTML = '';
si.innerHTML = '';
peSMData.forEach(function(d,i) {
li.innerHTML += '<div class="pe-logit-row"><span class="pe-logit-w">'+d.w+'</span><span class="pe-logit-v '+(d.l<0?'neg':'pos')+'">'+(d.l>0?'+':'')+d.l+'</span></div>';
si.innerHTML += '<div class="pe-logit-row"><span class="pe-logit-w">'+d.w+'</span><div class="pe-sm-bar" id="pesm'+i+'"></div></div>';
});
})();
window.peSoftmax = function() {
var exps = peSMData.map(function(d){return Math.exp(d.l);});
var sum = exps.reduce(function(a,b){return a+b;},0);
var probs = exps.map(function(e){return e/sum;});
var arrow = document.getElementById('pe-sm-arrow');
if (arrow) arrow.style.opacity = '1';
probs.forEach(function(p,i){
setTimeout(function(){
var bar = document.getElementById('pesm'+i);
if (!bar) return;
bar.style.width = Math.max(p*150,0)+'px';
bar.textContent = (p*100).toFixed(1)+'%';
}, i*140);
});
};
// ── Loss ──
window.peLoss = function(val) {
var p = val/100;
var loss = -Math.log(p);
var cl = document.getElementById('pe-conf-lbl');
var ln = document.getElementById('pe-loss-num');
var lp = document.getElementById('pe-lf-p');
var lp2 = document.getElementById('pe-lf-p2');
var lr = document.getElementById('pe-lf-res');
var lv = document.getElementById('pe-lf-verdict');
if(cl) cl.textContent = val+'%';
if(ln){ ln.textContent = loss.toFixed(2); ln.classList.toggle('good', loss < 0.5); }
if(lp) lp.textContent = p.toFixed(2);
if(lp2) lp2.textContent = p.toFixed(2);
if(lr) lr.textContent = loss.toFixed(2);
if(lv) lv.textContent = loss < 0.5 ? 'Low loss → small parameter update ✓' : 'High loss → big parameter update ↑';
};
// ── Attention ──
var peAttnWords = ["The","animal","didn't","cross","the","street","because","it","was","tired"];
var peAttnW = {
"The": [0.5,0.1,0.05,0.05,0.1,0.05,0.05,0.05,0.03,0.02],
"animal": [0.05,0.55,0.05,0.05,0.05,0.05,0.05,0.1,0.03,0.02],
"didn't": [0.04,0.08,0.5,0.08,0.04,0.08,0.06,0.05,0.05,0.02],
"cross": [0.03,0.05,0.08,0.5,0.03,0.12,0.08,0.05,0.04,0.02],
"the": [0.08,0.05,0.04,0.05,0.5,0.12,0.05,0.04,0.05,0.02],
"street": [0.04,0.05,0.06,0.12,0.1,0.45,0.06,0.05,0.05,0.02],
"because": [0.03,0.06,0.08,0.08,0.03,0.07,0.45,0.1,0.07,0.03],
"it": [0.03,0.38,0.07,0.06,0.03,0.06,0.1,0.15,0.09,0.03],
"was": [0.03,0.06,0.05,0.05,0.03,0.05,0.07,0.1,0.5,0.06],
"tired": [0.03,0.08,0.05,0.05,0.03,0.05,0.07,0.12,0.1,0.42],
};
var peSel = "it";
function peRenderAttn() {
var wc = document.getElementById('pe-attn-words');
if (!wc) return;
wc.innerHTML = '';
peAttnWords.forEach(function(w) {
var el = document.createElement('div');
el.className = 'pe-attn-word'+(w===peSel?' sel':'');
el.textContent = w;
el.onclick = function(){ peSel = w; peRenderAttn(); };
wc.appendChild(el);
});
var weights = peAttnW[peSel] || peAttnW["it"];
var wrap = document.getElementById('pe-attn-grid');
if (!wrap) return;
wrap.innerHTML = '';
var n = peAttnWords.length;
var grid = document.createElement('div');
grid.style.display = 'grid';
grid.style.gridTemplateColumns = 'repeat('+n+', 1fr)';
grid.style.gap = '3px';
weights.forEach(function(wt, j) {
var cell = document.createElement('div');
cell.style.height = '26px';
cell.style.borderRadius = '3px';
cell.style.background = 'rgba(0,229,255,'+wt+')';
cell.style.transition = 'background 0.4s';
cell.title = peSel+' → '+peAttnWords[j]+': '+(wt*100).toFixed(0)+'%';
grid.appendChild(cell);
});
var labelRow = document.createElement('div');
labelRow.style.display = 'grid';
labelRow.style.gridTemplateColumns = 'repeat('+n+', 1fr)';
labelRow.style.gap = '3px';
labelRow.style.marginTop = '4px';
peAttnWords.forEach(function(w) {
var lbl = document.createElement('div');
lbl.textContent = w;
lbl.style.fontSize = '0.6rem';
lbl.style.color = 'var(--muted)';
lbl.style.textAlign = 'center';
lbl.style.overflow = 'hidden';
lbl.style.textOverflow = 'ellipsis';
labelRow.appendChild(lbl);
});
wrap.appendChild(grid);
wrap.appendChild(labelRow);
}
peRenderAttn();
// ── Temperature ──
var peTempBase = [
{w:'mat',p:0.52},{w:'floor',p:0.22},{w:'rug',p:0.13},{w:'table',p:0.08},{w:'sky',p:0.03},{w:'cloud',p:0.02}
];
(function initTemp() {
var c = document.getElementById('pe-tbars');
if (!c) return;
c.innerHTML = '';
peTempBase.forEach(function(d,i) {
c.innerHTML += '<div class="pe-trow"><span class="pe-tlbl">'+d.w+'</span><div class="pe-ttrack"><div class="pe-tfill" id="petf'+i+'"></div></div><span class="pe-tpct" id="petp'+i+'">—</span></div>';
});
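// Deferred: window.peTemp is only assigned further down, so calling it here directly would throw a ReferenceError.
setTimeout(function(){ window.peTemp(10); }, 0);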
})();
window.peTemp = function(val) {
var T = val/10;
var dv = document.getElementById('pe-temp-val');
var dl = document.getElementById('pe-temp-lbl');
if (dv) dv.textContent = T.toFixed(1);
if (dl) {
if (T<0.5) dl.textContent = '🧊 Deterministic';
else if (T<0.8) dl.textContent = '🔵 Conservative';
else if (T<1.3) dl.textContent = '⚖️ Balanced — sweet spot';
else if (T<2.0) dl.textContent = '🔥 Creative';
else dl.textContent = '🌋 Chaotic';
}
var scaled = peTempBase.map(function(d){return Math.pow(d.p,1/T);});
var sum = scaled.reduce(function(a,b){return a+b;},0);
var probs = scaled.map(function(s){return s/sum;});
probs.forEach(function(p,i){
var f = document.getElementById('petf'+i);
var t = document.getElementById('petp'+i);
if(f) f.style.width = (p*100)+'%';
if(t) t.textContent = (p*100).toFixed(1)+'%';
});
};
})();
</script>
]]></content:encoded><media:content url="https://curiousbit.netlify.app/images/llms-are-pe/hero.jpg" medium="image"><media:title type="plain">Machine-Learning</media:title></media:content><category>artificial-intelligence</category><category>llm</category><category>machine-learning</category><category>deep-learning</category><category>architecture</category><category>Knowledge Base</category></item><item><title>Knowledge Distillation: From Massive Models to Efficient Intelligence</title><link>https://curiousbit.netlify.app/knowledge-distillation-from-massive-models-to-efficient-intelligence/</link><guid isPermaLink="true">https://curiousbit.netlify.app/knowledge-distillation-from-massive-models-to-efficient-intelligence/</guid><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><dc:creator>Ajay Walia</dc:creator><description>&lt;p&gt;There is a scene you have probably seen in countless films: a master craftsman, decades of experience locked in his hands, patiently guiding a young apprentice. The master does not hand over a textbook. He transfers something richer — intuition, nuance, an understanding of &lt;em&gt;why&lt;/em&gt; certain choices matter. The apprentice, unburdened by the master&amp;rsquo;s size and slowness, eventually moves faster and in some cases surpasses the teacher entirely.&lt;/p&gt;</description><content:encoded>&lt;![CDATA[<img src="https://curiousbit.netlify.app/images/kd-master-apprentice.jpg" alt="Machine-Learning" style="max-width:100%;height:auto;margin-bottom:1.5em;"/><p>There is a scene you have probably seen in countless films: a master craftsman, decades of experience locked in his hands, patiently guiding a young apprentice. The master does not hand over a textbook. He transfers something richer — intuition, nuance, an understanding of <em>why</em> certain choices matter. 
The apprentice, unburdened by the master&rsquo;s size and slowness, eventually moves faster and in some cases surpasses the teacher entirely.</p><p>Knowledge Distillation is that scene, rendered in mathematics.</p><p>Introduced formally by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean at Google in 2015, Knowledge Distillation (KD) is a model compression technique where a large, expensive model — the <strong>teacher</strong> — transfers its learned intelligence to a compact, deployable model — the <strong>student</strong>. The student can retain over 90% of the teacher&rsquo;s accuracy while being up to 100× smaller and faster, depending on the task and the size gap.</p><p>This article takes you from the intuition all the way through to the advanced variants that are reshaping AI deployment in 2026.</p><hr><h2 id="the-problem-intelligence-is-expensive">The Problem: Intelligence Is Expensive</h2><p>Modern AI models are enormous. GPT-4 is estimated to contain over a trillion parameters. BERT-large has 340 million. These models achieve stunning accuracy — but they are cumbersome to deploy. Running a trillion-parameter model for every user query would require data centres the size of small cities.</p><p>The engineering instinct is to train a smaller model directly. But smaller models trained from scratch on raw data consistently underperform large ones. Why?</p><p>Because raw training data is <em>hard</em>. A cat photo labelled simply &ldquo;cat&rdquo; gives a small model very little to work with. A large model, however, does not just see &ldquo;cat&rdquo; — it sees a distribution of confidence across thousands of classes. 
&ldquo;Cat: 0.92, Lynx: 0.06, Tabby: 0.02.&rdquo; That probability distribution is enormously richer than the hard label.</p><p>Hinton called this richer signal <strong>dark knowledge</strong> — the information encoded in what the model <em>almost</em> predicted.</p><hr><h2 id="the-teacher-student-paradigm">The Teacher-Student Paradigm</h2><p><img src="/images/kd-master-apprentice.jpg" alt="A Renaissance master transfers glowing knowledge orbs to his apprentice"/></p><p>The core idea is elegant. Instead of training the student on raw labelled data, you train it to <strong>mimic the teacher&rsquo;s output distribution</strong>.</p><p>You run every training example through the large teacher model. For each example, instead of a hard label (0 or 1), you collect the teacher&rsquo;s full <strong>soft target</strong> — the probability it assigns to every possible class. You then train the student to produce those same soft probability distributions.</p><p>The student loss function becomes:</p><pre tabindex="0"><code>Loss = α × (cross-entropy with hard labels)
+ (1-α) × (KL divergence from teacher soft targets)</code></pre><p>The blending weight <code>α</code> controls how much the student learns from the raw data versus the teacher&rsquo;s guidance. In practice, a small <code>α</code> (more weight on teacher targets) often works best.</p><hr><h2 id="soft-targets-and-dark-knowledge">Soft Targets and Dark Knowledge</h2><p><img src="/images/kd-dark-knowledge.jpg" alt="A Flemish alchemist distils the essence of a massive Teacher Model flask into a tiny Student Model vial"/></p><p>Hard labels are binary. Soft targets are continuous. That difference is enormous.</p><p>Consider an image of a dog that slightly resembles a wolf. A hard label says &ldquo;dog: 1, wolf: 0.&rdquo; A teacher that has seen millions of examples says &ldquo;dog: 0.84, wolf: 0.13, fox: 0.03.&rdquo; That residual probability on <em>wolf</em> carries genuine information about the visual ambiguity in the image. The student trained on soft targets learns not just the answer, but the <em>shape of uncertainty</em> around the answer.</p><p>This is the dark knowledge. It lives in the tails of the distribution — the non-zero probabilities on wrong answers — and it often helps the student generalise better than one trained on hard labels alone.</p><hr><h2 id="temperature-the-control-knob">Temperature: The Control Knob</h2><p><img src="/images/kd-temperature.jpg" alt="A Renaissance philosopher adjusts the Temperature T dial on a celestial orrery, sharpening planets on the left and softening them to probability clouds on the right"/></p><p>Soft targets, by default, tend to be very peaked — the teacher is often highly confident in its top prediction, assigning 0.99 to the correct class and tiny residuals to everything else. At that extreme, the soft target is barely different from a hard label, and the dark knowledge disappears.</p><p>Hinton&rsquo;s solution was <strong>temperature scaling</strong>. 
Before computing the softmax, you divide the logits by a temperature parameter T:</p><pre tabindex="0"><code>p_i = exp(z_i / T) / Σ_j exp(z_j / T)</code></pre><p>At <strong>T = 1</strong> (standard), outputs are sharp and peaked.
At <strong>T &gt; 1</strong> (high temperature), outputs become softer and more spread out, revealing the relative confidence structure across all classes.</p><p>During distillation, both teacher and student use the same elevated temperature (typically T = 3–5). This &ldquo;warms up&rdquo; the teacher&rsquo;s output into a richer, more informative distribution for the student to learn from. After training, the student is deployed with T = 1.</p><p>The effect is striking. Higher temperatures expose more inter-class structure, giving the student a better map of the concept landscape rather than just a list of correct answers.</p><hr><h2 id="what-gets-transferred-three-flavours-of-distillation">What Gets Transferred? Three Flavours of Distillation</h2><p>Knowledge can flow from teacher to student in different ways. The research community has converged on three main categories:</p><p><strong>Response-based distillation</strong> — the original Hinton approach. The student matches the teacher&rsquo;s final output layer (soft targets). Simple, effective, widely used.</p><p><strong>Feature-based distillation</strong> — the student is trained to match not just the final output but intermediate representations — specific layers or attention maps inside the teacher. This transfers <em>how</em> the teacher thinks, not just what it concludes. The trade-off is complexity: the teacher and student must have compatible architectures or an adapter layer is needed.</p><p><strong>Relation-based distillation</strong> — the student learns to replicate the <em>relationships</em> between different training examples as the teacher sees them. If the teacher places cat images and dog images in nearby regions of its feature space, the student should too. 
This approach is particularly powerful for metric learning and few-shot tasks.</p><hr><h2 id="advanced-variants">Advanced Variants</h2><h3 id="multi-task-distillation">Multi-Task Distillation</h3><p><img src="/images/kd-polymath-student.jpg" alt="A Leonardo da Vinci polymath student simultaneously masters writing, painting, anatomy, and geometry with golden threads connecting all disciplines to a glowing brain"/></p><p>Microsoft&rsquo;s MT-DNN research showed that distillation composes naturally with multi-task learning. A teacher trained on nine different natural language tasks simultaneously was distilled into a single student model. The distilled MT-DNN outperformed the original on 7 of 9 GLUE benchmark tasks, pushing the single-model GLUE score to 83.7% at the time.</p><p>The insight: when a teacher has learned to generalise across many domains, its soft targets encode cross-task structure that a specialised student cannot discover on its own.</p><h3 id="the-teacher-assistant-bridge">The Teacher Assistant Bridge</h3><p>What happens when the teacher and student are so different in capacity that direct distillation fails? A very large teacher produces soft targets the tiny student simply cannot model well.</p><p>The solution is an intermediate <strong>Teacher Assistant (TA)</strong> — a medium-sized model that first distils from the large teacher, then acts as teacher to the small student. The TA bridges the capacity gap, giving the small student a more tractable target. 
Reported results indicate that this staged approach tends to outperform direct large-to-small distillation when the size gap is large, roughly an order of magnitude or more.</p><h3 id="when-the-student-surpasses-the-teacher">When the Student Surpasses the Teacher</h3><p><img src="/images/kd-student-exceeds.jpg" alt="A young apprentice stands triumphant as his glowing painting outshines the master&rsquo;s faded work, with the aged teacher bowing respectfully"/></p><p>One of the most counter-intuitive findings in knowledge distillation is that the student can sometimes <em>exceed</em> the teacher.</p><p>The 2022 <strong>Symbolic Knowledge Distillation</strong> paper demonstrated this dramatically. The researchers distilled commonsense reasoning from GPT-3 (175B parameters) into a purpose-built commonsense model 100× smaller. The resulting student — COMET-DISTIL — outperformed GPT-3 on commonsense benchmarks.</p><p>How? The distillation process acted as a filter. Rather than transferring all of GPT-3&rsquo;s knowledge, the researchers used a <strong>critic model</strong> to selectively distil only high-quality, high-confidence commonsense triples. The student was not burdened by GPT-3&rsquo;s off-topic knowledge or low-confidence noise. 
It received a curated, concentrated version of the teacher&rsquo;s relevant expertise.</p><p>This is the Renaissance apprentice story made literal: the student, given the master&rsquo;s best knowledge and freed from the master&rsquo;s constraints, eventually does better work.</p><hr><h2 id="real-world-results">Real-World Results</h2><p>The numbers behind knowledge distillation are worth anchoring:</p><p>In Hinton&rsquo;s original speech recognition experiments on a heavily used commercial system, a distilled single model <strong>matched the word error rate of a 10-model ensemble</strong> while requiring one-tenth the compute at inference time.</p><p>In the speech recognition benchmark specifically:</p><table><thead><tr><th>Model</th><th>Frame Accuracy</th><th>Word Error Rate</th></tr></thead><tbody><tr><td>Baseline (single model)</td><td>58.9%</td><td>10.9%</td></tr><tr><td>10× model ensemble (teacher)</td><td>61.1%</td><td>10.7%</td></tr><tr><td>Distilled single student</td><td><strong>60.8%</strong></td><td><strong>10.7%</strong></td></tr></tbody></table><p>The student matches the ensemble on word error rate and comes within 0.3 points of its frame accuracy, at a fraction of the cost. This is the central promise of KD, and it has held up across vision, language, and speech for over a decade.</p><hr><h2 id="why-this-matters-in-2026">Why This Matters in 2026</h2><p>Knowledge distillation is no longer a research technique. It is infrastructure.</p><p>Every major on-device AI model (the language models on your phone, the vision models in your camera, the wake-word detectors in your earbuds) was almost certainly distilled from a much larger cloud model. DistilBERT, TinyBERT, and Distil-Whisper are all products of distillation.</p><p>The technique is also central to the LLM compression wave of the past two years. Models like Phi-3, Mistral Small, and Gemma were designed with distillation-aware training pipelines from the start. 
The goal: deliver GPT-4-class reasoning in a model small enough to run locally, privately, and cheaply.</p><p>And symbolic distillation, which transfers knowledge as structured text rather than as neural activations, is opening entirely new territory, allowing language model intelligence to flow into specialised domain models that do not even share the same architecture.</p><hr><h2 id="a-practical-starting-point">A Practical Starting Point</h2><p>If you want to experiment with knowledge distillation today:</p><p><strong>For response-based KD in PyTorch</strong>, the training loop change is minimal: replace your standard cross-entropy loss with the blended loss described above and pass the teacher&rsquo;s logits alongside the hard labels.</p><p><strong>For NLP tasks</strong>, Hugging Face&rsquo;s <code>transformers</code> library includes DistilBERT as a reference distilled model with its training recipe documented.</p><p><strong>For vision</strong>, the Knowledge Distillation tutorial in the official PyTorch tutorials is the fastest on-ramp.</p><p>The key design decisions are: the temperature T (start at 4), the blending weight α (start at 0.5), and whether you need feature-based or response-based transfer (response-based first, feature-based if accuracy is still insufficient).</p><hr><p>The master-apprentice metaphor is more than decorative. Knowledge distillation encodes a genuine pedagogical insight: that the richer the guidance a learner receives, the more efficiently it reaches competence. The hard labels of raw data are the equivalent of telling a student the answer. The soft targets of a teacher model are the equivalent of showing them how to think.</p><p>That distinction, answer versus thinking, is what makes knowledge distillation one of the most elegant ideas in modern machine learning.</p>
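<p>As a companion to the practical starting point above, here is a minimal, dependency-free sketch of a blended response-based loss using the suggested defaults (T = 4, α = 0.5). The function names <code>softmax</code> and <code>kd_loss</code> are hypothetical, and the exact weighting (α on the hard-label term, 1 − α on the soft term, with the soft term scaled by T²) is an assumption that follows the common formulation; adjust it to match the loss you are actually using.</p>

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-target KL term.

    Assumed convention (not taken from the article):
    alpha weights the hard term, (1 - alpha) weights the soft term.
    """
    # Hard term: ordinary cross-entropy against the true label at T = 1.
    hard = -math.log(softmax(student_logits)[label])

    # Soft term: KL(teacher || student), both softened at temperature T.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )

    # T^2 keeps the soft term on a comparable scale as T changes.
    return alpha * hard + (1 - alpha) * (T * T) * soft


# Toy logits for a 3-class problem (illustrative values only).
loss = kd_loss([1.5, 0.3, -0.2], [3.0, 1.0, 0.2], label=0)
```

<p>In a real training loop the same arithmetic would run on batched tensors with the teacher frozen; the plain-Python version is only meant to make the roles of T and α visible.</p>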
]]></content:encoded><media:content url="https://curiousbit.netlify.app/images/kd-master-apprentice.jpg" medium="image"><media:title type="plain">Machine-Learning</media:title></media:content><category>artificial-intelligence</category><category>machine-learning</category><category>model-compression</category><category>deep-learning</category><category>Knowledge Base</category></item><item><title>LLM &amp; Embeddings — One Predicts Words. One Maps Meaning.</title><link>https://curiousbit.netlify.app/one-predicts-words-one-maps-meaning/</link><guid isPermaLink="true">https://curiousbit.netlify.app/one-predicts-words-one-maps-meaning/</guid><pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate><dc:creator>Ajay Walia</dc:creator><description>&lt;style&gt;
.two-mech { margin: 2rem 0 2.5rem; border-radius: 14px; overflow: hidden; border: 1px solid #1f3358; background: #0a1424; }
.two-mech svg { display: block; width: 100%; height: auto; min-width: 720px; }
.two-mech-wrap { overflow-x: auto; }
@media (prefers-reduced-motion: reduce) { .two-mech .tm-particle { display: none; } }
&lt;/style&gt;
&lt;div class="two-mech"&gt;&lt;div class="two-mech-wrap"&gt;
&lt;svg viewBox="0 0 1200 720" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Animated diagram contrasting the LLM stochastic generation loop with the deterministic embedding similarity pipeline"&gt;
&lt;defs&gt;
&lt;filter id="tmGlow" x="-50%" y="-50%" width="200%" height="200%"&gt;
&lt;feGaussianBlur stdDeviation="3" result="b"/&gt;
&lt;feMerge&gt;&lt;feMergeNode in="b"/&gt;&lt;feMergeNode in="SourceGraphic"/&gt;&lt;/feMerge&gt;
&lt;/filter&gt;
&lt;marker id="tmArrowBlue" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"&gt;
&lt;path d="M0,0 L10,5 L0,10 Z" fill="#60a5fa"/&gt;
&lt;/marker&gt;
&lt;marker id="tmArrowGreen" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"&gt;
&lt;path d="M0,0 L10,5 L0,10 Z" fill="#34d399"/&gt;
&lt;/marker&gt;
&lt;marker id="tmArrowAmber" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"&gt;
&lt;path d="M0,0 L10,5 L0,10 Z" fill="#f59e0b"/&gt;
&lt;/marker&gt;
&lt;!-- Hidden paths the particles animate along --&gt;
&lt;path id="tmPathLLM"
d="M 140 240 L 320 240 L 500 240 L 680 240 L 860 240
L 860 320 Q 860 360 820 360 L 540 360 Q 500 360 500 320 L 500 280
L 500 240 L 680 240 L 860 240"
fill="none" stroke="none"/&gt;
&lt;path id="tmPathEmb"
d="M 140 540 L 320 540 L 500 540 L 680 540 L 860 540"
fill="none" stroke="none"/&gt;
&lt;path id="tmPathEmb2"
d="M 140 540 L 320 540 L 500 540 L 680 540 L 860 540"
fill="none" stroke="none"/&gt;
&lt;/defs&gt;
&lt;!-- background --&gt;
&lt;rect width="1200" height="720" fill="#0a1424"/&gt;
&lt;!-- header --&gt;
&lt;text x="120" y="60" font-family="'Space Grotesk','Inter',sans-serif" font-size="16" fill="#f59e0b" letter-spacing="3" font-weight="700"&gt;TWO MECHANISMS&lt;/text&gt;
&lt;text x="120" y="100" font-family="'Space Grotesk','Inter',sans-serif" font-size="38" fill="#ffffff" font-weight="700" letter-spacing="-.5"&gt;Generation vs Similarity&lt;/text&gt;
&lt;!-- ════════ LLM PIPELINE (top) ════════ --&gt;
&lt;text x="120" y="180" font-family="'Space Grotesk','Inter',sans-serif" font-size="26" fill="#60a5fa" font-weight="700"&gt;LLM&lt;/text&gt;
&lt;text x="120" y="208" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" font-style="italic"&gt;non-deterministic · sampling&lt;/text&gt;
&lt;!-- LLM boxes --&gt;
&lt;g font-family="'Inter',sans-serif"&gt;
&lt;!-- prompt --&gt;
&lt;g&gt;
&lt;rect x="60" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/&gt;
&lt;text x="140" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;prompt&lt;/text&gt;
&lt;text x="140" y="258" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;"AI is..."&lt;/text&gt;
&lt;/g&gt;
&lt;!-- tokenize --&gt;
&lt;g&gt;
&lt;rect x="240" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/&gt;
&lt;text x="320" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;tokenize&lt;/text&gt;
&lt;text x="320" y="258" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;BPE → IDs&lt;/text&gt;
&lt;/g&gt;
&lt;!-- model --&gt;
&lt;g&gt;
&lt;rect x="420" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/&gt;
&lt;text x="500" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;model&lt;/text&gt;
&lt;text x="500" y="258" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;forward pass&lt;/text&gt;
&lt;/g&gt;
&lt;!-- sample --&gt;
&lt;g&gt;
&lt;rect x="600" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/&gt;
&lt;text x="680" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;sample&lt;/text&gt;
&lt;text x="680" y="258" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;temp / top-p&lt;/text&gt;
&lt;/g&gt;
&lt;!-- next token --&gt;
&lt;g&gt;
&lt;rect x="780" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/&gt;
&lt;text x="860" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;next token&lt;/text&gt;
&lt;text x="860" y="258" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;append, loop&lt;/text&gt;
&lt;/g&gt;
&lt;/g&gt;
&lt;!-- LLM arrows --&gt;
&lt;g stroke="#60a5fa" stroke-width="2" fill="none"&gt;
&lt;line x1="220" y1="240" x2="232" y2="240" marker-end="url(#tmArrowBlue)"/&gt;
&lt;line x1="400" y1="240" x2="412" y2="240" marker-end="url(#tmArrowBlue)"/&gt;
&lt;line x1="580" y1="240" x2="592" y2="240" marker-end="url(#tmArrowBlue)"/&gt;
&lt;line x1="760" y1="240" x2="772" y2="240" marker-end="url(#tmArrowBlue)"/&gt;
&lt;/g&gt;
&lt;!-- Stochastic loop arrow (next token → model) --&gt;
&lt;path d="M 860 280 L 860 340 Q 860 360 840 360 L 520 360 Q 500 360 500 340 L 500 286"
fill="none" stroke="#f59e0b" stroke-width="2" stroke-dasharray="5,4" marker-end="url(#tmArrowAmber)"/&gt;
&lt;text x="680" y="384" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#f59e0b" font-style="italic"&gt;stochastic loop&lt;/text&gt;
&lt;!-- Pulsing rings on each LLM box (highlight as particle passes) --&gt;
&lt;rect class="tm-box-pulse" x="60" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="5s" begin="0s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="240" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="5s" begin="0.6s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="420" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="5s" begin="1.2s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="600" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="5s" begin="1.8s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="780" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="5s" begin="2.4s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;!-- Animated particle on the LLM path (loops continuously, traverses stochastic loop) --&gt;
&lt;circle class="tm-particle" r="9" fill="#60a5fa" filter="url(#tmGlow)"&gt;
&lt;animateMotion dur="5s" repeatCount="indefinite" rotate="auto"&gt;
&lt;mpath href="#tmPathLLM"/&gt;
&lt;/animateMotion&gt;
&lt;/circle&gt;
&lt;!-- ════════ EMBEDDING PIPELINE (bottom) ════════ --&gt;
&lt;text x="120" y="480" font-family="'Space Grotesk','Inter',sans-serif" font-size="26" fill="#34d399" font-weight="700"&gt;EMBEDDING&lt;/text&gt;
&lt;text x="120" y="508" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" font-style="italic"&gt;deterministic · geometric&lt;/text&gt;
&lt;g font-family="'Inter',sans-serif"&gt;
&lt;!-- word --&gt;
&lt;g&gt;
&lt;rect x="60" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/&gt;
&lt;text x="140" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;word&lt;/text&gt;
&lt;text x="140" y="558" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;"king"&lt;/text&gt;
&lt;/g&gt;
&lt;!-- lookup --&gt;
&lt;g&gt;
&lt;rect x="240" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/&gt;
&lt;text x="320" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;lookup&lt;/text&gt;
&lt;text x="320" y="558" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;GloVe / SBERT&lt;/text&gt;
&lt;/g&gt;
&lt;!-- vector --&gt;
&lt;g&gt;
&lt;rect x="420" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/&gt;
&lt;text x="500" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;vector&lt;/text&gt;
&lt;text x="500" y="558" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;fixed dim&lt;/text&gt;
&lt;/g&gt;
&lt;!-- cosine --&gt;
&lt;g&gt;
&lt;rect x="600" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/&gt;
&lt;text x="680" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;cosine&lt;/text&gt;
&lt;text x="680" y="558" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;vs corpus&lt;/text&gt;
&lt;/g&gt;
&lt;!-- similarity --&gt;
&lt;g&gt;
&lt;rect x="780" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/&gt;
&lt;text x="860" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700"&gt;similarity&lt;/text&gt;
&lt;text x="860" y="558" text-anchor="middle" font-size="13" fill="#7e95b5"&gt;score · rank&lt;/text&gt;
&lt;/g&gt;
&lt;/g&gt;
&lt;g stroke="#34d399" stroke-width="2" fill="none"&gt;
&lt;line x1="220" y1="540" x2="232" y2="540" marker-end="url(#tmArrowGreen)"/&gt;
&lt;line x1="400" y1="540" x2="412" y2="540" marker-end="url(#tmArrowGreen)"/&gt;
&lt;line x1="580" y1="540" x2="592" y2="540" marker-end="url(#tmArrowGreen)"/&gt;
&lt;line x1="760" y1="540" x2="772" y2="540" marker-end="url(#tmArrowGreen)"/&gt;
&lt;/g&gt;
&lt;!-- Pulsing rings on each Embedding box --&gt;
&lt;rect class="tm-box-pulse" x="60" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="4s" begin="0s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="240" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="4s" begin="0.7s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="420" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="4s" begin="1.4s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="600" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="4s" begin="2.1s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;rect class="tm-box-pulse" x="780" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"&gt;
&lt;animate attributeName="opacity" values="0;1;0" dur="4s" begin="2.8s" repeatCount="indefinite"/&gt;
&lt;/rect&gt;
&lt;!-- Embedding particle (one-shot left-to-right, restarts cleanly) --&gt;
&lt;circle class="tm-particle" r="9" fill="#34d399" filter="url(#tmGlow)"&gt;
&lt;animateMotion dur="4s" repeatCount="indefinite" rotate="auto"&gt;
&lt;mpath href="#tmPathEmb"/&gt;
&lt;/animateMotion&gt;
&lt;/circle&gt;
&lt;!-- Annotation under embedding to underscore "one-shot, no loop" --&gt;
&lt;text x="500" y="624" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#34d399" font-style="italic"&gt;one-shot · same input always produces same output&lt;/text&gt;
&lt;!-- Bottom rule + caption --&gt;
&lt;line x1="60" y1="660" x2="940" y2="660" stroke="#1f3358" stroke-width="1"/&gt;
&lt;text x="500" y="690" text-anchor="middle" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" font-style="italic"&gt;Both paths exist in every modern NLP system. Which one you reach for depends on whether the answer needs to be &lt;tspan fill="#60a5fa" font-weight="700"&gt;written&lt;/tspan&gt; or &lt;tspan fill="#34d399" font-weight="700"&gt;found&lt;/tspan&gt;.&lt;/text&gt;
&lt;/svg&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The model is what writes the email. The embedding is what finds the one you wrote last March.&lt;/p&gt;</description><content:encoded>&lt;![CDATA[<img src="https://curiousbit.netlify.app/images/IITM/week6-mechanisms.png" alt="Machine-Learning" style="max-width:100%;height:auto;margin-bottom:1.5em;"/><style>
.two-mech { margin: 2rem 0 2.5rem; border-radius: 14px; overflow: hidden; border: 1px solid #1f3358; background: #0a1424; }
.two-mech svg { display: block; width: 100%; height: auto; min-width: 720px; }
.two-mech-wrap { overflow-x: auto; }
@media (prefers-reduced-motion: reduce) { .two-mech .tm-particle { display: none; } }</style><div class="two-mech"><div class="two-mech-wrap"><svg viewBox="0 0 1200 720" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Animated diagram contrasting the LLM stochastic generation loop with the deterministic embedding similarity pipeline"><defs><filter id="tmGlow" x="-50%" y="-50%" width="200%" height="200%"><feGaussianBlur stdDeviation="3" result="b"/><feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge></filter><marker id="tmArrowBlue" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#60a5fa"/></marker><marker id="tmArrowGreen" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#34d399"/></marker><marker id="tmArrowAmber" markerWidth="10" markerHeight="10" refX="6" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#f59e0b"/></marker><path id="tmPathLLM" d="M 140 240 L 320 240 L 500 240 L 680 240 L 860 240              L 860 320 Q 860 360 820 360 L 540 360 Q 500 360 500 320 L 500 280              L 500 240 L 680 240 L 860 240" fill="none" stroke="none"/><path id="tmPathEmb" d="M 140 540 L 320 540 L 500 540 L 680 540 L 860 540" fill="none" stroke="none"/><path id="tmPathEmb2" d="M 140 540 L 320 540 L 500 540 L 680 540 L 860 540" fill="none" stroke="none"/></defs><rect width="1200" height="720" fill="#0a1424"/><text x="120" y="60" font-family="'Space Grotesk','Inter',sans-serif" font-size="16" fill="#f59e0b" letter-spacing="3" font-weight="700">TWO MECHANISMS</text><text x="120" y="100" font-family="'Space Grotesk','Inter',sans-serif" font-size="38" fill="#ffffff" font-weight="700" letter-spacing="-.5">Generation vs Similarity</text><text x="120" y="180" font-family="'Space Grotesk','Inter',sans-serif" font-size="26" fill="#60a5fa" font-weight="700">LLM</text><text x="120" y="208" font-family="'Inter',sans-serif" font-size="14" 
fill="#7e95b5" font-style="italic">non-deterministic · sampling</text><g font-family="'Inter',sans-serif"><g><rect x="60" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/><text x="140" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">prompt</text><text x="140" y="258" text-anchor="middle" font-size="13" fill="#7e95b5">"AI is..."</text></g><g><rect x="240" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/><text x="320" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">tokenize</text><text x="320" y="258" text-anchor="middle" font-size="13" fill="#7e95b5">BPE → IDs</text></g><g><rect x="420" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/><text x="500" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">model</text><text x="500" y="258" text-anchor="middle" font-size="13" fill="#7e95b5">forward pass</text></g><g><rect x="600" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/><text x="680" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">sample</text><text x="680" y="258" text-anchor="middle" font-size="13" fill="#7e95b5">temp / top-p</text></g><g><rect x="780" y="200" width="160" height="80" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="2"/><text x="860" y="234" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">next token</text><text x="860" y="258" text-anchor="middle" font-size="13" fill="#7e95b5">append, loop</text></g></g><g stroke="#60a5fa" stroke-width="2" fill="none"><line x1="220" y1="240" x2="232" y2="240" marker-end="url(#tmArrowBlue)"/><line x1="400" y1="240" x2="412" y2="240" marker-end="url(#tmArrowBlue)"/><line x1="580" y1="240" x2="592" y2="240" marker-end="url(#tmArrowBlue)"/><line x1="760" y1="240" x2="772" y2="240" marker-end="url(#tmArrowBlue)"/></g><path 
d="M 860 280 L 860 340 Q 860 360 840 360 L 520 360 Q 500 360 500 340 L 500 286" fill="none" stroke="#f59e0b" stroke-width="2" stroke-dasharray="5,4" marker-end="url(#tmArrowAmber)"/><text x="680" y="384" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#f59e0b" font-style="italic">stochastic loop</text><rect class="tm-box-pulse" x="60" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="5s" begin="0s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="240" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="5s" begin="0.6s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="420" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="5s" begin="1.2s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="600" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="5s" begin="1.8s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="780" y="200" width="160" height="80" rx="10" fill="none" stroke="#60a5fa" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="5s" begin="2.4s" repeatCount="indefinite"/></rect><circle class="tm-particle" r="9" fill="#60a5fa" filter="url(#tmGlow)"><animateMotion dur="5s" repeatCount="indefinite" rotate="auto"><mpath href="#tmPathLLM"/></animateMotion></circle><text x="120" y="480" font-family="'Space Grotesk','Inter',sans-serif" font-size="26" fill="#34d399" font-weight="700">EMBEDDING</text><text x="120" y="508" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" font-style="italic">deterministic · geometric</text><g 
font-family="'Inter',sans-serif"><g><rect x="60" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/><text x="140" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">word</text><text x="140" y="558" text-anchor="middle" font-size="13" fill="#7e95b5">"king"</text></g><g><rect x="240" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/><text x="320" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">lookup</text><text x="320" y="558" text-anchor="middle" font-size="13" fill="#7e95b5">GloVe / SBERT</text></g><g><rect x="420" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/><text x="500" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">vector</text><text x="500" y="558" text-anchor="middle" font-size="13" fill="#7e95b5">fixed dim</text></g><g><rect x="600" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/><text x="680" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">cosine</text><text x="680" y="558" text-anchor="middle" font-size="13" fill="#7e95b5">vs corpus</text></g><g><rect x="780" y="500" width="160" height="80" rx="10" fill="#0f1d33" stroke="#10b981" stroke-width="2"/><text x="860" y="534" text-anchor="middle" font-size="17" fill="#ffffff" font-weight="700">similarity</text><text x="860" y="558" text-anchor="middle" font-size="13" fill="#7e95b5">score · rank</text></g></g><g stroke="#34d399" stroke-width="2" fill="none"><line x1="220" y1="540" x2="232" y2="540" marker-end="url(#tmArrowGreen)"/><line x1="400" y1="540" x2="412" y2="540" marker-end="url(#tmArrowGreen)"/><line x1="580" y1="540" x2="592" y2="540" marker-end="url(#tmArrowGreen)"/><line x1="760" y1="540" x2="772" y2="540" marker-end="url(#tmArrowGreen)"/></g><rect class="tm-box-pulse" x="60" y="500" width="160" height="80" rx="10" 
fill="none" stroke="#34d399" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="4s" begin="0s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="240" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="4s" begin="0.7s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="420" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="4s" begin="1.4s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="600" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="4s" begin="2.1s" repeatCount="indefinite"/></rect><rect class="tm-box-pulse" x="780" y="500" width="160" height="80" rx="10" fill="none" stroke="#34d399" stroke-width="3" opacity="0"><animate attributeName="opacity" values="0;1;0" dur="4s" begin="2.8s" repeatCount="indefinite"/></rect><circle class="tm-particle" r="9" fill="#34d399" filter="url(#tmGlow)"><animateMotion dur="4s" repeatCount="indefinite" rotate="auto"><mpath href="#tmPathEmb"/></animateMotion></circle><text x="500" y="624" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#34d399" font-style="italic">one-shot · same input always produces same output</text><line x1="60" y1="660" x2="940" y2="660" stroke="#1f3358" stroke-width="1"/><text x="500" y="690" text-anchor="middle" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" font-style="italic">Both paths exist in every modern NLP system. Which one you reach for depends on whether the answer needs to be <tspan fill="#60a5fa" font-weight="700">written</tspan> or <tspan fill="#34d399" font-weight="700">found</tspan>.</text></svg></div></div><p>The model is what writes the email. 
The embedding is what finds the one you wrote last March.</p><p>Most modern AI systems are built from two fundamentally different mechanisms, and most confusion about what AI &ldquo;is&rdquo; comes from conflating them. LLMs are <em>generative</em>: tokens in, tokens out, with the output shaped by the prompt and the sampling settings, and varying from one call to the next when sampling is enabled. Embeddings are <em>geometric</em>: a deterministic mapping from a word or sentence to a fixed vector, where comparisons are positional and identical input always produces identical output. Both are essential. Both are old enough to be uncontroversial. Most useful systems combine them.</p><p>What follows is the <strong>Week 6 Graded Mini Project</strong> of the <strong>IITM Pravartak Professional Certificate Programme in Agentic AI and Applications</strong>, used here as a lens for both mechanisms across five hands-on exercises.</p><h2 id="the-two-paths-side-by-side">The two paths, side by side</h2><p>The header image above shows the contrast in one frame. The LLM path is a loop with sampling: non-deterministic by design, with behaviour controlled by temperature, top-p, and prompt structure. The embedding path is a one-shot lookup followed by a geometric comparison: deterministic, fast, stable.</p><p>That single distinction tells you which mechanism to reach for. If the answer needs to be written, generated, synthesized, or improvised, you want the LLM. If the answer needs to be found, ranked, deduplicated, clustered, or routed, you want embeddings. 
Most production systems use both because most real problems are some combination of &ldquo;find the right context&rdquo; and &ldquo;say something useful about it.&rdquo;</p><p>A quick decision table to anchor the rest of the article:</p><table><thead><tr><th>Problem</th><th>Reach for</th></tr></thead><tbody><tr><td>Semantic search over a corpus</td><td>Embeddings</td></tr><tr><td>Conversational reply or text drafting</td><td>LLM</td></tr><tr><td>Near-duplicate detection or content clustering</td><td>Embeddings</td></tr><tr><td>Summarization of a long document</td><td>LLM</td></tr><tr><td>Routing a support ticket to the right team</td><td>Embeddings + a small classifier head</td></tr><tr><td>Question answering grounded in your docs</td><td>Both (RAG)</td></tr><tr><td>Image or text classification</td><td>Embeddings + a categorical head</td></tr><tr><td>Translation, rewriting, code generation</td><td>LLM</td></tr></tbody></table><p>The exercises below show why each row works the way it does.</p><h2 id="exercise-1-text-generation-reveals-prompt-and-sampling-sensitivity">Exercise 1: Text generation reveals prompt and sampling sensitivity</h2><p>Section A1 loaded <code>distilgpt2</code> through the Hugging Face <code>pipeline</code> API and generated three continuations of the same prompt:</p><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">pipeline</span></span></span><span class="line"><span class="cl"><span class="n">generator</span><span class="o">=</span><span class="n">pipeline</span><span class="p">(</span><span class="s2">"text-generation"</span><span class="p">,</span><span class="n">model</span><span class="o">=</span><span class="s2">"distilgpt2"</span><span class="p">)</span></span></span><span class="line"><span class="cl"><span class="n">generator</span><span class="p">(</span><span class="s2">"AI is transforming industries by"</span><span class="p">,</span></span></span><span class="line"><span class="cl"><span 
class="n">max_new_tokens</span><span class="o">=</span><span class="mi">40</span><span class="p">,</span><span class="n">num_return_sequences</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span><span class="n">do_sample</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></span></span></code></pre></div><p>Three continuations came back from the same model, the same prompt, the same call:</p><blockquote><p><em>&ldquo;AI is transforming industries by using science to bring people together with a greater understanding of the importance of science. The new book takes an approach to both science and technology, allowing people to focus more more effectively on the basics and to&hellip;&rdquo;</em></p></blockquote><blockquote><p><em>&ldquo;AI is transforming industries by replacing the manufacturing sector with a manufacturing sector that can be turned into a manufacturing and IT sector by creating new jobs and creating new jobs. The new jobs and investment in the next decade will help spur growth&hellip;&rdquo;</em></p></blockquote><blockquote><p><em>&ldquo;AI is transforming industries by creating a new, faster, and more attractive way of generating capital and creating jobs for both the United States and Europe. This is an effective new way of doing this.&rdquo;</em></p></blockquote><p>Three different stories. None of which the model &ldquo;knew&rdquo; — it just produced plausible-sounding next tokens under stochastic sampling. Notice the repetitions (&ldquo;manufacturing sector with a manufacturing sector&rdquo;), the loops (&ldquo;more more effectively&rdquo;), the empty filler (&ldquo;a new, faster, and more attractive way of generating capital&rdquo;). 
DistilGPT-2 is a small model — these are the artefacts of a system that&rsquo;s good at local fluency but doesn&rsquo;t have a strong forward plan.</p><p>The headline insight: LLM outputs are statistical, prompt-sensitive, and unrepeatable unless you fix the seed. The same prompt can give you variety (a feature when brainstorming) or drift (a bug when consistency matters).</p><h2 id="exercise-2-tokenization-is-where-the-abstraction-begins">Exercise 2: Tokenization is where the abstraction begins</h2><p>This is the section to slow down on. Take the sentence:</p><blockquote><p><em>&ldquo;LLMs are powerful tools for natural language understanding.&rdquo;</em></p></blockquote><p>A human reads eight words. The model sees ten tokens.</p><p><img src="/images/IITM/week6-tokens.png" alt="BPE tokenization of the sentence, showing each token as a coloured pill"/><p>After BPE (Byte-Pair Encoding) with the DistilGPT-2 tokenizer:</p><pre tabindex="0"><code>['LL', 'Ms', 'Ġare', 'Ġpowerful', 'Ġtools', 'Ġfor', 'Ġnatural',
'Ġlanguage', 'Ġunderstanding', '.']</code></pre><p>The string <code>LLMs</code> doesn&rsquo;t appear in the model&rsquo;s vocabulary as a single unit, so it is split into <code>LL</code> and <code>Ms</code>. The <code>Ġ</code> prefix encodes &ldquo;preceding space&rdquo; — that&rsquo;s how BPE preserves word boundaries without a separator character. The period gets its own token.</p><p>The mismatch between <em>what a human reads</em> and <em>what the model processes</em> has real consequences:</p><ul><li><strong>Cost is per token, not per word.</strong> API billing, latency, and rate limits are all token-denominated. A 1,000-word prompt to a frontier model may bill at 1,300–1,500 tokens depending on language.</li><li><strong>Context windows are token windows.</strong> A 4,096-token context holds roughly 3,000 English words. It holds far fewer words of code (whitespace and symbols inflate counts), and fewer still for languages with poor vocabulary coverage in the tokenizer.</li><li><strong>Rare strings behave oddly.</strong> Brand names, technical acronyms, foreign words, internal jargon — anything the tokenizer&rsquo;s vocabulary doesn&rsquo;t cover as a single token gets fractured. Model behaviour around those fractures is harder to predict, and prompt sensitivity often hides at this layer.</li><li><strong>The same string can tokenize differently with leading whitespace.</strong> <code>"king"</code> and <code>" king"</code> are different token sequences. That&rsquo;s why pasted prompts sometimes produce subtly different outputs than typed ones.</li></ul><p>Tokenization is the lowest layer of the LLM stack and the one most engineering conversations skip. 
If you&rsquo;re tuning prompts and getting unstable behaviour, the first place to look is what your input looks like <em>after the tokenizer touches it</em>, not what it looks like in your editor.</p><h2 id="exercise-3-prompts-shape-what-you-get">Exercise 3: Prompts shape what you get</h2><p>Section B ran three task-shaped prompts through the same generator, with <code>temperature=0.8</code> and <code>top_p=0.95</code>:</p><ul><li><strong>Summarization</strong> — explicit instruction with a 30-word cap.</li><li><strong>Q&amp;A</strong> — structured format with <code>Q:</code> and <code>A:</code> markers.</li><li><strong>Creative</strong> — open-ended request for a 4-line poem about AI.</li></ul><p>The summarization output respected the spirit of the constraint but drifted past 30 words on most runs — DistilGPT-2 is small enough that hard length control isn&rsquo;t reliable even with explicit instructions. The Q&amp;A output, asked for the capital of Japan, returned <code>I believe...</code> — the model hedged. A larger model would say Tokyo confidently; a small model produces statistically plausible Q&amp;A-shaped text without strong factual grounding. The creative prompt produced varied and stylistic continuations, but with the lowest grounding: fluency over precision.</p><p>Structure compresses the output space the model is sampling from. Vagueness expands it. That single sentence is most of what &ldquo;prompt engineering&rdquo; actually is — the rest is technique.</p><h2 id="exercise-4-word-embeddings-encode-semantic-geometry">Exercise 4: Word embeddings encode semantic geometry</h2><p>Pivot to the other mechanism. 
Section C1 loaded <strong>GloVe</strong> vectors (<code>glove-wiki-gigaword-50</code> — 50 dimensions, trained on Wikipedia and Gigaword) via Gensim, then asked for the five nearest neighbours of three words:</p><table><thead><tr><th>Query</th><th>Top 5 neighbours (cosine similarity)</th></tr></thead><tbody><tr><td><code>king</code></td><td>prince (0.82), queen (0.78), ii (0.77), emperor (0.77), son (0.77)</td></tr><tr><td><code>queen</code></td><td>princess (0.85), lady (0.81), elizabeth (0.79), king (0.78), prince (0.78)</td></tr><tr><td><code>diamond</code></td><td>gold (0.77), diamonds (0.77), gem (0.74), silver (0.72), jewel (0.71)</td></tr></tbody></table><p>There is no generation here. Each word is mapped to a fixed 50-dimensional vector, and the &ldquo;nearest neighbours&rdquo; are the words whose vectors sit closest in that space by cosine similarity. The geometry was learned by training on co-occurrence — words that appear in similar contexts end up in similar positions. That&rsquo;s why <code>king</code> and <code>prince</code> are nearest neighbours, why <code>queen</code> pulls in <code>elizabeth</code> (the corpus has plenty of references to Queen Elizabeth), and why <code>diamond</code> cleanly resolves to a jewellery cluster.</p><p>The classic <code>king − man + woman ≈ queen</code> analogy works in this same space; the lab didn&rsquo;t run it, but the geometry is there. Embeddings don&rsquo;t <em>write</em> anything — they <em>place</em> things near other things. That single property is what makes them the backbone of semantic search, retrieval, deduplication, and recommendation.</p><h2 id="exercise-5-sentence-similarity-from-averaged-word-vectors">Exercise 5: Sentence similarity from averaged word vectors</h2><p>Section C2 extended the geometry to sentences. 
Five short sentences across two topics — AI/ML and jewellery — were each reduced to a sentence vector (the mean of its word vectors, with simple lowercase tokenization), then compared with cosine similarity.</p><p>Plotted in 2D via multidimensional scaling on the cosine distances, the clustering is unambiguous:</p><p><img src="/images/IITM/week6-clusters.png" alt="Two-dimensional cluster plot of the five sentence vectors, with the AI/ML sentences clearly separated from the jewellery sentences"/><p>The numerical version:</p><table><thead><tr><th/><th>AI/support</th><th>ML/fraud</th><th>Jewellery</th><th>Neural/medical</th><th>Luxury/rings</th></tr></thead><tbody><tr><td><strong>AI/support</strong></td><td>1.00</td><td>0.84</td><td>0.60</td><td>0.80</td><td>0.50</td></tr><tr><td><strong>ML/fraud</strong></td><td>0.84</td><td>1.00</td><td>0.73</td><td>0.83</td><td>0.62</td></tr><tr><td><strong>Jewellery</strong></td><td>0.60</td><td>0.73</td><td>1.00</td><td>0.58</td><td>0.88</td></tr><tr><td><strong>Neural/medical</strong></td><td>0.80</td><td>0.83</td><td>0.58</td><td>1.00</td><td>0.56</td></tr><tr><td><strong>Luxury/rings</strong></td><td>0.50</td><td>0.62</td><td>0.88</td><td>0.56</td><td>1.00</td></tr></tbody></table><p>Within-cluster pairs sit at 0.80–0.88. Cross-domain pairs sit at 0.50–0.73. The gap is narrowest at the top of the cross-domain range (ML/fraud against Jewellery, 0.73), but every within-cluster pair still outscores every cross-domain pair, which is the ordering you&rsquo;d want from a retrieval system.</p><p>Three caveats worth naming, because they explain why modern retrieval doesn&rsquo;t actually use GloVe averages:</p><ul><li><strong>Averaging discards word order.</strong> &ldquo;Dog bites man&rdquo; and &ldquo;man bites dog&rdquo; produce identical sentence vectors. For most retrieval that&rsquo;s tolerable; for anything where syntax carries the meaning, it isn&rsquo;t.</li><li><strong>Transformer encoders fixed this.</strong> Models like BERT, RoBERTa, and their descendants produce <em>contextual</em> embeddings — each token&rsquo;s vector depends on the tokens around it. 
Pool those across a sentence and you get a representation that respects word order and disambiguates polysemy.</li><li><strong>Sentence-BERT and friends made it production-grade.</strong> SBERT (and successors like OpenAI&rsquo;s <code>text-embedding-3</code>, Cohere&rsquo;s embeddings, Voyage, etc.) trained encoders specifically for sentence-level similarity. That&rsquo;s the difference between &ldquo;the demo works on five sentences&rdquo; and &ldquo;you can index a million documents and search them in milliseconds.&rdquo;</li></ul><p>GloVe averaging is a baseline. It&rsquo;s the right baseline to start with, because it lets you see the geometry without the architecture getting in the way. Production systems start from this picture and replace the lookup step.</p><h2 id="when-both-mechanisms-meet">When both mechanisms meet</h2><p>The final exercise sits at the intersection. <code>distilbert-base-uncased-finetuned-sst-2-english</code> is a transformer encoder (an embedding model under the hood) with a classification head fine-tuned for sentiment. Run it on three workplace-themed inputs:</p><table><thead><tr><th>Input</th><th>Label</th><th>Score</th></tr></thead><tbody><tr><td>&ldquo;The chatbot reduced ticket resolution time by 40% this quarter.&rdquo;</td><td>POSITIVE</td><td>0.9962</td></tr><tr><td>&ldquo;Our deployment failed repeatedly and customers were upset.&rdquo;</td><td>NEGATIVE</td><td>0.9997</td></tr><tr><td>&ldquo;The new recommendation engine is acceptable but needs tuning.&rdquo;</td><td>NEGATIVE</td><td>0.9898</td></tr></tbody></table><p>The third row is the interesting one, and it&rsquo;s worth unpacking because it points at a problem that turns up in every enterprise deployment of pretrained models.</p><p>&ldquo;Acceptable but needs tuning&rdquo; is, in workplace context, a <em>lukewarm-positive</em> — closer to &ldquo;approved with caveats&rdquo; than &ldquo;this is bad.&rdquo; The classifier scored it NEGATIVE with 0.9898 confidence. 
Three things are happening at once:</p><ul><li><strong>Domain mismatch.</strong> The model was fine-tuned on SST-2, which is movie reviews. &ldquo;Needs tuning&rdquo; reads negative there. In an engineering team&rsquo;s language, &ldquo;needs tuning&rdquo; is constructive — the same words have different sentiment loadings in different domains.</li><li><strong>No calibration on workplace text.</strong> The score is 0.9898 — extreme confidence — for what should be a borderline case. Pretrained classifiers tend to be miscalibrated on out-of-distribution inputs: they&rsquo;re not just wrong, they&rsquo;re confidently wrong. Calibration techniques (temperature scaling, Platt scaling, conformal prediction) exist for exactly this.</li><li><strong>Weak supervision is the practical fix.</strong> When you can&rsquo;t fine-tune (no labelled data, no budget, no time), the durable answer is to treat the classifier as one signal among several — combine it with rules, keyword filters, or a second model — rather than trusting any single number above the threshold.</li></ul><p>Architecturally, the lesson generalises across all three Section D variants. Generation is &ldquo;embedding + decoder loop.&rdquo; Classification is &ldquo;embedding + categorical head.&rdquo; Retrieval is &ldquo;embedding + cosine.&rdquo; Same underlying mathematical object, different output shapes. The architectural choices around the embedding determine what the system does — and where it fails when you take it out of the domain it was trained on.</p><h2 id="closing-observations">Closing observations</h2><p>Three things that generalise beyond this lab.</p><p><strong>Tokenization is where most LLM cost and quirks actually originate.</strong> It&rsquo;s the lowest layer of the stack and the one most engineering conversations skip. 
If you&rsquo;re tuning prompts and getting unstable behaviour, the first place to look is what your input looks like after the tokenizer touches it.</p><p><strong>Embedding-based similarity is older, cheaper, and more deterministic than people remember.</strong> Before reaching for an LLM call to compare two pieces of text, embed them and compute cosine. It&rsquo;s milliseconds, free, and stable. A surprising fraction of &ldquo;AI features&rdquo; are really embedding lookups with a confidence threshold.</p><p><strong>Generation and similarity sit next to each other.</strong> They are not competitors. RAG is the obvious example — embeddings retrieve, the LLM generates the answer grounded in what was retrieved. The <a href="/rag-chatbot-for-the-github-rest-api/">Week 15 RAG chatbot post</a> is what these two mechanisms look like wired together for production.</p><p>One predicts words. One maps meaning. Knowing which one to reach for is most of the job.</p>
]]></content:encoded><media:content url="https://curiousbit.netlify.app/images/IITM/week6-mechanisms.png" medium="image"><media:title type="plain">Machine-Learning</media:title></media:content><category>artificial-intelligence</category><category>llm</category><category>machine-learning</category><category>engineering</category><category>Knowledge Base</category></item></channel></rss>