<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link><description>Digital workplace, artificial intelligence, cloud, security, automation, and enterprise technology notes by Ajay Walia.</description><language>en-au</language><managingEditor>Ajay Walia</managingEditor><webMaster>Ajay Walia</webMaster><copyright>Copyright 2026 Ajay Walia</copyright><lastBuildDate>Sun, 21 Jun 2026 05:46:10 +0000</lastBuildDate><atom:link href="https://curiousbit.netlify.app/tags/agents/index.xml" rel="self" type="application/rss+xml"/><image><url>https://curiousbit.netlify.app/images/og-default.png</url><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link></image><item><title>When AI Agents Go Wrong — and How to Engineer Ones That Don't</title><link>https://curiousbit.netlify.app/when-ai-agents-go-wrong/</link><guid isPermaLink="true">https://curiousbit.netlify.app/when-ai-agents-go-wrong/</guid><pubDate>Fri, 19 Jun 2026 00:00:00 +0000</pubDate><dc:creator>Ajay Walia</dc:creator><description>&lt;style&gt;
.agw-fig { margin: 2.2rem 0 1rem; border-radius: 14px; overflow: hidden; border: 1px solid #1f3358; background: #0a1424; }
.agw-fig .agw-wrap { overflow-x: auto; }
.agw-fig svg { display: block; width: 100%; height: auto; min-width: 700px; }
.agw-cap { font-size: .9rem; opacity: .72; margin: .5rem 0 2.4rem; line-height: 1.6; font-style: italic; }
@media (prefers-reduced-motion: reduce) { .agw-fig .agw-anim { display: none; } }
&lt;/style&gt;
&lt;p&gt;Most of the AI conversation right now is about capability — what the next model can do. This project made me sit with the opposite question: what happens when these systems are &lt;em&gt;trusted&lt;/em&gt;, handed real decisions, and then get it wrong? That is the uncomfortable, less glamorous half of building with AI, and it is exactly where &amp;ldquo;responsible AI&amp;rdquo; stops being a slogan and starts being engineering.&lt;/p&gt;</description><content:encoded>&lt;![CDATA[<img src="https://curiousbit.netlify.app/images/agents-go-wrong/hero.jpg" alt="Agents" style="max-width:100%;height:auto;margin-bottom:1.5em;"/><style>
.agw-fig { margin: 2.2rem 0 1rem; border-radius: 14px; overflow: hidden; border: 1px solid #1f3358; background: #0a1424; }
.agw-fig .agw-wrap { overflow-x: auto; }
.agw-fig svg { display: block; width: 100%; height: auto; min-width: 700px; }
.agw-cap { font-size: .9rem; opacity: .72; margin: .5rem 0 2.4rem; line-height: 1.6; font-style: italic; }
@media (prefers-reduced-motion: reduce) { .agw-fig .agw-anim { display: none; } }</style><p>Most of the AI conversation right now is about capability — what the next model can do. This project made me sit with the opposite question: what happens when these systems are <em>trusted</em>, handed real decisions, and then get it wrong? That is the uncomfortable, less glamorous half of building with AI, and it is exactly where &ldquo;responsible AI&rdquo; stops being a slogan and starts being engineering.</p><p>The exercise had two halves. First, take a real-world AI failure apart and explain <em>why</em> it failed — not just that it did. Second, flip from critic to designer: pick a domain I know, imagine an AI agent operating in it, and design the guardrails that would keep it safe. Below is the thinking behind both, plus the two case studies and two domains I worked through.</p><h2 id="what-this-exercise-is-actually-teaching">What this exercise is actually teaching</h2><p>Strip away the assignment framing and there are four skills underneath it:</p><ul><li><strong>Explain how and why AI systems fail</strong>, using evidence rather than vibes. &ldquo;It was biased&rdquo; is a conclusion, not an analysis. The interesting part is the <em>mechanism</em>.</li><li><strong>Connect failures to ethics</strong> — fairness, accountability, transparency, safety. A technical bug becomes an ethical problem the moment it touches a real person.</li><li><strong>Propose realistic safeguards</strong>, not &ldquo;be careful&rdquo; platitudes. Audits, human review gates, logging, escalation paths — things you could actually ship.</li><li><strong>Balance autonomy against control.</strong> An agent that asks permission for everything is useless; one that asks for nothing is dangerous. Good design is about putting the human in the loop at the <em>right</em> moments.</li></ul><p>That last point is the heart of it. 
Every safeguard is really a decision about where autonomy ends and oversight begins.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 420" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="A spectrum from an over-cautious agent that asks permission for everything to a reckless agent that asks for nothing, with the safe design zone in the middle."><defs><linearGradient id="agwDial" x1="0" x2="1"><stop offset="0" stop-color="#60a5fa"/><stop offset="0.5" stop-color="#34d399"/><stop offset="1" stop-color="#f87171"/></linearGradient></defs><rect width="1200" height="420" fill="#0a1424"/><text x="80" y="64" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">THE CORE TRADE-OFF</text><text x="80" y="104" font-family="'Space Grotesk','Inter',sans-serif" font-size="34" fill="#ffffff" font-weight="700" letter-spacing="-.5">Where autonomy ends, oversight begins</text><rect x="80" y="190" width="1040" height="16" rx="8" fill="url(#agwDial)"/><rect x="470" y="178" width="260" height="40" rx="20" fill="none" stroke="#34d399" stroke-width="2"/><text x="600" y="203" text-anchor="middle" font-family="'Inter',sans-serif" font-size="15" fill="#34d399" font-weight="700">human-in-the-loop zone</text><g class="agw-anim"><polygon points="600,150 590,176 610,176" fill="#ffffff"><animateTransform attributeName="transform" type="translate" values="-40 0; 40 0; -40 0" dur="6s" repeatCount="indefinite" calcMode="spline" keySplines="0.4 0 0.6 1; 0.4 0 0.6 1"/></polygon></g><text x="80" y="300" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#60a5fa" font-weight="700">Asks for everything</text><text x="80" y="328" font-family="'Inter',sans-serif" font-size="15" fill="#7e95b5">Permission for every step.</text><text x="80" y="350" font-family="'Inter',sans-serif" font-size="15" fill="#7e95b5">Safe, but useless — no one</text><text x="80" y="372" font-family="'Inter',sans-serif" 
font-size="15" fill="#7e95b5">would actually use it.</text><text x="600" y="300" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#34d399" font-weight="700">Gated at the right moments</text><text x="600" y="328" text-anchor="middle" font-family="'Inter',sans-serif" font-size="15" fill="#9fb4cf">Autonomous on low-stakes work;</text><text x="600" y="350" text-anchor="middle" font-family="'Inter',sans-serif" font-size="15" fill="#9fb4cf">a human approves the decisions</text><text x="600" y="372" text-anchor="middle" font-family="'Inter',sans-serif" font-size="15" fill="#9fb4cf">that can actually hurt someone.</text><text x="1120" y="300" text-anchor="end" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#f87171" font-weight="700">Asks for nothing</text><text x="1120" y="328" text-anchor="end" font-family="'Inter',sans-serif" font-size="15" fill="#7e95b5">Acts without review.</text><text x="1120" y="350" text-anchor="end" font-family="'Inter',sans-serif" font-size="15" fill="#7e95b5">Fast and convenient, until</text><text x="1120" y="372" text-anchor="end" font-family="'Inter',sans-serif" font-size="15" fill="#7e95b5">a wrong answer acts on its own.</text></svg></div></div><p class="agw-cap">Every guardrail in this post is really a choice about where on this dial an agent should sit — and the right answer changes with the stakes of each decision.</p><h2 id="part-1--reading-the-autopsy-of-a-failure">Part 1 — Reading the autopsy of a failure</h2><p>I looked at two cases that fail in completely different ways. One is a <em>bias</em> problem baked into the data; the other is a <em>hallucination and accountability</em> problem baked into deployment. 
Putting them side by side is the most useful thing I took from this.</p><h3 id="case-a--compas-bias-that-hides-inside-neutral-math">Case A — COMPAS: bias that hides inside &ldquo;neutral&rdquo; math</h3><p>COMPAS is a risk-assessment tool used in US courts to score how likely a defendant is to reoffend. Judges used those scores to help inform bail and sentencing decisions. In 2016, ProPublica analysed the scores of more than 7,000 people arrested in Broward County, Florida, and found something damning. Among defendants who did <em>not</em> go on to reoffend, Black defendants had been labelled &ldquo;higher risk&rdquo; (a medium or high score) at roughly twice the rate of white defendants, about 45% versus 23%. The errors weren&rsquo;t random — they leaned in one direction.</p><p>Here&rsquo;s the part that took me a moment to appreciate. <strong>Race was never an input.</strong> The model didn&rsquo;t need it. It learned from historical criminal-justice data shaped by biased policing, and its questionnaire leaned on <em>proxies</em> — prior arrests, employment, neighbourhood, family history — that quietly correlate with race. 
The bias didn&rsquo;t enter through a checkbox; it seeped in through the data.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 620" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Diagram showing that race is never an input to COMPAS, but proxy features that correlate with race flow into the model and produce a skewed risk score."><defs><marker id="agwArrB" markerWidth="10" markerHeight="10" refX="7" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#60a5fa"/></marker><marker id="agwArrR" markerWidth="10" markerHeight="10" refX="7" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#f87171"/></marker></defs><rect width="1200" height="620" fill="#0a1424"/><text x="80" y="60" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">CASE A · COMPAS</text><text x="80" y="100" font-family="'Space Grotesk','Inter',sans-serif" font-size="32" fill="#ffffff" font-weight="700" letter-spacing="-.5">Bias enters through proxies, not a checkbox</text><g><rect x="80" y="150" width="230" height="70" rx="12" fill="#1a1424" stroke="#f87171" stroke-width="2" stroke-dasharray="6 5"/><text x="195" y="184" text-anchor="middle" font-family="'Inter',sans-serif" font-size="17" fill="#f87171" font-weight="700">Race</text><text x="195" y="206" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#f0a0a0">never an input ✕</text></g><text x="80" y="280" font-family="'Inter',sans-serif" font-size="14" fill="#7e95b5" letter-spacing="2" font-weight="700">PROXY FEATURES (the questionnaire)</text><g font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5" font-weight="600"><g><rect x="80" y="300" width="230" height="54" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="1.6"/><text x="195" y="333" text-anchor="middle">Prior arrests</text></g><g><rect x="80" y="368" width="230" height="54" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="1.6"/><text x="195" 
y="401" text-anchor="middle">Employment status</text></g><g><rect x="80" y="436" width="230" height="54" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="1.6"/><text x="195" y="469" text-anchor="middle">Neighbourhood</text></g><g><rect x="80" y="504" width="230" height="54" rx="10" fill="#0f1d33" stroke="#3b82f6" stroke-width="1.6"/><text x="195" y="537" text-anchor="middle">Family history</text></g></g><path d="M195 220 Q 150 250 150 296" fill="none" stroke="#f87171" stroke-width="1.6" stroke-dasharray="4 4" marker-end="url(#agwArrR)"/><text x="30" y="265" font-family="'Inter',sans-serif" font-size="12" fill="#f0a0a0" font-style="italic">correlates</text><path d="M310 327 L 470 380" fill="none" stroke="#60a5fa" stroke-width="1.8" marker-end="url(#agwArrB)"/><path d="M310 395 L 470 410" fill="none" stroke="#60a5fa" stroke-width="1.8" marker-end="url(#agwArrB)"/><path d="M310 463 L 470 430" fill="none" stroke="#60a5fa" stroke-width="1.8" marker-end="url(#agwArrB)"/><path d="M310 531 L 470 460" fill="none" stroke="#60a5fa" stroke-width="1.8" marker-end="url(#agwArrB)"/><rect x="480" y="350" width="200" height="120" rx="14" fill="#10233f" stroke="#a78bfa" stroke-width="2"/><text x="580" y="405" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#c4b5fd" font-weight="700">COMPAS</text><text x="580" y="430" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">trained on biased</text><text x="580" y="450" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">historical data</text><path d="M680 410 L 760 410" fill="none" stroke="#a78bfa" stroke-width="2" marker-end="url(#agwArrB)"/><rect x="770" y="350" width="350" height="120" rx="14" fill="#1a1424" stroke="#f87171" stroke-width="2"/><text x="945" y="392" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="18" fill="#ffffff" font-weight="700">Risk score, skewed</text><text x="945" y="424" 
text-anchor="middle" font-family="'Inter',sans-serif" font-size="14" fill="#f0a0a0">Among those who did NOT reoffend:</text><text x="945" y="448" text-anchor="middle" font-family="'Inter',sans-serif" font-size="14" fill="#f0a0a0">~45% of Black vs ~23% of white</text><text x="945" y="466" text-anchor="middle" font-family="'Inter',sans-serif" font-size="14" fill="#f0a0a0">defendants flagged "higher risk"</text></svg></div></div><p class="agw-cap">The model never sees race — but it sees features that stand in for it. Bias laundered through "neutral" inputs is still bias.</p><p>And the fairness argument has a genuinely hard core. The vendor (Northpointe) responded that the tool was <em>calibrated</em> — a given score meant the same probability of reoffending regardless of race — which was true. The catch is mathematical. When base rates differ between groups and the predictor is imperfect, you <strong>cannot</strong> have calibration <em>and</em> equal false-positive and false-negative rates across groups at the same time. The two sides were optimising for different definitions of &ldquo;fair,&rdquo; and both were partly right. That is the lesson: &ldquo;fair&rdquo; is not one thing, and choosing which fairness to enforce is an ethical decision you can&rsquo;t dodge with more math.</p><p>The deeper failures were organisational. COMPAS was proprietary — a black box defendants couldn&rsquo;t inspect or contest — and it was deployed into life-altering decisions without independent audits for subgroup fairness.</p><blockquote><p><strong>Source:</strong> ProPublica, <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">&ldquo;Machine Bias&rdquo;</a> (Angwin et al., 2016).</p></blockquote><h3 id="case-b--air-canadas-chatbot-a-confident-costly-wrong-answer">Case B — Air Canada&rsquo;s chatbot: a confident, costly wrong answer</h3><p>The second case is more recent and, honestly, more relatable. 
After his grandmother died in late 2022, Jake Moffatt used Air Canada&rsquo;s website chatbot to check the airline&rsquo;s bereavement-fare policy. The bot told him, confidently, that he could book now and claim the bereavement discount retroactively within 90 days. That was simply false — Air Canada&rsquo;s real policy didn&rsquo;t allow retroactive claims, and the bot even contradicted the airline&rsquo;s own linked policy page.</p><p>When Moffatt tried to claim the refund the bot had promised, Air Canada refused. It then argued in tribunal that it shouldn&rsquo;t be liable, in effect (as the tribunal put it) treating the chatbot as &ldquo;a separate legal entity that is responsible for its own actions.&rdquo; The tribunal rejected that flatly: a company is responsible for all the information on its website, whether it comes from a static page or a bot. Moffatt was awarded damages.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 440" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="A left-to-right flow: a customer asks the chatbot, the bot invents an ungrounded answer, the customer relies on it, the claim is refused, the company blames the bot, and the tribunal holds the company liable."><defs><marker id="agwArrW" markerWidth="10" markerHeight="10" refX="7" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#9fb4cf"/></marker></defs><rect width="1200" height="440" fill="#0a1424"/><text x="80" y="58" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">CASE B · AIR CANADA</text><text x="80" y="98" font-family="'Space Grotesk','Inter',sans-serif" font-size="32" fill="#ffffff" font-weight="700" letter-spacing="-.5">"Not our bot's fault" — and why that failed</text><g font-family="'Inter',sans-serif"><rect x="60" y="150" width="180" height="120" rx="12" fill="#0f1d33" stroke="#60a5fa" stroke-width="1.8"/><text x="150" y="196" text-anchor="middle" font-size="15" fill="#ffffff" font-weight="700">Customer 
asks</text><text x="150" y="222" text-anchor="middle" font-size="13" fill="#9fb4cf">bereavement-fare</text><text x="150" y="240" text-anchor="middle" font-size="13" fill="#9fb4cf">policy</text><rect x="280" y="150" width="180" height="120" rx="12" fill="#1a1424" stroke="#f87171" stroke-width="1.8"/><text x="370" y="190" text-anchor="middle" font-size="15" fill="#f87171" font-weight="700">Bot invents</text><text x="370" y="214" text-anchor="middle" font-size="13" fill="#f0a0a0">"claim it back</text><text x="370" y="232" text-anchor="middle" font-size="13" fill="#f0a0a0">within 90 days"</text><text x="370" y="254" text-anchor="middle" font-size="12" fill="#f0a0a0" font-style="italic">ungrounded ✕</text><rect x="500" y="150" width="180" height="120" rx="12" fill="#0f1d33" stroke="#60a5fa" stroke-width="1.8"/><text x="590" y="196" text-anchor="middle" font-size="15" fill="#ffffff" font-weight="700">Customer relies</text><text x="590" y="222" text-anchor="middle" font-size="13" fill="#9fb4cf">books the flight,</text><text x="590" y="240" text-anchor="middle" font-size="13" fill="#9fb4cf">expects refund</text><rect x="720" y="150" width="180" height="120" rx="12" fill="#1a1424" stroke="#f87171" stroke-width="1.8"/><text x="810" y="190" text-anchor="middle" font-size="15" fill="#f87171" font-weight="700">Claim refused</text><text x="810" y="214" text-anchor="middle" font-size="13" fill="#f0a0a0">"the bot is a</text><text x="810" y="232" text-anchor="middle" font-size="13" fill="#f0a0a0">separate entity"</text><text x="810" y="254" text-anchor="middle" font-size="12" fill="#f0a0a0" font-style="italic">blame-shift</text><rect x="940" y="150" width="200" height="120" rx="12" fill="#10231a" stroke="#34d399" stroke-width="2"/><text x="1040" y="190" text-anchor="middle" font-size="15" fill="#34d399" font-weight="700">Tribunal: liable</text><text x="1040" y="214" text-anchor="middle" font-size="13" fill="#a8e6c8">a company owns</text><text x="1040" y="232" text-anchor="middle" 
font-size="13" fill="#a8e6c8">everything on its</text><text x="1040" y="250" text-anchor="middle" font-size="13" fill="#a8e6c8">site. Damages paid.</text><line x1="240" y1="210" x2="276" y2="210" stroke="#9fb4cf" stroke-width="1.8" marker-end="url(#agwArrW)"/><line x1="460" y1="210" x2="496" y2="210" stroke="#9fb4cf" stroke-width="1.8" marker-end="url(#agwArrW)"/><line x1="680" y1="210" x2="716" y2="210" stroke="#9fb4cf" stroke-width="1.8" marker-end="url(#agwArrW)"/><line x1="900" y1="210" x2="936" y2="210" stroke="#9fb4cf" stroke-width="1.8" marker-end="url(#agwArrW)"/></g><text x="80" y="340" font-family="'Inter',sans-serif" font-size="16" fill="#dce7f5" font-weight="700">The failure isn't the CA$800.</text><text x="80" y="366" font-family="'Inter',sans-serif" font-size="15" fill="#9fb4cf">It's the instinct to treat the AI as a third party you can blame — exactly the move responsible-AI</text><text x="80" y="388" font-family="'Inter',sans-serif" font-size="15" fill="#9fb4cf">governance exists to prevent. Deploy-and-forget, on a high-stakes question, with no owner.</text></svg></div></div><p class="agw-cap">A single ungrounded output, deployed in front of customers with no monitoring and no clear owner — and an accountability dodge the tribunal refused to accept.</p><p>What makes this a great teaching case isn&rsquo;t the money (about CA$800). It&rsquo;s the <strong>accountability</strong> move. The instinct to treat the AI as a third party you can blame is exactly the failure mode responsible-AI governance exists to prevent. Technically, the system generated an unverified answer that wasn&rsquo;t grounded in the authoritative policy. Organisationally, it was put in front of customers on high-stakes questions with no guardrails, no monitoring, and no clear owner — a &ldquo;deploy and forget&rdquo; posture.</p><blockquote><p><strong>Source:</strong> <em>Moffatt v. 
Air Canada</em>, <a href="https://www.canlii.org/en/bc/bccrt/doc/2024/2024bccrt149/2024bccrt149.html">2024 BCCRT 149</a>.</p></blockquote><h3 id="the-pattern-across-both">The pattern across both</h3><p>COMPAS fails <em>quietly and systematically</em> through data; Air Canada fails <em>loudly and individually</em> through a single bad output. But the root causes rhyme: a system trusted beyond what it was validated for, no meaningful oversight, and unclear accountability when it broke. Bias and hallucination look different on the surface but share the same governance gap underneath.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 560" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Two failure modes contrasted — COMPAS fails quietly and systematically through data, Air Canada fails loudly and individually through one output — converging on the same three shared root causes."><rect width="1200" height="560" fill="#0a1424"/><text x="600" y="58" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">THE PATTERN ACROSS BOTH</text><text x="600" y="98" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="30" fill="#ffffff" font-weight="700" letter-spacing="-.5">Different surface, same gap underneath</text><rect x="70" y="140" width="480" height="210" rx="16" fill="#0f1d33" stroke="#a78bfa" stroke-width="2"/><text x="100" y="180" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#c4b5fd" font-weight="700">COMPAS</text><text x="100" y="206" font-family="'Inter',sans-serif" font-size="14" fill="#9fb4cf">Fails QUIETLY · SYSTEMATICALLY</text><text x="100" y="246" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• Bias baked into the training data</text><text x="100" y="276" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• Harms a whole group, invisibly</text><text x="100" y="306" 
font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• No subgroup fairness audit</text><text x="100" y="336" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• Black-box, can't be contested</text><rect x="650" y="140" width="480" height="210" rx="16" fill="#0f1d33" stroke="#f87171" stroke-width="2"/><text x="680" y="180" font-family="'Space Grotesk','Inter',sans-serif" font-size="20" fill="#f87171" font-weight="700">Air Canada chatbot</text><text x="680" y="206" font-family="'Inter',sans-serif" font-size="14" fill="#9fb4cf">Fails LOUDLY · INDIVIDUALLY</text><text x="680" y="246" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• One confident, ungrounded answer</text><text x="680" y="276" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• Harms one person, visibly</text><text x="680" y="306" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• No grounding in authoritative policy</text><text x="680" y="336" font-family="'Inter',sans-serif" font-size="15" fill="#dce7f5">• Deploy-and-forget, no monitoring</text><path d="M310 350 L 480 410" fill="none" stroke="#7e95b5" stroke-width="1.6"/><path d="M890 350 L 720 410" fill="none" stroke="#7e95b5" stroke-width="1.6"/><rect x="300" y="416" width="600" height="110" rx="16" fill="#1a1606" stroke="#f59e0b" stroke-width="2"/><text x="600" y="450" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#f59e0b" letter-spacing="2" font-weight="700">SAME ROOT CAUSES</text><text x="600" y="480" text-anchor="middle" font-family="'Inter',sans-serif" font-size="16" fill="#ffe7b0" font-weight="600">Trusted beyond validation · No meaningful oversight</text><text x="600" y="506" text-anchor="middle" font-family="'Inter',sans-serif" font-size="16" fill="#ffe7b0" font-weight="600">Unclear accountability when it broke</text></svg></div></div><p class="agw-cap">Bias and hallucination are different symptoms of the same disease: a governance gap, not just a 
model bug.</p><h2 id="part-2--switching-seats-designing-the-guardrails">Part 2 — Switching seats: designing the guardrails</h2><p>Critiquing failures is the easy half. The harder, more honest half is designing an agent that <em>wouldn&rsquo;t</em> fail the same way. I framed both designs around the same three safeguard categories — <strong>Data Privacy, Content Safety, Operational Oversight</strong> — because that structure forces you to cover the three places agents usually go wrong: the data going in, the content coming out, and the humans watching over the whole thing.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 560" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="A reusable safeguard skeleton: data privacy controls the data going into the agent, content safety controls the content coming out, and operational oversight wraps the whole thing with human review, logging, monitoring, and a kill-switch."><defs><marker id="agwArrG" markerWidth="10" markerHeight="10" refX="7" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#34d399"/></marker><marker id="agwArrBl" markerWidth="10" markerHeight="10" refX="7" refY="5" orient="auto"><path d="M0,0 L10,5 L0,10 Z" fill="#60a5fa"/></marker></defs><rect width="1200" height="560" fill="#0a1424"/><text x="80" y="58" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">THE PORTABLE SKELETON</text><text x="80" y="98" font-family="'Space Grotesk','Inter',sans-serif" font-size="30" fill="#ffffff" font-weight="700" letter-spacing="-.5">Three places an agent goes wrong — guard all three</text><rect x="60" y="140" width="1080" height="380" rx="20" fill="none" stroke="#a78bfa" stroke-width="2" stroke-dasharray="8 6"/><text x="84" y="170" font-family="'Space Grotesk','Inter',sans-serif" font-size="16" fill="#c4b5fd" font-weight="700">3 · OPERATIONAL OVERSIGHT</text><text x="84" y="194" font-family="'Inter',sans-serif" font-size="14" 
fill="#b9a8e8">human review gate · logging · continuous monitoring · escalation path · kill-switch</text><rect x="110" y="250" width="240" height="200" rx="14" fill="#0f1d33" stroke="#60a5fa" stroke-width="2"/><text x="230" y="288" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="17" fill="#60a5fa" font-weight="700">1 · DATA PRIVACY</text><text x="230" y="312" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">the data going IN</text><text x="130" y="348" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• minimise &amp; mask identifiers</text><text x="130" y="376" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• role-based access</text><text x="130" y="404" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• pull only fields a query needs</text><text x="130" y="432" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• log every access</text><rect x="480" y="280" width="240" height="140" rx="16" fill="#10233f" stroke="#ffffff" stroke-width="2"/><text x="600" y="338" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="22" fill="#ffffff" font-weight="700">THE AGENT</text><text x="600" y="366" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">assists — never decides</text><text x="600" y="386" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">on its own</text><rect x="850" y="250" width="240" height="200" rx="14" fill="#0f1d33" stroke="#34d399" stroke-width="2"/><text x="970" y="288" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="17" fill="#34d399" font-weight="700">2 · CONTENT SAFETY</text><text x="970" y="312" text-anchor="middle" font-family="'Inter',sans-serif" font-size="13" fill="#9fb4cf">the content coming OUT</text><text x="870" y="348" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• ground answers 
in evidence</text><text x="870" y="376" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• surface uncertainty</text><text x="870" y="404" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• refuse high-risk requests</text><text x="870" y="432" font-family="'Inter',sans-serif" font-size="14" fill="#dce7f5">• escalate instead of guessing</text><line x1="350" y1="350" x2="476" y2="350" stroke="#60a5fa" stroke-width="2.2" marker-end="url(#agwArrBl)"/><line x1="720" y1="350" x2="846" y2="350" stroke="#34d399" stroke-width="2.2" marker-end="url(#agwArrG)"/></svg></div></div><p class="agw-cap">Same skeleton every time: lock down the inputs, constrain the outputs, and wrap a human-run oversight layer around the whole thing. What changes is the detail inside each box.</p><h3 id="domain-1--healthcare-a-clinical-decision-support-chatbot">Domain 1 — Healthcare: a clinical decision-support chatbot</h3><p><strong>The use case:</strong> an agent inside a hospital&rsquo;s records system that helps <em>clinicians</em> (not patients) by summarising a patient&rsquo;s history and suggesting possible differential diagnoses and relevant guidelines. Crucially, it only <em>suggests</em>. It never diagnoses, prescribes, or talks to patients on its own. Defining what it <em>can&rsquo;t</em> do is half the safety work.</p><ul><li><strong>Data Privacy:</strong> patient data is about as sensitive as it gets, so the agent runs in a HIPAA-compliant environment, masks identifiers before processing, pulls only the fields a query needs, and uses role-based access so a clinician can see only their own patients. Every access is logged.</li><li><strong>Content Safety:</strong> the real danger is a confident, wrong clinical suggestion. 
So the agent must cite evidence-based guidelines and surface its uncertainty, refuses high-risk questions like paediatric dosing (escalating to a pharmacist instead), and labels every output &ldquo;decision support, not a diagnosis.&rdquo;</li><li><strong>Operational Oversight:</strong> a licensed clinician reviews and approves anything before it touches care, every recommendation is logged for traceability, accuracy is monitored continuously, and there&rsquo;s a kill-switch to pull the tool if error rates spike.</li></ul><p>The thread running through it: the agent assists, the clinician decides. Autonomy is deliberately capped below the point where a wrong answer could act on its own.</p><h3 id="domain-2--education-an-ai-teaching-assistant">Domain 2 — Education: an AI teaching assistant</h3><p><strong>The use case:</strong> an agent in a college&rsquo;s learning platform that helps <em>students</em> with course material — explaining concepts, unpacking feedback, pointing to readings, generating practice problems. It supports learning; it does not grade official work or write the assignments students submit.</p><ul><li><strong>Data Privacy:</strong> student records are FERPA-protected, so the same discipline applies — minimise data, mask identifiers, role-based access so a student sees only their own data, audit logs, and no quiet training of external models on student conversations.</li><li><strong>Content Safety:</strong> here &ldquo;unsafe&rdquo; has a twist — the danger isn&rsquo;t just false info, it&rsquo;s <em>doing the work for the student</em>. 
So the agent scaffolds and hints rather than handing over finished answers on graded work, refuses to write submittable assignments, cites course materials instead of inventing them, and routes sensitive disclosures (self-harm, harassment) to human support.</li><li><strong>Operational Oversight:</strong> instructors configure and review how it&rsquo;s used, interactions are logged for academic-integrity checks, accuracy and flagged conversations are monitored, and there&rsquo;s an escalation path to a human plus a disable switch.</li></ul><p>Notice how the <em>same</em> three-category skeleton produces different specifics once you take the domain&rsquo;s real risks seriously. In healthcare the nightmare is a wrong diagnosis; in education it&rsquo;s eroding academic integrity. The structure is portable; the judgement is not.</p><div class="agw-fig"><div class="agw-wrap"><svg viewBox="0 0 1200 620" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="A table mapping the three safeguard categories onto two domains: healthcare clinical decision support and an education teaching assistant, showing how the same skeleton produces different specifics."><rect width="1200" height="620" fill="#0a1424"/><text x="80" y="56" font-family="'Space Grotesk','Inter',sans-serif" font-size="15" fill="#f59e0b" letter-spacing="3" font-weight="700">SAME SKELETON · TWO DOMAINS</text><text x="80" y="96" font-family="'Space Grotesk','Inter',sans-serif" font-size="28" fill="#ffffff" font-weight="700" letter-spacing="-.5">Portable structure, domain-specific judgement</text><rect x="360" y="120" width="370" height="56" rx="10" fill="#10231a" stroke="#34d399" stroke-width="1.6"/><text x="545" y="148" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="17" fill="#34d399" font-weight="700">Healthcare</text><text x="545" y="168" text-anchor="middle" font-family="'Inter',sans-serif" font-size="12" fill="#a8e6c8">clinical decision support</text><rect x="745" y="120"
width="370" height="56" rx="10" fill="#16112a" stroke="#a78bfa" stroke-width="1.6"/><text x="930" y="148" text-anchor="middle" font-family="'Space Grotesk','Inter',sans-serif" font-size="17" fill="#c4b5fd" font-weight="700">Education</text><text x="930" y="168" text-anchor="middle" font-family="'Inter',sans-serif" font-size="12" fill="#cfc2f0">AI teaching assistant</text><g font-family="'Inter',sans-serif"><rect x="70" y="190" width="270" height="120" rx="10" fill="#0f1d33" stroke="#60a5fa" stroke-width="1.6"/><text x="90" y="226" font-size="16" fill="#60a5fa" font-weight="700">Data Privacy</text><text x="90" y="252" font-size="13" fill="#9fb4cf">the data going in</text><rect x="360" y="190" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="378" y="222" font-size="14" fill="#dce7f5">HIPAA env · mask IDs ·</text><text x="378" y="246" font-size="14" fill="#dce7f5">role-based access (own</text><text x="378" y="270" font-size="14" fill="#dce7f5">patients) · log every</text><text x="378" y="294" font-size="14" fill="#dce7f5">access</text><rect x="745" y="190" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="763" y="222" font-size="14" fill="#dce7f5">FERPA · mask IDs · own</text><text x="763" y="246" font-size="14" fill="#dce7f5">data only · audit logs ·</text><text x="763" y="270" font-size="14" fill="#dce7f5">no training external</text><text x="763" y="294" font-size="14" fill="#dce7f5">models on chats</text><rect x="70" y="324" width="270" height="120" rx="10" fill="#0f1d33" stroke="#34d399" stroke-width="1.6"/><text x="90" y="360" font-size="16" fill="#34d399" font-weight="700">Content Safety</text><text x="90" y="386" font-size="13" fill="#9fb4cf">the content coming out</text><rect x="360" y="324" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="378" y="356" font-size="14" fill="#dce7f5">cite guidelines · show</text><text x="378" 
y="380" font-size="14" fill="#dce7f5">uncertainty · refuse</text><text x="378" y="404" font-size="14" fill="#dce7f5">paediatric dosing · "not</text><text x="378" y="428" font-size="14" fill="#dce7f5">a diagnosis"</text><rect x="745" y="324" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="763" y="356" font-size="14" fill="#dce7f5">scaffold &amp; hint · won't</text><text x="763" y="380" font-size="14" fill="#dce7f5">write submittable work ·</text><text x="763" y="404" font-size="14" fill="#dce7f5">cite course material ·</text><text x="763" y="428" font-size="14" fill="#dce7f5">route self-harm to humans</text><rect x="70" y="458" width="270" height="120" rx="10" fill="#0f1d33" stroke="#a78bfa" stroke-width="1.6"/><text x="90" y="494" font-size="16" fill="#c4b5fd" font-weight="700">Oversight</text><text x="90" y="520" font-size="13" fill="#9fb4cf">the humans watching</text><rect x="360" y="458" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="378" y="490" font-size="14" fill="#dce7f5">clinician approves before</text><text x="378" y="514" font-size="14" fill="#dce7f5">care · log all recs ·</text><text x="378" y="538" font-size="14" fill="#dce7f5">monitor accuracy ·</text><text x="378" y="562" font-size="14" fill="#dce7f5">kill-switch on error spike</text><rect x="745" y="458" width="370" height="120" rx="10" fill="#0c1626" stroke="#21405f" stroke-width="1.2"/><text x="763" y="490" font-size="14" fill="#dce7f5">instructors configure ·</text><text x="763" y="514" font-size="14" fill="#dce7f5">log for integrity checks ·</text><text x="763" y="538" font-size="14" fill="#dce7f5">monitor flags · escalate</text><text x="763" y="562" font-size="14" fill="#dce7f5">to human · disable switch</text></g></svg></div></div><p class="agw-cap">The three rows never change. 
The cells do, because in healthcare the nightmare is a wrong diagnosis, and in education it's a student who never actually learned.</p><h2 id="what-im-taking-away">What I&rsquo;m taking away</h2><p>Three things stuck with me:</p><ol><li><strong>The failure is rarely the model alone.</strong> In every case, the technical fault was amplified by an organisational gap: no audit, no human gate, no clear owner. Governance is not paperwork wrapped around the AI; it <em>is</em> the safety system.</li><li><strong>&ldquo;Fair&rdquo; and &ldquo;safe&rdquo; require you to choose.</strong> The COMPAS case showed that you sometimes can&rsquo;t satisfy every definition of fairness at once. Pretending otherwise is how you end up shipping the bias.</li><li><strong>Good safeguards are boring on purpose.</strong> Logging, escalation, human review, kill-switches, scope limits. None of it is exciting. All of it is what stands between a useful agent and a headline.</li></ol><p>The capability race will keep accelerating. The quieter discipline, deciding where autonomy ends and accountability begins, is what determines whether any of it can be trusted.</p>
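<p>The &ldquo;boring&rdquo; safeguards above are easy to nod along to, so here is a minimal Python sketch of how the healthcare agent&rsquo;s three layers might wrap a single model call. Treat it as an illustration under stated assumptions rather than a reference design: <code>ClinicalGuardrail</code>, the <code>MRN-</code> identifier pattern, the keyword list of high-risk topics, and the 5% kill-switch threshold are all hypothetical, and the clinician rejection rate stands in for the error rate. A real system would use a proper de-identification service and a clinical risk classifier, not a regex and a keyword match.</p>

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: every name, pattern and threshold below is an
# illustrative assumption, not part of any real clinical product or library.
HIGH_RISK_TOPICS = ("paediatric dosing", "pediatric dosing")
DISCLAIMER = "Decision support, not a diagnosis."
# Toy identifier format, e.g. "MRN-123456" (assumed for illustration).
MRN_PATTERN = re.compile(r"\bMRN-\d{6}\b")


@dataclass
class ClinicalGuardrail:
    assignments: dict[str, set[str]]  # clinician id -> patient ids they may see
    max_error_rate: float = 0.05      # kill-switch threshold (assumed)
    min_reviews: int = 20             # avoid tripping on tiny samples (assumed)
    audit_log: list = field(default_factory=list)
    reviews: int = 0
    rejections: int = 0
    enabled: bool = True

    def _log(self, clinician: str, patient: str, event: str) -> None:
        # Oversight: every request and its outcome lands in the audit trail.
        self.audit_log.append((clinician, patient, event))

    def handle(self, clinician: str, patient: str, question: str,
               model: Callable[[str], str]) -> str:
        if not self.enabled:
            self._log(clinician, patient, "blocked: tool disabled")
            return "Tool disabled pending review."
        # Data privacy: role-based access, so clinicians see only their own patients.
        if patient not in self.assignments.get(clinician, set()):
            self._log(clinician, patient, "denied: not assigned")
            raise PermissionError("clinician is not assigned to this patient")
        # Content safety: refuse high-risk questions and escalate instead of guessing.
        if any(topic in question.lower() for topic in HIGH_RISK_TOPICS):
            self._log(clinician, patient, "escalated: pharmacist")
            return "Escalated to a pharmacist. " + DISCLAIMER
        # Data privacy: mask identifiers before the model sees the text.
        masked = MRN_PATTERN.sub("[MRN]", question)
        suggestion = model(masked)
        self._log(clinician, patient, "suggested")
        # Content safety: every output carries the decision-support label.
        return suggestion + " " + DISCLAIMER

    def record_review(self, approved: bool) -> None:
        # Oversight: a clinician approves or rejects each suggestion.
        self.reviews += 1
        if not approved:
            self.rejections += 1
        # Kill-switch: once enough reviews exist, disable the tool if the
        # rejection rate exceeds the threshold.
        rate = self.rejections / self.reviews
        if self.reviews >= self.min_reviews and rate > self.max_error_rate:
            self.enabled = False


if __name__ == "__main__":
    guard = ClinicalGuardrail(assignments={"dr_lee": {"p1"}})
    echo_model = lambda text: "Consider guideline X for: " + text  # stand-in model
    print(guard.handle("dr_lee", "p1", "Summarise history for MRN-123456", echo_model))
    print(guard.handle("dr_lee", "p1", "Paediatric dosing for amoxicillin?", echo_model))
```

<p>Each branch maps to one row of the table: the access check and masking are Data Privacy, the keyword refusal and the disclaimer are Content Safety, and the audit log plus the review-driven kill-switch are Operational Oversight. The <code>min_reviews</code> floor is a deliberate design choice so that a single early rejection cannot pull the tool on its own.</p>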
]]></content:encoded><media:content url="https://curiousbit.netlify.app/images/agents-go-wrong/hero.jpg" medium="image"><media:title type="plain">Agents</media:title></media:content><category>artificial-intelligence</category><category>ai-safety</category><category>governance</category><category>ethics</category><category>agents</category><category>Knowledge Base</category></item></channel></rss>