<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link><description>Digital workplace, artificial intelligence, cloud, security, automation, and enterprise technology notes by Ajay Walia.</description><language>en-au</language><managingEditor>Ajay Walia</managingEditor><webMaster>Ajay Walia</webMaster><copyright>Copyright 2026 Ajay Walia</copyright><lastBuildDate>Sun, 21 Jun 2026 05:46:10 +0000</lastBuildDate><atom:link href="https://curiousbit.netlify.app/tags/macos/index.xml" rel="self" type="application/rss+xml"/><image><url>https://curiousbit.netlify.app/images/og-default.png</url><title>Ajay Walia</title><link>https://curiousbit.netlify.app/</link></image><item><title>The Minion Who Wanted a Touchscreen</title><link>https://curiousbit.netlify.app/the-minion-who-wanted-a-touchscreen-glide/</link><guid isPermaLink="true">https://curiousbit.netlify.app/the-minion-who-wanted-a-touchscreen-glide/</guid><pubDate>Sat, 06 Jun 2026 00:00:00 +0000</pubDate><dc:creator>Ajay Walia</dc:creator><description>&lt;h2 id="chapter-1--the-problem"&gt;Chapter 1 — The Problem&lt;/h2&gt;
&lt;p&gt;We live in a world of glass that responds to us. Tap an iPhone, it opens. Swipe an iPad, it scrolls.&lt;/p&gt;</description><content:encoded><![CDATA[<img src="https://curiousbit.netlify.app/images/Minion/hero.jpg" alt="Macos" style="max-width:100%;height:auto;margin-bottom:1.5em;"/><h2 id="chapter-1--the-problem">Chapter 1 — The Problem</h2><p>We live in a world of glass that responds to us. Tap an iPhone, it opens. Swipe an iPad, it scrolls.</p><p>But when I reach out to my MacBook display? Nothing. Just fingerprints.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/ANIM-001.mp4" type="video/mp4"/></video></p><p>The interaction model is broken. We expect a simple flow: <strong><code>Finger</code> ↓ <code>Screen</code> ↓ <code>Content Moves</code></strong></p><p>But reality on a Mac is: <strong><code>Finger</code> ↓ <code>Screen</code> ↓ <code>Nothing Happens</code></strong></p><h2 id="chapter-2--the-crazy-idea">Chapter 2 — The Crazy Idea</h2><p>I didn&rsquo;t want to buy an external touchscreen, and I certainly didn&rsquo;t want a clunky piece of hardware taped to my bezel. I just wanted my Mac to understand when my finger was moving across it.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-002.mp4" type="video/mp4"/></video></p><p>What if we could create an invisible sheet of glass in front of the display?</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-003.mp4" type="video/mp4"/></video></p><h2 id="chapter-3--building-an-invisible-touchscreen">Chapter 3 — Building an Invisible Touchscreen</h2><p>To make this work without lag, it couldn&rsquo;t be a Python script running in a terminal. It had to be a native macOS menu bar app built on Apple&rsquo;s Vision framework, which can hand inference to the Neural Engine.
Here is how the v2 architecture flows:</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/DIAG-002.mp4" type="video/mp4"/></video></p><p>To explain this, let&rsquo;s meet the engineering team:</p><p><strong>Camera Manager</strong><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-004.mp4" type="video/mp4"/></video>
Runs an <code>AVCaptureSession</code> at a fixed 30 or 60 fps with actor isolation.</p><p><strong>Vision Pipeline</strong><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-005.mp4" type="video/mp4"/></video>
Uses <code>VNDetectHumanHandPoseRequest</code> to identify the index finger in the frame.</p><p><strong>Hand Tracker</strong><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-006.mp4" type="video/mp4"/></video>
Applies a centroid-based lock (0.15 threshold) so the system doesn&rsquo;t jump between hands.</p><p><strong>Scroll Engine</strong><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-007.mp4" type="video/mp4"/></video>
A GestureStateMachine (Idle → Dwelling → TouchActive → Releasing) that posts FPS-normalised scroll events directly into the macOS HID system.</p><h2 id="chapter-4--why-gesture-control-is-wrong">Chapter 4 — Why Gesture Control Is Wrong</h2><p>The first instinct when building camera control is to use gestures.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-008.mp4" type="video/mp4"/></video></p><p>If I have to learn sign language to read an article, the tool has failed. I didn&rsquo;t want gesture recognition. I wanted <strong>direct manipulation</strong>.</p><p>With Glide:</p><ol><li>Point finger.</li><li>Touch page.</li><li>Move page.</li></ol><h2 id="chapter-5--the-ux-rabbit-hole">Chapter 5 — The UX Rabbit Hole</h2><p>If you&rsquo;ve ever used a bad gesture system, you know the feeling.</p><p>Imagine hovering your hand in mid-air. In a typical gesture system, you enter a &ldquo;touch zone&rdquo; and are then forced to wait. A mandatory 200-millisecond delay kicks in just to confirm your intent before the system finally registers the action. That tiny fraction of a second feels sluggish, unnatural, and deeply frustrating.</p><p>Now imagine a true physical touchscreen. The moment your finger touches the glass, the response is instant. No waiting. No unnatural pausing. The digital content tracks perfectly and immediately with your physical movement.</p><p>The difference between a gimmick and a tool is latency.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/DIAG-003.mp4" type="video/mp4"/></video></p><h2 id="chapter-6--the-touch-plane">Chapter 6 — The Touch Plane</h2><p>To make it feel like a touchscreen, we had to invent a virtual screen floating exactly 10 inches in front of the actual screen.</p><p>Imagine a strict, invisible boundary hovering parallel to your MacBook display.
Until your finger crosses that exact depth threshold, the system entirely ignores your movements. But the moment your fingertip pierces that invisible layer, the interface wakes up, locking onto your finger&rsquo;s precise coordinates and translating them into immediate, pixel-perfect scrolling.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/ANIM-004.mp4" type="video/mp4"/></video></p><h2 id="chapter-7--the-tech-stack">Chapter 7 — The Tech Stack</h2><p>To run a continuous computer vision pipeline without setting a MacBook on fire, the tech stack had to be heavily optimized.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-010.mp4" type="video/mp4"/></video></p><p><strong>Why Swift?</strong></p><p>Using Swift gave us direct access to native macOS APIs and the Vision framework. It meant lower latency, memory safety, and lower power consumption, since the Neural Engine can do the heavy lifting.</p><p><strong>Tech Stack Callout:</strong></p><ul><li><strong>Language:</strong> Swift 5.9+ (Strict concurrency)</li><li><strong>UI:</strong> SwiftUI</li><li><strong>Computer Vision:</strong> Vision Framework</li><li><strong>Event Injection:</strong> Quartz Event Services</li><li><strong>Architecture:</strong> Menu Bar App</li><li><strong>Platform:</strong> Apple Silicon (M-series)</li></ul><h2 id="chapter-8--the-real-challenge">Chapter 8 — The Real Challenge</h2><p>The hardest problem wasn&rsquo;t the computer vision or the math.</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-011.mp4" type="video/mp4"/></video></p><p>The hardest problem was comfort. Humans aren&rsquo;t built to hold their arms out straight for eight hours a day.</p><p>Our Test Matrix (from the v1.0 Test Plan) reflected this.
We had to pass:</p><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-012.mp4" type="video/mp4"/></video></p><h2 id="chapter-9--what-success-looks-like">Chapter 9 — What Success Looks Like</h2><p><video autoplay loop muted playsinline style="width:100%; border-radius:8px; margin:1.5rem 0;"><source src="/images/Minion/IMG-013.mp4" type="video/mp4"/></video></p><h2 id="final-section">Final Section</h2><p>The journey doesn&rsquo;t stop at scrolling.</p><p><strong>Today</strong> ↓ Scrolling<br><strong>Tomorrow</strong> ↓ Zoom ↓ Click ↓ Window Management ↓ Presentations</p><p>At Aera, we are building a camera-native interaction layer for macOS. Glide is just the beginning.</p><h2 id="an-honest-postscript--into-cold-storage">An Honest Postscript — Into Cold Storage</h2><p>I&rsquo;ll be straight about where this actually ended: I left Glide midway. It&rsquo;s in cold storage, not shipped.</p><p>A few things stacked up. Token limits and the sheer time Antigravity took to grind through each problem made every iteration expensive. And the LLM never managed the one breakthrough that mattered — the scrolling logic. We could detect the hand, lock onto the finger, calibrate the touch plane, and light up the debug HUD, but turning that into scrolling that felt genuinely like a touchscreen stayed just out of reach. At some point I chose to stop rather than keep forcing it.</p><p>But I don&rsquo;t count it as wasted. I set out to treat this like a proper software-engineering project rather than a weekend hack — a real architecture, a menu-bar app instead of a terminal script, and the detailed documentation a serious project would keep: a maintained CHANGELOG, a test plan and test matrix, an architecture document, and a gesture spec. That discipline is the thing I&rsquo;m taking with me. Even shelved, the project is fully legible — anyone (including future me) can pick it up and know exactly where it stands and why.
That&rsquo;s the real learning: good documentation is what lets a project survive being put down.</p><div style="background:linear-gradient(135deg,#10151f,#161d2b);border:1px solid rgba(96,165,250,0.28);border-radius:12px;padding:36px 32px;margin:44px 0;text-align:center;"><h2 style="margin:0 0 12px;color:#fff;">See it on the bench — the screenshots</h2><p style="color:#c7d0df;max-width:620px;margin:0 auto 24px;font-size:1.08rem;line-height:1.7;">Seven captioned frames from the build: the Xcode project and its CHANGELOG, the AI agent wrestling with the scroll logic, the full five-step onboarding flow, and the live debug HUD.</p><a href="/glide-screens.html" target="_blank" rel="noopener" style="display:inline-block;background:#3b82f6;color:#fff;font-weight:700;padding:15px 32px;border-radius:6px;font-size:1.02rem;">Open the screenshot gallery →</a></div>
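<p>As a rough illustration of the Scroll Engine described in Chapter 3, here is a minimal sketch of a state machine with the four states named there (Idle, Dwelling, TouchActive, Releasing). Only the state names come from the project. The type layout, the <code>update(pastPlane:)</code> method, and the two frame-count thresholds are assumptions made for this example, not the code that is in cold storage.</p>

```swift
// Hypothetical sketch: the four state names come from the post; every
// other name and every threshold value here is an assumption.
enum GestureState: Equatable {
    case idle, dwelling, touchActive, releasing
}

struct GestureStateMachine {
    private(set) var state: GestureState = .idle
    private var framesInState = 0

    /// Frames the fingertip must stay past the touch plane before a touch begins (assumed).
    let dwellFrames: Int
    /// Frames the fingertip must stay behind the plane before returning to idle (assumed).
    let releaseFrames: Int

    init(dwellFrames: Int = 2, releaseFrames: Int = 2) {
        self.dwellFrames = dwellFrames
        self.releaseFrames = releaseFrames
    }

    /// Feed one camera frame. `pastPlane` is true when the tracked
    /// fingertip has crossed the virtual touch plane.
    @discardableResult
    mutating func update(pastPlane: Bool) -> GestureState {
        switch (state, pastPlane) {
        case (.idle, true):
            transition(to: .dwelling)
        case (.dwelling, true):
            framesInState += 1
            if framesInState >= dwellFrames { transition(to: .touchActive) }
        case (.dwelling, false):
            transition(to: .idle)          // finger pulled back before the touch began
        case (.touchActive, false):
            transition(to: .releasing)
        case (.releasing, true):
            transition(to: .touchActive)   // brief dropout, keep the touch alive
        case (.releasing, false):
            framesInState += 1
            if framesInState >= releaseFrames { transition(to: .idle) }
        default:
            break                          // (.idle, false) and (.touchActive, true): no change
        }
        return state
    }

    private mutating func transition(to next: GestureState) {
        state = next
        framesInState = 0
    }
}
```

<p>In this sketch, scroll events would only be posted while the machine is in <code>touchActive</code>. The short Releasing window is one possible way to ride out a dropped detection without ending the touch; both thresholds would need tuning against the real camera frame rate.</p>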
]]></content:encoded><media:content url="https://curiousbit.netlify.app/images/Minion/hero.jpg" medium="image"><media:title type="plain">Macos</media:title></media:content><category>swift</category><category>computer-vision</category><category>macos</category><category>Projects</category></item></channel></rss>