The Minion Who Wanted a Touchscreen
How a MacBook without a touchscreen led to Glide: an invisible touchscreen powered by computer vision.
Chapter 1 — The Problem
We live in a world of glass that responds to us. Tap an iPhone, it opens. Swipe an iPad, it scrolls.
But when I reach out to my MacBook display? Nothing. Just fingerprints.
The interaction model is broken. We expect a simple flow:
Finger → Screen → Content Moves
But reality on a Mac is:
Finger → Screen → Nothing Happens
Chapter 2 — The Crazy Idea
I didn’t want to buy an external touchscreen, and I certainly didn’t want a clunky piece of hardware taped to my bezel. I just wanted my Mac to understand when my finger was moving across it.
What if we could create an invisible sheet of glass in front of the display?
Chapter 3 — Building an Invisible Touchscreen
To make this work without lag, it couldn’t be a Python script running in a terminal. It had to be a native macOS menu bar app built on Apple’s Vision framework, which can offload hand-pose inference to the Apple Neural Engine. Here is how the v2 architecture flows:
To explain this, let’s meet the engineering team:
Camera Manager
Runs an AVCaptureSession that delivers frames at a steady 30 or 60 fps, with capture state guarded by actor isolation.
Vision Pipeline
Uses VNDetectHumanHandPoseRequest to locate the index fingertip in each frame.
Hand Tracker
Applies a centroid-based lock (0.15 threshold) so the system doesn’t jump between hands.
Scroll Engine
A GestureStateMachine (Idle → Dwelling → TouchActive → Releasing) that posts FPS-normalised scroll events directly into the macOS HID system.
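The Hand Tracker's centroid-based lock is easiest to picture in code. What follows is a minimal sketch rather than Glide's actual implementation: `Point2D`, `HandLock`, `select(from:)` and `release()` are hypothetical names, and it assumes the 0.15 threshold is a distance in normalised image coordinates (0 to 1). Each detected hand is reduced to the mean of its landmark points, and the tracker only follows a hand whose centroid stays within the threshold of the one it locked onto.

```swift
// Hypothetical sketch: Point2D, HandLock and their members are illustrative
// names, not Glide's actual types.
struct Point2D: Equatable {
    var x: Double
    var y: Double
}

struct HandLock {
    /// Largest centroid jump between frames still treated as the same hand.
    /// Assumed to be in normalised image units (0...1).
    let threshold: Double
    private(set) var lockedCentroid: Point2D?

    init(threshold: Double = 0.15) {
        self.threshold = threshold
        self.lockedCentroid = nil
    }

    /// Mean position of one hand's landmark points.
    static func centroid(of points: [Point2D]) -> Point2D? {
        guard !points.isEmpty else { return nil }
        let count = Double(points.count)
        let sumX = points.reduce(0.0) { $0 + $1.x }
        let sumY = points.reduce(0.0) { $0 + $1.y }
        return Point2D(x: sumX / count, y: sumY / count)
    }

    /// Returns the index of the hand to follow this frame, or nil if no
    /// detected hand is close enough to the locked one.
    mutating func select(from hands: [[Point2D]]) -> Int? {
        var bestIndex: Int? = nil
        var bestDistance = Double.infinity
        var bestCentroid: Point2D? = nil

        for (index, points) in hands.enumerated() {
            guard let c = HandLock.centroid(of: points) else { continue }
            guard let locked = lockedCentroid else {
                // No lock yet: adopt the first hand that has landmarks.
                lockedCentroid = c
                return index
            }
            let dx = c.x - locked.x
            let dy = c.y - locked.y
            let distance = (dx * dx + dy * dy).squareRoot()
            if distance <= threshold && distance < bestDistance {
                bestIndex = index
                bestDistance = distance
                bestCentroid = c
            }
        }

        // Follow the matched hand; if nothing matched, keep the old lock
        // so a second hand entering the frame is ignored.
        if let c = bestCentroid { lockedCentroid = c }
        return bestIndex
    }

    /// Drops the lock, e.g. after the tracked hand has been gone for a while.
    mutating func release() {
        lockedCentroid = nil
    }
}
```

In this sketch the lock never expires on its own; a real tracker would presumably call something like `release()` after the hand has been missing for a number of frames, which is a policy the source does not describe.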
Chapter 4 — Why Gesture Control Is Wrong
The first instinct when building camera control is to use gestures.
If I have to learn sign language to read an article, the tool failed. I didn’t want gesture recognition. I wanted direct manipulation.
With Glide:
- Point finger.
- Touch page.
- Move page.
Chapter 5 — The UX Rabbit Hole
If you’ve ever used a bad gesture system, you know the feeling.
Imagine hovering your hand in mid-air. In a typical gesture system, you enter a “touch zone” and then are forced to wait. A mandatory 200-millisecond delay kicks in just to confirm your intent before the system finally registers the action. That tiny fraction of a second feels sluggish, unnatural, and deeply frustrating.
Now imagine a true physical touchscreen. The moment your finger touches the glass, the response is instant. No waiting. No unnatural pausing. The digital content tracks perfectly and immediately with your physical movement.
The difference between a gimmick and a tool is latency.
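For content to track the finger the same way whether the camera delivers 30 or 60 frames per second, the scroll output has to be independent of frame rate. The sketch below shows one possible reading of what "FPS-normalised" scroll events could mean, not the project's real code. `ScrollNormaliser`, `pixelsPerUnit`, `minSpeed` and `totalScroll` are all hypothetical names with made-up values. The idea is to convert each per-frame finger displacement into a speed (units per second) before applying a jitter dead zone, then convert back into pixels for that frame.

```swift
// Hypothetical sketch: ScrollNormaliser and its parameters are illustrative.
struct ScrollNormaliser {
    /// Scroll pixels per unit of normalised finger travel (assumed gain).
    let pixelsPerUnit: Double
    /// Jitter dead zone expressed as a speed in units per second, so it
    /// means the same thing at any frame rate (assumed value).
    let minSpeed: Double

    init(pixelsPerUnit: Double = 1000.0, minSpeed: Double = 0.05) {
        self.pixelsPerUnit = pixelsPerUnit
        self.minSpeed = minSpeed
    }

    /// Pixels to scroll for one frame, given how far the finger moved
    /// since the previous frame and how long that frame took.
    func scrollPixels(fingerDelta: Double, frameInterval: Double) -> Double {
        guard frameInterval > 0 else { return 0 }
        // Speed does not depend on how often frames arrive.
        let speed = fingerDelta / frameInterval
        // Ignore movement slower than the dead zone (treated as jitter).
        guard abs(speed) >= minSpeed else { return 0 }
        // Convert back to a distance for this frame.
        return speed * frameInterval * pixelsPerUnit
    }
}

/// Total pixels scrolled when the finger moves at a constant speed.
func totalScroll(speed: Double, seconds: Double, fps: Double,
                 using normaliser: ScrollNormaliser) -> Double {
    let dt = 1.0 / fps
    let frames = Int((seconds * fps).rounded())
    var total = 0.0
    for _ in 0..<frames {
        total += normaliser.scrollPixels(fingerDelta: speed * dt, frameInterval: dt)
    }
    return total
}
```

The design point is the dead zone. A threshold on per-frame displacement would filter twice as aggressively at 60 fps as at 30 fps, because each frame sees half the movement. Thresholding on speed avoids that, so the same hand motion produces the same scroll distance at either frame rate.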
Chapter 6 — The Touch Plane
To make it feel like a touchscreen, we had to invent a virtual screen floating exactly 10 inches in front of the actual screen.
Imagine a strict, invisible boundary hovering parallel to your MacBook display. Until your finger crosses that exact depth threshold, the system entirely ignores your movements. But the moment your fingertip pierces that invisible layer, the interface wakes up—locking onto your finger’s precise coordinates and translating them into immediate, pixel-perfect scrolling.
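The touch plane and the GestureStateMachine's four states (Idle → Dwelling → TouchActive → Releasing) fit together naturally, so here is a rough sketch of how a depth threshold might drive them. Everything beyond the state names and the 10-inch plane is an assumption: the transition rules, the `hysteresis` band, the short `dwellTime`, and the `update(fingerDistance:dt:)` method are hypothetical. The sketch also takes the finger's distance from the display as an input, because the source does not say how depth is estimated.

```swift
// Hypothetical sketch of a touch-plane state machine. The four states come
// from the article; the transition rules and parameters are assumptions.
enum GestureState: Equatable {
    case idle, dwelling, touchActive, releasing
}

struct GestureStateMachine {
    /// Distance of the virtual plane in front of the display, in inches.
    let planeDistance: Double
    /// Extra retreat required before a touch ends, to avoid flicker
    /// when the fingertip hovers right at the plane (assumed).
    let hysteresis: Double
    /// How long the finger must stay past the plane before the touch
    /// becomes active, in seconds (assumed, kept short on purpose).
    let dwellTime: Double

    private(set) var state: GestureState = .idle
    private var dwellElapsed: Double = 0

    init(planeDistance: Double = 10.0, hysteresis: Double = 0.5, dwellTime: Double = 0.04) {
        self.planeDistance = planeDistance
        self.hysteresis = hysteresis
        self.dwellTime = dwellTime
    }

    /// Advances the machine by one frame. `fingerDistance` is the fingertip's
    /// distance from the display in inches, or nil if no finger is tracked.
    @discardableResult
    mutating func update(fingerDistance: Double?, dt: Double) -> GestureState {
        let pastPlane = fingerDistance.map { $0 <= planeDistance } ?? false
        let clearlyOut = fingerDistance.map { $0 > planeDistance + hysteresis } ?? true

        switch state {
        case .idle:
            // Movements in front of the plane are ignored entirely.
            if pastPlane {
                state = .dwelling
                dwellElapsed = 0
            }
        case .dwelling:
            if !pastPlane {
                state = .idle
            } else {
                dwellElapsed += dt
                if dwellElapsed >= dwellTime { state = .touchActive }
            }
        case .touchActive:
            // Stay active inside the hysteresis band; release only on a
            // clear retreat or when the finger is lost.
            if clearlyOut { state = .releasing }
        case .releasing:
            // One frame to finish the gesture, then back to idle.
            state = .idle
        }
        return state
    }
}
```

A caller would feed this one distance sample per camera frame and only emit scroll events while the state is `.touchActive`. Keeping `dwellTime` well below the 200-millisecond delay criticised earlier is what would separate this from a typical gesture system; the 0.04-second default here is a placeholder, not a measured value.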
Chapter 7 — The Tech Stack
To run a continuous computer vision pipeline without setting a MacBook on fire, the tech stack had to be heavily optimized.
Why Swift?
Using Swift gave us direct access to native macOS APIs and the Vision framework. That meant lower latency, memory safety, and lower power consumption, since Vision can hand the heavy lifting to the Neural Engine.
Tech Stack Callout:
- Language: Swift 5.9+ (Strict concurrency)
- UI: SwiftUI
- Computer Vision: Vision Framework
- Event Injection: Quartz Event Services
- Architecture: Menu Bar App
- Platform: Apple Silicon (M-series)
Chapter 8 — The Real Challenge
The hardest problem wasn’t the computer vision or the math.
The hardest problem was comfort. Humans aren’t built to hold their arms out straight for eight hours a day.
Our Test Matrix (from the v1.0 Test Plan) reflected this. We had to pass:
Chapter 9 — What Success Looks Like
The journey doesn’t stop at scrolling.
Today → Scrolling
Tomorrow → Zoom → Click → Window Management → Presentations
At Aera, we are building a camera-native interaction layer for macOS. Glide is just the beginning.
An Honest Postscript — Into Cold Storage
I’ll be straight about where this actually ended: I left Glide midway. It’s in cold storage, not shipped.
A few things stacked up. Token limits and the sheer time Antigravity took to grind through each problem made every iteration expensive. And the LLM never managed the one breakthrough that mattered — the scrolling logic. We could detect the hand, lock onto the finger, calibrate the touch plane, and light up the debug HUD, but turning that into scrolling that felt genuinely like a touchscreen stayed just out of reach. At some point I chose to stop rather than keep forcing it.
But I don’t count it as wasted. I set out to treat this like a proper software-engineering project rather than a weekend hack: a real architecture, a menu-bar app instead of a terminal script, and the detailed documentation a serious project would keep, including a maintained CHANGELOG, a test plan and test matrix, an architecture spec, and a gesture spec. That discipline is the thing I’m taking with me. Even shelved, the project is fully legible; anyone (including future me) can pick it up and know exactly where it stands and why. That’s the real learning: good documentation is what lets a project survive being put down.
See it on the bench — the screenshots
Seven captioned frames from the build: the Xcode project and its CHANGELOG, the AI agent wrestling with the scroll logic, the full five-step onboarding flow, and the live debug HUD.
About the Author
Ajay Walia
AI {IT Architect} focusing on local-first multi-agent AI engineering and zero-data-egress systems. Ideator, Creator and Executor on Curious Bit.