⚡ Interactive #LLMs #Memory #Inference #GPU

Memory Management in LLMs

A structured hub on how large language models use, store, and optimize memory — from the bytes that hold model weights on a GPU to how an agent remembers across sessions. Pick a topic to dive in.

By Ajay Walia · Jun 14, 2026 · 1 min read

Each topic is tagged by when the memory is consumed.

Static: set before you run
Runtime: scales with workload
Training: only during training
Overview
App-level memory

System & Runtime Memory

How the model physically uses hardware
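
As a rough illustration of the Static, Runtime, and Training tags, the sketch below estimates one common memory consumer for each: model weights (fixed once the model is loaded), the KV cache (grows with batch size and sequence length), and gradients plus Adam optimizer state (present only while training). Mapping these three consumers onto the tags is one plausible reading, not something the hub spells out. The model shape, byte widths, and function names are hypothetical and chosen for illustration, and real runtimes add overheads such as activations and allocator fragmentation that are not counted here.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical model dimensions, for illustration only."""
    n_params: int    # total parameter count
    n_layers: int    # transformer layers
    n_kv_heads: int  # key/value heads per layer
    head_dim: int    # dimension of each head


def weight_bytes(shape: ModelShape, bytes_per_param: int = 2) -> int:
    """Static: fixed by parameter count and precision (2 bytes = fp16/bf16)."""
    return shape.n_params * bytes_per_param


def kv_cache_bytes(shape: ModelShape, batch_size: int, seq_len: int,
                   bytes_per_value: int = 2) -> int:
    """Runtime: one key and one value vector per token, per layer, per KV head."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_value
    return per_token * seq_len * batch_size


def adam_training_extra_bytes(shape: ModelShape, bytes_per_grad: int = 4,
                              bytes_per_state: int = 4) -> int:
    """Training: gradients plus Adam's two moment estimates per parameter.

    Assumes fp32 gradients and optimizer state; mixed-precision setups differ.
    """
    return shape.n_params * (bytes_per_grad + 2 * bytes_per_state)


if __name__ == "__main__":
    # Assumed shape, loosely in the range of a 7B-parameter decoder.
    shape = ModelShape(n_params=7_000_000_000, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"weights (fp16):        {weight_bytes(shape) / GIB:.2f} GiB")
    print(f"KV cache (1 x 4096):   {kv_cache_bytes(shape, 1, 4096) / GIB:.2f} GiB")
    print(f"Adam extras (fp32):    {adam_training_extra_bytes(shape) / GIB:.2f} GiB")
```

Under these assumptions the weight figure stays constant no matter how the model is used, the KV cache doubles whenever batch size or context length doubles, and the training-only term is several times larger than the weights themselves.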

Agent & Long-Term Memory

How the model "remembers" across turns & sessions

About the Author

Ajay Walia

AI {IT Architect} focusing on local-first multi-agent AI engineering and zero-data-egress systems. Ideator, creator, and executor on Curious Bit.
