⚡ Interactive #LLMs #Memory #Inference #GPU

Memory Management in LLMs

A structured hub on how large language models use, store, and optimize memory — from the bytes that hold model weights on a GPU to how an agent remembers across sessions. Pick a topic to dive in.

By Ajay Walia · Jun 14, 2026 · 1 min read

Each topic is tagged by when its memory is consumed. Pick a card to open the article.

Static: set before you run
Runtime: scales with workload
Training: only during training
Overview
App-level memory

System & Runtime Memory

How the model physically uses hardware

Agent & Long-Term Memory

How the model "remembers" across turns & sessions

About the Author

Ajay Walia

AI {IT Architect} focusing on local-first multi-agent AI engineering and zero-data-egress systems. Ideator, creator, and executor on Curious Bit.
