Recent LLM launches (GPT-5, Opus, Sonnet 4, Gemini, Grok 3) make me feel like I’m watching the iPhone go from the 5 to the 8. There’s improvement, sure, but we’ve entered the “incremental” phase. The awe is gone. And while models keep getting better, the real unlock for productivity no longer lies in the model itself.
LLMs are good enough. With the right setup, they can already do much more than most people realize.
This post nails it: the next breakthroughs won’t come from squeezing a few more benchmark points out of foundation models. They’ll come from giving models proper context, via memory.
Why Memory Matters
Right now, most AI agents are brilliant amnesiacs. Every interaction is Day One at a new job. They can answer your questions or generate something impressive, but they forget everything the moment you’re done. That’s fine for one-off tasks (and one-off tasks seem to be what most LLM benchmarks measure), but it completely breaks down for ongoing workflows, long-term projects, or any use case that requires continuity.
The solution is obvious: memory. And not just short-term context windows or vector search hacks. We’re talking long-term memory—structured, persistent, personalized, and evolving over time.
Short-Term vs. Long-Term Memory
Let’s define the layers:
- Short-Term Memory: This is what’s in the context window. Some systems enhance it with scratchpads or ephemeral notes across steps in a chain (e.g. in agents like AutoGPT or OpenDevin).
- Long-Term Memory: This is the real prize. It’s persistent across sessions, curated over time, tied to identity (user/team/app), and structured in a way that retrieval can be precise and useful.
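To make the distinction concrete, here’s a minimal sketch in Python. The class names and the SQLite schema are purely illustrative (not any particular framework’s API): a scratchpad that lives for a single run versus a store that persists across sessions and is keyed to a user.

```python
import sqlite3
import time

# Short-term memory: a scratchpad that exists only for the duration of one run.
class Scratchpad:
    def __init__(self):
        self.notes = []

    def jot(self, note: str):
        self.notes.append(note)

    def as_context(self) -> str:
        return "\n".join(self.notes)

# Long-term memory: persistent across sessions, keyed to an identity,
# and structured so retrieval can be scoped and filtered.
class LongTermMemory:
    def __init__(self, path: str = "memory.db"):  # a file, so it survives restarts
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS memories (
                   user_id TEXT, kind TEXT, content TEXT, created_at REAL
               )"""
        )

    def remember(self, user_id: str, kind: str, content: str):
        self.db.execute(
            "INSERT INTO memories VALUES (?, ?, ?, ?)",
            (user_id, kind, content, time.time()),
        )
        self.db.commit()

    def recall(self, user_id: str, kind: str, limit: int = 5) -> list[str]:
        rows = self.db.execute(
            "SELECT content FROM memories WHERE user_id = ? AND kind = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (user_id, kind, limit),
        )
        return [content for (content,) in rows]

# The scratchpad vanishes when the process ends; the SQLite store does not.
pad = Scratchpad()
pad.jot("Step 1: user asked for a dark-mode mockup")

store = LongTermMemory()
store.remember("user-42", "preference", "Prefers dark mode and terse answers")
print(store.recall("user-42", "preference"))
```

The point isn’t the storage engine; it’s that the second store survives the process and can be scoped by identity, which is what makes precise retrieval possible later.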
Long-term memory enables:
- Agents that remember what you like and how you work
- Iterative design or coding assistants that don’t forget your style
- Customer support bots that understand past conversations and issues
- Company-wide AI systems that reflect your actual org chart, team docs, and historical decisions
Memory Architecture Is the New Stack
We’re seeing the beginning of the “memory layer” ecosystem:
- General-Purpose Memory Systems: MemGPT, LangGraph, OpenAI’s memory, and MIRIX are examples that try to segment and route memory intelligently.
- Domain-Specific Memory: SaaS tools are baking in memory layers for coding, writing, product management, or customer success.
- Self-Hosted Memory: This is the most exciting option for companies. Own your data. Store knowledge in your own DB. Use RAG, structured DB queries, and memory routers to inject the right context when needed. You don’t need OpenAI to do it for you.
The future memory layer will look like a hybrid of Redis, S3, Postgres, and Notion, combined with RAG pipelines and smart memory routing. And just as data warehouses unlocked BI, memory layers will unlock truly useful AI.
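Here’s a rough sketch of what that routing could look like, under some simplifying assumptions: the `MemorySource` interface, the keyword predicates, and the toy data are all hypothetical, and a real router would lean on embeddings and structured queries rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable

# Each backing store exposes the same tiny interface: given a query, return
# candidate context snippets. Real implementations might wrap Postgres,
# a vector index, an object store, or a Notion/Confluence export.
@dataclass
class MemorySource:
    name: str
    matches: Callable[[str], bool]      # cheap routing predicate
    fetch: Callable[[str], list[str]]   # actual retrieval

def route_and_inject(query: str, sources: list[MemorySource], budget: int = 3) -> str:
    """Pick the relevant sources, gather snippets, and assemble the prompt context."""
    snippets: list[str] = []
    for source in sources:
        if source.matches(query):
            snippets.extend(f"[{source.name}] {s}" for s in source.fetch(query))
    context = "\n".join(snippets[:budget])
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy sources standing in for a structured DB and a document/RAG index.
decisions_db = MemorySource(
    name="decisions",
    matches=lambda q: "decide" in q or "decision" in q,
    fetch=lambda q: ["2024-03: chose Postgres over DynamoDB for billing data"],
)
team_docs = MemorySource(
    name="docs",
    matches=lambda q: True,  # cheap fallback: always consider the doc index
    fetch=lambda q: ["Onboarding guide: services deploy via the shared CI pipeline"],
)

prompt = route_and_inject("Why did we decide on Postgres?", [decisions_db, team_docs])
print(prompt)
```

The design choice that matters is the uniform interface: swapping a Postgres query, a vector index, or a Notion export in behind a source shouldn’t change how context gets assembled and injected.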
Why Most SaaS Wrappers Around LLMs Miss the Mark
A lot of startups are trying to “own” the user’s memory by storing knowledge inside their product. But that’s fighting gravity. Most companies already have memory—in their Notion, Confluence, Slack, Jira, etc. The winner won't be the one that hosts the knowledge, but the one that helps route it properly.
Memory isn’t just storage. It’s:
- Encoding (What gets remembered? In what format?)
- Retrieval (How is it fetched at the right time?)
- Forgetting (What ages out, what sticks?)
- Attribution (Who said this? Can I trust it?)
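One way to see why is to make those four concerns explicit in the stored record itself. This is an illustrative schema, not a standard: every field name here is an assumption.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    content: str                   # encoding: what was remembered, in what form
    tags: list[str]                # encoding/retrieval: how it can be found later
    source: str                    # attribution: who or what said it
    confidence: float = 1.0        # attribution: how much to trust it
    created_at: float = field(default_factory=time.time)
    ttl_days: float | None = None  # forgetting: None means it sticks, otherwise it ages out

    def expired(self, now: float | None = None) -> bool:
        if self.ttl_days is None:
            return False
        now = now or time.time()
        return now - self.created_at > self.ttl_days * 86400

note = MemoryRecord(
    content="Deploys are frozen the last week of each quarter",
    tags=["process", "deploys"],
    source="#eng-announcements, 2024-06-20",
    confidence=0.9,
    ttl_days=180,
)
print(note.expired())  # False until the record ages out
```

Retrieval then becomes a query over tags, source, and recency, and forgetting becomes a filter on `expired()` rather than an afterthought.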
That’s why memory is hard—and that’s why it’s where the next real breakthroughs will happen.
What Comes Next
We won’t get to “AI coworkers” or “10x engineers with AI” just by upgrading the model. We’ll get there by giving the AI continuity. By letting it remember what you did yesterday, last week, last year.
And when we get this right, LLMs won’t just be tools—they’ll become collaborators.