From intelligence to relevance. From models to memory. From everyone to you.
Why the next product moat isn't bigger models — it's the surface that adapts to each user.
Why the differentiation moves up the stack — into the context layer that wraps every AI client.
Jason Arbon
Founder & CEO, Testers.AI
Previously: Search personalization @ Google · Relevance measurement @ Bing · Founder, Test.ai (Google-funded) · Author, How Google Tests Software
Previously: Built product on Google Search personalization · Relevance & quality @ Bing · Founder, Test.ai (Google-funded) · Author, How Google Tests Software
Previously: Engineered Google Search personalization (Twitter & Google+ social-signal ranking) · Bing relevance evaluation at scale · Founded Test.ai for AI test infrastructure (Google-funded) · Author, How Google Tests Software
Google had the researchers to hand-tune equations. Microsoft used AI to leapfrog.
Human judgment becomes machine intelligence. Millions of labels, one ranking model.
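Here is a minimal pointwise learning-to-rank sketch that makes "millions of labels, one ranking model" concrete. It fits a linear scorer to human relevance grades by gradient descent, then uses it to sort unseen documents. The two features (text match, click rate), the grades, and the function names are invented for illustration. This is not a description of Bing's production ranker.

```python
# Pointwise learning-to-rank sketch: fit a linear model to human relevance
# grades, then rank unseen documents by predicted score.
# Features and labels below are invented for illustration.

def score(w, x):
    """Linear relevance score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_scorer(examples, lr=0.1, epochs=2000):
    """Batch gradient descent on mean squared error between score and grade."""
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, grade in examples:
            err = score(w, x) - grade
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(examples)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def rank(docs, w):
    """Return document ids sorted by predicted relevance, best first."""
    return sorted(docs, key=lambda doc_id: score(w, docs[doc_id]), reverse=True)


# Each feature vector: [bias, text_match, click_rate]; grade: rater label 0-4.
labeled = [
    ([1.0, 0.9, 0.8], 4),
    ([1.0, 0.7, 0.2], 3),
    ([1.0, 0.4, 0.5], 2),
    ([1.0, 0.2, 0.1], 1),
    ([1.0, 0.0, 0.0], 0),
]
weights = train_linear_scorer(labeled)
unseen = {"a": [1.0, 0.95, 0.9], "b": [1.0, 0.1, 0.05], "c": [1.0, 0.5, 0.4]}
print(rank(unseen, weights))  # ['a', 'c', 'b']
```

Real systems used pairwise or listwise losses and far richer features. The point is only that rater labels become training data rather than hand-tuned coefficients.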
Climb the relevance curve with better models, more compute, smarter ranking — and you still asymptote at the same ceiling: the best possible answer for everyone.
You cannot engineer your way past 80 with a better engine. The last 20 points belong to the person.
A plant, a president, and a rock band walk into a results page. Whether this ranking is right depends entirely on who's looking at it.
One ranking cannot be right for both of them. Relevance is personal — that's the whole talk.
The algorithm was easy. The hard part was proving each cohort's ranker was actually better — with paid human raters, at statistical confidence.
The product was simple. The per-cohort eval — raters × queries × cohorts × statistical floor — wasn't.
The ranker change was trivial. Per-cohort offline eval needed N_raters × M_queries × hours × $rate × C_cohorts, with each cohort hitting its own significance threshold.
to measure it. For the basic cohorts.
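The per-cohort eval cost on this slide is a plain product, which is why it scales so badly: every factor multiplies every other. Below is a minimal sketch. It reads `hours` as rater-hours per judged query, and every input value is hypothetical rather than one of the figures behind the talk's estimate.

```python
def offline_eval_cost(n_raters, m_queries, hours_per_query, hourly_rate, c_cohorts):
    """N_raters x M_queries x hours x $rate x C_cohorts.

    Assumes every cohort needs its own full query set to reach significance,
    so cost grows linearly with the number of cohorts.
    """
    return n_raters * m_queries * hours_per_query * hourly_rate * c_cohorts


# Hypothetical inputs: 5 raters per query, 1,000 queries per cohort,
# 15 minutes per judgment, $20/hour, 100 cohorts.
cost = offline_eval_cost(5, 1_000, 0.25, 20, 100)
print(f"${cost:,.0f}")  # $2,500,000
```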
Personalization didn't stall on compute. It stalled on measurement economics. That's the story of the next 45 minutes — and why it's about to flip.
Personalization didn't stall on compute. It stalled on measurement. AI just collapsed the cost of evaluation.
Personalization stalled on rater throughput, not ranker quality. LLM-as-judge + synthetic users collapsed the eval cost curve.
Forget AI for a second. Just take the basic Madison Avenue demographics — the categories the ad industry has used to slice audiences for 60 years. Watch how fast it blows up.
distinct audience cohorts.
And that's before psychographics, behavioral signals, intent, life stage, or anything personalized AI would actually want to know about you.
Every one of them deserves a relevant AI.
Optimizing for the "average" user means under-serving every one of them.
311K cohorts is 311K subsets to score — and the eval cost compounds at every one.
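The blow-up is multiplicative. Each attribute's bucket count multiplies the total, and every resulting cohort needs its own evaluation. The sketch below uses hypothetical attributes and bucket counts, which are not the breakdown behind the 311K figure.

```python
from math import prod

# Hypothetical Madison Avenue-style attributes and their bucket counts.
ATTRIBUTES = {
    "age_band": 7,
    "gender": 3,
    "income_band": 6,
    "education": 5,
    "region": 10,
    "household_type": 4,
}


def cohort_count(attributes):
    """Each combination of one bucket per attribute is a distinct cohort."""
    return prod(attributes.values())


base = cohort_count(ATTRIBUTES)
with_ethnicity = cohort_count({**ATTRIBUTES, "ethnicity": 5})
print(base, with_ethnicity)  # 25200 126000
```

Adding one five-bucket attribute multiplies the cohort count, and therefore the eval bill, by five.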
If per-cohort relevance was unaffordable, maybe your tweets, likes, and circles could tell us what you cared about. It worked.
Who you follow, what you tweet — live interest signal.
What you like and share — your taste graph.
Who you circled. (2011–2019, rest in peace.)
relevance lift 🎉
The signals worked — and that's exactly why they got locked up. Borrowed context is a rug that gets pulled.
The next decade of AI competition won't be won by smarter models. It'll be won by systems that know the user — their context, their history, their intent.
Raw model quality is converging. The durable product advantage is the surface around the model — how it adapts to each user, what it remembers, how its output reshapes over time.
Frontier models converge in quality, latency, and price. The layer above the model — user context, memory, behavioral signals, identity — is where the durable advantage will live.
Another major search company was anti-personalization for years. The argument was sharper than the Silicon Valley caricature. It went like this:
Suppose you're a foodie. You told the system you hate McDonald's. So the search engine filters it out.
Then there's an incident at McDonald's — a food-safety scandal, a labor moment, a major news story.
You'd want to know.
You wouldn't find out.
Stated preferences are not the same as information needs.
Filter what someone said they don't care about, and you eventually filter out the moments they would have absolutely wanted to see.
Personalization without judgment becomes suppression.
Stated preferences need a layer of judgment — what's important sometimes overrides what someone said they want.
Stated preferences alone over-filter. The ranker needs an importance signal that can override negative preferences in rare cases.
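The sketch below shows one way to express that override. Results about entities the user disliked are suppressed, but a separate importance score (hypothetical here, for example from a news-event detector) can break through above a threshold. The field names and the 0.9 threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    entity: str        # what the result is about, e.g. "mcdonalds"
    relevance: float   # base ranker score
    importance: float  # hypothetical newsworthiness signal in [0, 1]


def personalize(results, disliked_entities, override_threshold=0.9):
    """Drop results about entities the user said they dislike, unless the
    importance signal clears the threshold; then sort by relevance."""
    kept = [
        r for r in results
        if r.entity not in disliked_entities or r.importance >= override_threshold
    ]
    return sorted(kept, key=lambda r: r.relevance, reverse=True)


results = [
    Result("New ramen spot opens downtown", "ramen_bar", 0.8, 0.2),
    Result("McDonald's launches a new burger", "mcdonalds", 0.7, 0.1),
    Result("McDonald's recalls product over safety issue", "mcdonalds", 0.6, 0.95),
]
for r in personalize(results, {"mcdonalds"}):
    print(r.title)
```

The promo stays filtered out. The recall gets through because importance, not the stated preference, decides the rare exceptions.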
You can prove a ranker works at population scale.
You can't prove it works for you.
Even with infinite money and perfect human raters, an individual user will never issue enough queries — in a testing window, or in a lifetime — to give you statistical confidence the personalized ranking is right for them.
Population proxies break down at N = 1.
For fifteen years this was the wall — not the algorithms, the measurement. That's the wall AI is finally taking down.
For fifteen years the wall was measurement, not ranking. AI just made per-user evaluation possible.
Eval at N=1 was the wall, not retrieval or scoring. LLM-judges and synthetic users finally make per-user eval tractable.
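Here is a back-of-the-envelope view of the N = 1 problem. Assume the question is "does the personalized ranking win for this user more often than not?" and use the normal approximation for a proportion. Pinning that win rate to within ±5 points at 95% confidence takes about 385 judged queries. Twenty queries leave a margin of roughly ±22 points. The query counts are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_for(confidence):
    """Two-sided critical value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def queries_needed(margin, confidence=0.95, p=0.5):
    """Sample size to estimate a win rate within +/- margin (p=0.5 is the worst case)."""
    z = z_for(confidence)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the confidence interval after n judged queries."""
    return z_for(confidence) * sqrt(p * (1 - p) / n)


print(queries_needed(0.05))           # 385
print(round(margin_of_error(20), 3))  # 0.219
```

Any one of these numbers is achievable for a population. Needing it for every individual user, each with their own query mix, is the wall.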
Raw intelligence no longer separates them — and the labs know it. So they're racing to bolt on the one thing that does: memory and personalization.
The goal is to raise switching costs and keep you on one platform. Your memory becomes their moat.
Useful to you? Sometimes. Designed around you, owned by you, portable for you? Not even close.
Memory that locks you in is not the same as memory that works for you. That gap is the opportunity.
Memory built for retention isn't the same as memory built for relevance. The product opportunity is the gap between the two.
Vendor-resident memory is a moat for the vendor. User-resident memory is a moat for the product. The architectural fork is the opportunity.
Each era's breakout product won on one thing — showing each person what was relevant to them.
The product that knows you wins. The product that doesn't gets left behind.
Every frontier AI platform is racing to ship features that customize the AI for the user. They each invented their own brand for it. They're all the same idea — personalization, in pieces.
Tune frontier models against your own data and context — adapted at the weights layer. Personalization, baked into the model.
Bundles of instructions, tools, and context telling Claude how to behave for a specific job. Personalization for a task.
Pre-loaded prompts, knowledge, and actions wrapped around a use case. Personalization as a product.
The model remembers facts about you across sessions. Literally memory.
Reusable AI personas with custom instructions and context. Personalization as personas.
Plug your data, tools, and services into any LLM. Personalization as context plumbing.
Codebase-specific instructions the AI follows on every change. Personalization for code.
Different names. Same shape. The market is already moving — but every piece is trapped inside one platform.
Skills, GPTs, Memory, Gems, MCP — same shape, different vendors. The market is shipping personalization in pieces, locked to each platform.
Skills, GPTs, Memory, Gems, MCP are all user-context plumbing under different brands. Each is platform-local; none are portable.
This week at WWDC, Apple unveiled Siri AI in iOS 27 — Siri rebuilt around your personal context: photos, calendar, email, messages, contacts, files, and what's on your screen right now.
Siri reads across Mail, Messages, Photos, Calendar, Contacts, and Files — and answers through what it knows about you.
Acts on whatever you're looking at — building multi-step calendar events from on-screen text, surfacing recommendations from past messages. Context without the prompt.
Conversation history that persists and syncs across devices via iCloud. Memory as a default, not a feature.
The deep context is siloed in Apple's native apps — third-party access is an open question. Your context, their walled garden.
The world's biggest company just validated the thesis: the next moat isn't the model — it's knowing the user. And it's platform-locked. The land grab is on.
Apple didn't ship a smarter model — it shipped deeper user context. And that context stays inside the OS. Remember this slide when we get to who owns your AI identity.
Apple's differentiation is the context store, not the model — system-level, on-device-first, and non-portable. The strongest proof yet that the context layer is the product.
The personalization layer may become more valuable than the model itself.
The personalization layer is where durable product advantage lives — above the model.
Treat the user-context layer as its own service with versioned APIs. The model becomes a swappable dependency.
Models are increasingly interchangeable. The stack that surrounds them — memory, signals, identity — is what compounds.
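Below is a minimal sketch of that separation. It assumes a product-owned context object with an explicit schema version, plus a model interface narrow enough that any vendor client could satisfy it. `UserContext`, `Model`, and `EchoModel` are hypothetical names, not any vendor's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class UserContext:
    """Hypothetical versioned user-context object owned by the product, not the model vendor."""
    schema_version: str
    user_id: str
    preferences: dict[str, str] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        prefs = "; ".join(f"{k}: {v}" for k, v in sorted(self.preferences.items()))
        facts = "; ".join(self.memory)
        return f"[context v{self.schema_version}] preferences: {prefs} | memory: {facts}"


class Model(Protocol):
    """The only thing the context layer needs from a model: text in, text out."""
    def complete(self, prompt: str) -> str: ...


def answer(model: Model, ctx: UserContext, question: str) -> str:
    """The context layer builds the prompt; any model satisfying Model can serve it."""
    return model.complete(f"{ctx.to_prompt()}\n\nUser question: {question}")


class EchoModel:
    """Stand-in model that returns the first prompt line; a real client would go here."""
    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[0]


ctx = UserContext("1", "u-42", {"tone": "terse"}, ["senior engineer"])
print(answer(EchoModel(), ctx, "Which database should I use?"))
```

Swapping vendors means writing another class with a `complete` method. The context object, and its schema version, stay with the product.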
Same query. Same web. A single LLM pass re-orders the results for who you actually are.
One LLM call rewrites the ranking using who the user actually is.
Same query. Same web. The order is now tuned to a senior engineer who already knows what Heroku is.
Cost: fraction of a cent.
Latency: sub-second.
Infra: none — just a prompt.
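The whole mechanism can be sketched as one prompt plus defensive parsing. The prompt wording and result fields are assumptions, and the `reply` string stands in for a real model call. The parser tolerates bad JSON, out-of-range or duplicate indices, and omitted results, since a model's reply cannot be trusted to be well-formed.

```python
import json


def build_rerank_prompt(query, results, profile):
    """One prompt: who the user is, the query, and the candidate results by index."""
    listing = "\n".join(f"{i}. {r['title']}: {r['snippet']}" for i, r in enumerate(results))
    return (
        "Re-order these search results for the user described below.\n"
        f"User: {profile}\n"
        f"Query: {query}\n"
        f"Results:\n{listing}\n"
        "Reply with only a JSON array of result indices, most relevant first."
    )


def apply_rerank(results, reply):
    """Apply the model's ordering; fall back to the original order on bad output."""
    try:
        order = json.loads(reply)
    except json.JSONDecodeError:
        return list(results)
    if not isinstance(order, list):
        return list(results)
    seen, reranked = set(), []
    for i in order:
        if type(i) is int and 0 <= i < len(results) and i not in seen:
            seen.add(i)
            reranked.append(results[i])
    # Keep anything the model dropped, in its original order.
    reranked += [r for i, r in enumerate(results) if i not in seen]
    return reranked


results = [
    {"title": "What is Heroku?", "snippet": "A beginner's overview."},
    {"title": "Heroku pricing", "snippet": "Dyno and add-on costs."},
    {"title": "Migrating off Heroku", "snippet": "Moving apps to Kubernetes."},
]
prompt = build_rerank_prompt("heroku", results, "Senior engineer; already knows Heroku.")
reply = "[2, 1]"  # stand-in for the model's response to `prompt`
print([r["title"] for r in apply_rerank(results, reply)])
# ['Migrating off Heroku', 'Heroku pricing', 'What is Heroku?']
```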
One click fires a follow-on prompt that re-asks the model with your context — and your guardrails — attached.
The button doesn't change the model. It silently re-asks with context.
A follow-on prompt is auto-generated — pulling who the user is, what they're building, and what they can't share — then re-runs against the same model.
Guardrails travel with the user.
Compliance, safety, and tone all live in the personalization layer — not baked into the foundation model.
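Below is a sketch of what "silently re-asks with context" could look like. The click handler assembles a follow-on prompt from a stored user profile, including the topics the user can't share, and sends it to the same model. The profile fields and prompt wording are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    role: str
    building: str
    cannot_share: list[str] = field(default_factory=list)  # guardrails travel with the user
    tone: str = "plain"


def follow_on_prompt(original_request: str, first_answer: str, user: UserProfile) -> str:
    """Build the hidden re-ask: same request, plus who the user is and what is off-limits."""
    guardrails = "\n".join(f"- Never reveal or speculate about: {item}" for item in user.cannot_share)
    return (
        f"Original request: {original_request}\n"
        f"Previous answer: {first_answer}\n\n"
        f"Rewrite the answer for a {user.role} who is building {user.building}. "
        f"Use a {user.tone} tone.\n"
        f"Guardrails:\n{guardrails or '- none'}"
    )


user = UserProfile(
    role="staff engineer",
    building="a payments API",
    cannot_share=["unreleased pricing", "customer names"],
)
print(follow_on_prompt("How should we handle retries?", "Use exponential backoff.", user))
```

The guardrails live in the profile rather than the model, so the same constraints follow the user to whichever model answers.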
Same facts. Same sources. Rewritten in real time to match your politics, your reading level, and your attention.
Better memory
→ better relevance
→ more usage
→ richer context
→ better adaptation
Each loop deepens the moat — and shallows competitors'. Watch the meters climb with every revolution.
Every app silos its own thin version of you. And the companies with the most data have the least incentive to set it free.
Centralized control is exactly the wrong shape for something this personal.
Centralized personalization can't be neutral about portability. The product shape has to change.
The user-context store has to live with the user. Every other architecture leaks data or locks in by design.
A generic agent can do the task. A personalized agent does it the way it actually needs to be done. And here's the part most people miss — "you" is just as often a company as a person. Adapting an AI to an organization is personalization too.
Firms like a16z must append "this is not investment advice" to every public and social post. A personalized agent carries that compliance profile and enforces it automatically — on every draft, every time. Nobody has to remember.
A healthcare company's AI must never expose — or train on — real patient data. That regulation is part of the organization's profile. The agent adapts to it exactly the way it adapts to a person's preferences.
An agent that ignores your company's rules isn't just impersonal — it's undeployable.
An agent that doesn't respect an organization's policies can't be shipped into a real workflow.
Without an org-policy bundle wired into every call, the agent fails review long before it reaches production.
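Below is a minimal sketch of an org-policy bundle wrapped around every model call. Inbound prompts are redacted before they reach the model. Outbound drafts are redacted again and get any mandatory footer. The `OrgPolicy` class, the `MRN-` identifier pattern, and the fake model are hypothetical, and regex redaction here is a placeholder rather than a compliance control.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class OrgPolicy:
    required_footer: Optional[str] = None  # e.g. a legal disclaimer
    redact_patterns: List[str] = field(default_factory=list)

    def redact(self, text: str) -> str:
        for pattern in self.redact_patterns:
            text = re.sub(pattern, "[REDACTED]", text)
        return text

    def finalize(self, draft: str) -> str:
        """Redact, then append the footer if it is missing (safe to run twice)."""
        draft = self.redact(draft)
        if self.required_footer and self.required_footer not in draft:
            draft = f"{draft.rstrip()}\n\n{self.required_footer}"
        return draft


def with_policy(generate: Callable[[str], str], policy: OrgPolicy) -> Callable[[str], str]:
    """Wrap a model call so the policy runs on every request and every response."""
    def wrapped(prompt: str) -> str:
        return policy.finalize(generate(policy.redact(prompt)))
    return wrapped


def fake_model(prompt: str) -> str:
    return f"Draft based on: {prompt}"


vc_firm = OrgPolicy(required_footer="This is not investment advice.")
clinic = OrgPolicy(redact_patterns=[r"\bMRN-\d{6}\b"])

print(with_policy(fake_model, vc_firm)("Summarize our take on the AI infra market"))
print(with_policy(fake_model, clinic)("Summarize the visit notes for MRN-123456"))
```

Because the wrapper sits on the call path, nobody has to remember the rule. A draft that skips the policy simply cannot be produced through this function.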
News, recommendations, what's nearby — gathered from several signals and ranked in real time. The logic behind it is just an instruction:
Four signals in. One ranked feed out.
Four signals — location, interests, nearby, news — merged into one ranked feed.
Four context sources → one LLM rank call → an ordered feed with per-item relevance scores.
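Here is that pipeline sketched as code. It assumes the model is asked to return a JSON object mapping each candidate id to a score. The four source shapes, the ids, and the prompt wording are hypothetical, and the `reply` string stands in for the model call.

```python
import json


def build_feed_request(location, interests, nearby, news):
    """Merge the four context sources into one scoring request."""
    candidates = nearby + news
    prompt = json.dumps({
        "instruction": "Score each candidate from 0 to 1 for this user. "
                       "Reply with a JSON object mapping candidate id to score.",
        "user": {"location": location, "interests": interests},
        "candidates": candidates,
    })
    return prompt, candidates


def rank_feed(candidates, reply):
    """Order candidates by model score; items the model did not score default to 0."""
    scores = json.loads(reply)
    feed = [
        {"id": c["id"], "title": c["title"], "score": float(scores.get(c["id"], 0.0))}
        for c in candidates
    ]
    return sorted(feed, key=lambda item: item["score"], reverse=True)


prompt, candidates = build_feed_request(
    location="Seattle, Capitol Hill",
    interests=["espresso", "indie music"],
    nearby=[{"id": "n1", "title": "Cafe opening two blocks away"}],
    news=[{"id": "w1", "title": "Stock market summary"},
          {"id": "w2", "title": "Local venue announces fall lineup"}],
)
reply = '{"n1": 0.9, "w1": 0.1, "w2": 0.7}'  # stand-in for the model's reply
for item in rank_feed(candidates, reply):
    print(item["score"], item["title"])
```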
Why hand you ten blue links when the AI can spawn the exact tool you need (a planner, a lesson, a workflow), personalized, interactive, and disposable?
The interface itself becomes a personalized, generated artifact — built for one person, one moment, one task.
Today's apps are the lowest common denominator: buttons you'll never tap, and the Starbucks home screen pushing triple-shot promos at someone who has never once ordered caffeine.
No menu. No hunting. The interface knows where you are, what time it is, and what you always order.
I built a Glass app that watched where you walked. Approach a Target, and by the time you reached the door, the HUD was already showing what's on sale that you'd actually want.
2013 hardware, 2026 idea: context + identity + location = relevance before you even ask.
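The Glass pattern reduces to a geofence check plus an interest filter. Below is a minimal sketch using the haversine great-circle distance. The 150 m radius, the store record, and the sale categories are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def hud_offers(user_pos, store, interests, radius_m=150):
    """Inside the geofence, show only the sale items matching the user's interests."""
    if haversine_m(*user_pos, *store["pos"]) > radius_m:
        return []
    return [s["item"] for s in store["sales"] if s["category"] in interests]


store = {
    "pos": (47.6062, -122.3321),
    "sales": [
        {"item": "Trail running shoes", "category": "running"},
        {"item": "Baby stroller", "category": "baby"},
    ],
}
print(hud_offers((47.6065, -122.3321), store, {"running"}))  # ['Trail running shoes']
```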
Your AI only shows you the version of the world you already believe.
Knowing the user is the same skill as exploiting the user.
Personal context becomes the asset, not the user.
Models that mirror you can stop challenging you.
Convenience that quietly erodes your ability to choose.
How do we preserve user agency in a world where AI knows us better than we know ourselves?
How do products build agency in — letting users see, edit, and override what the AI knows about them?
Surface the user-context store to the user: inspect, edit, delete. Without those primitives, personalization becomes surveillance.
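Below is a minimal sketch of those primitives on a hypothetical in-memory store. A real store would add authentication, audit logs, and durable storage. Export and import round-trip through JSON so the user can take their context elsewhere.

```python
import json
from copy import deepcopy


class UserContextStore:
    """Hypothetical user-visible context store: every fact can be seen, changed, or removed."""

    def __init__(self):
        self._facts = {}

    def remember(self, key, value):
        self._facts[key] = value

    def inspect(self):
        """Return a copy so callers can't mutate the store behind the user's back."""
        return deepcopy(self._facts)

    def edit(self, key, value):
        if key not in self._facts:
            raise KeyError(f"no stored fact named {key!r}")
        self._facts[key] = value

    def delete(self, key):
        self._facts.pop(key, None)

    def export_json(self):
        return json.dumps(self._facts, sort_keys=True)

    @classmethod
    def import_json(cls, payload):
        store = cls()
        store._facts.update(json.loads(payload))
        return store


store = UserContextStore()
store.remember("role", "senior engineer")
store.remember("diet", "vegetarian")
store.edit("role", "engineering manager")
store.delete("diet")
print(store.inspect())  # {'role': 'engineering manager'}
```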
User-owned personalization isn't just a product choice — it's the structural answer.
Not just content for you — content as seen by someone who isn't you. The same machinery that builds filter bubbles can break them.
Read today's news as someone who disagrees with you. The bubble pops the moment you can borrow another lens.
Personalize the same slides for execs, builders — or an AICon track session. You're watching this deck do it live.
Simulate the room: replay the pitch as each attendee and learn what they heard — not what you said.
Show the same numbers through their eyes — and surface the objections before the meeting, not during it.
Impersonation is empathy at scale — the antidote to the filter bubble, not just its cause.
Memory. Preferences. Behavior. Relationships. Identity.
This will define the next decade of AI competition.
This is the next decade's product question — and it's yours to answer first.
This is the next decade's architectural fork. Pick early, and pick portably.
Three bets that pay off whether you're the CEO, the head of product, or the engineer picking the next tool. None of them require building a frontier model.
Stop scattering preferences across products. Build (or adopt) one user-context layer that every AI surface reads from. The layer compounds. The features don't.
The $12M problem is solvable now. Synthetic users and AI personas let you validate per-cohort and per-user relevance at a fraction of the cost. Build that capability before you scale personalization.
Export, import, and inspect should be first-class. Lock-in feels like a moat today; it'll feel like a liability the moment users (and regulators) catch up. Build the moat from relevance, not friction.
The companies that win won't have the smartest AI. They'll be the ones whose AI knows their customer best.
Three product moves you can ship without convincing the board to fund a research lab. Each is a roadmap item, not a strategy memo.
Pick the user-context object you want every surface to read from. Define it once. Wire search, chat, notifications, recs, settings all through the same object. No more per-feature preference toggles.
Define personas, write LLM-judge prompts, run them in CI. Per-cohort relevance scores become a dashboard your team checks daily. You'll find bugs in production rankers you didn't know you had.
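Below is a minimal sketch of that loop: personas as plain descriptions, one judge prompt template, and a CI-style gate on the mean score per persona. `PERSONAS`, the 1 to 5 scale, the threshold of 3, and `stub_judge` are hypothetical. In practice the judge would be an LLM call.

```python
from statistics import mean

PERSONAS = {
    "senior_engineer": "Senior backend engineer who already knows what Heroku is.",
    "student": "CS student deploying a first web app.",
}


def judge_prompt(persona, query, ranking):
    items = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(ranking))
    return (
        f"You are: {persona}\nQuery: {query}\nRanking:\n{items}\n"
        "Rate how well this ranking serves you from 1 to 5. Reply with the number only."
    )


def parse_score(reply):
    score = int(reply.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge score out of range: {score}")
    return score


def cohort_scores(judge, cases):
    """cases maps persona id -> [(query, ranking), ...]; returns the mean score per persona."""
    return {
        pid: mean(parse_score(judge(judge_prompt(PERSONAS[pid], q, r))) for q, r in runs)
        for pid, runs in cases.items()
    }


def stub_judge(prompt):
    # Placeholder for an LLM call: beginners dislike a migration guide ranked first.
    return "2" if "student" in prompt and "1. Migrating" in prompt else "4"


ranking = ["Migrating off Heroku", "What is Heroku?"]
cases = {"senior_engineer": [("heroku", ranking)], "student": [("heroku", ranking)]}
scores = cohort_scores(stub_judge, cases)
failing = [pid for pid, s in scores.items() if s < 3]
print(scores, failing)  # {'senior_engineer': 4, 'student': 2} ['student']
```

In CI, a non-empty `failing` list would fail the build. That is how a ranking that is fine on average but bad for one cohort gets caught before users see it.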
A user-facing page that shows exactly what the AI knows about them. Edit any of it. Export it. Delete it. This is the kind of surface that builds trust — and that regulators are about to require.
The product that knows the user wins. The product that lets the user know what it knows wins twice.
The winning AI systems
Or the biggest. Or the fastest. Or the cheapest.
They will be the ones that are
Personalization is the moat. The race is on for who owns it.
The winning products
Not the most capable. Not the most novel. Not the most differentiated on the model card.
They'll be the products that
The product layer above the model is the durable advantage. Build there.
The winning stacks
Not the most parameters. Not the lowest cost-per-token. Not the proprietary weights.
They'll be the stacks that
The moat is the layer above the model. Build the API. Make it portable. Make it inspectable.
Intelligence determines what AI can do.
Personalization determines what AI should do — for you.
What parts of your digital identity should AI remember — and what should it forget?
What part of your product's personalization should the user own — and what should the platform?
Where does your user-context store live today — and what would it take to make it portable?
TOOND.AI
https://toond.ai/
Interested in playing with full web personalization? Sign up — and I'll share the Chrome extension soon.