Ask a RAG system "how do I price my coaching program?" and it searches everything. Works great.
Now ask "what was discussed last week about pricing?" and it does the exact same thing. Searches everything. Ignores the time window entirely.
We just fixed this in Bold. I've been unreasonably excited about it, partly because my first instinct was completely wrong.
The trap I walked into
Detect "last week" in the query, compute the dates, filter results. Done.
Except: our cache breaks immediately.
Bold caches query expansions so we don't burn money on duplicate LLM calls. But "last week" on a Friday points to different dates than "last week" on a Monday. A cached answer from Friday is stale before anyone's had their Monday coffee.
Ok, so add today's date to the cache key? Sure. But now you're blowing up the cache for every question every day. Even stuff like "how do I find clients?" where the date couldn't matter less.
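To make the problem concrete, here's a hypothetical sketch of that rejected fix: fold today's date into the cache key. (The function name and key format are my illustration, not Bold's actual code.)

```typescript
// Naive fix: make the cache key date-aware so "last week" can't go stale.
// Side effect: EVERY key changes at midnight, even for date-irrelevant queries.
function naiveCacheKey(query: string, todayISO: string): string {
  return `${query}::${todayISO}`;
}

naiveCacheKey("how do I find clients?", "2024-01-15");
// vs. naiveCacheKey("how do I find clients?", "2024-01-16")
// Different keys, so yesterday's cached expansion is never reused:
// one guaranteed LLM call per query per day, whether the date matters or not.
```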
The thing that clicked
I kept staring at this for a while. Then it hit me.
What the user means and what calendar dates that maps to are two completely different problems. "Last week" always means "last week." That understanding doesn't expire. You can cache it forever. What specific dates "last week" points to on any given Tuesday? That's just math. Fast math.
So we split them. The LLM returns something like {kind: "relative", label: "last_week"}. We cache that. At search time, a tiny function turns it into real timestamps. No LLM call. No cache invalidation. Sub-millisecond.
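A sketch of that split, in TypeScript. The intent shape follows the post's `{kind, label}` example; everything else (the names, the `"vague"` kind, the Monday-start week convention, ignoring DST) is an assumption for illustration, not Bold's actual code.

```typescript
// Cacheable forever: the LLM's output never mentions a concrete date.
type TemporalIntent =
  | { kind: "relative"; label: "yesterday" | "last_week" | "this_month" }
  | { kind: "vague" } // e.g. "recently" — deliberately maps to no filter
  | { kind: "none" };

interface DateRange { start: Date; end: Date; }

// Runtime-only: a pure function turns cached meaning into timestamps.
// No LLM call, no cache entry to invalidate — just date arithmetic.
function resolveRange(intent: TemporalIntent, now: Date): DateRange | null {
  if (intent.kind !== "relative") return null;
  const DAY = 24 * 60 * 60 * 1000;
  const midnight = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  switch (intent.label) {
    case "yesterday":
      return { start: new Date(midnight.getTime() - DAY), end: midnight };
    case "last_week": {
      // Previous Monday-to-Monday window (sketch ignores DST edge cases).
      const dow = (midnight.getDay() + 6) % 7; // 0 = Monday
      const thisMonday = new Date(midnight.getTime() - dow * DAY);
      return { start: new Date(thisMonday.getTime() - 7 * DAY), end: thisMonday };
    }
    case "this_month":
      return { start: new Date(now.getFullYear(), now.getMonth(), 1), end: midnight };
  }
}
```

The cache stores the `TemporalIntent`; `resolveRange` runs fresh on every search, so "last week" asked on a Monday and on a Friday both hit the same cache entry and still get the right dates.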
Rich Hickey calls this decomplecting. Pulling apart two things that got tangled together. The cache stores meaning. The runtime computes dates. They never touch.
It's one of those solutions that feels embarrassingly obvious once you see it.
"Recently" is a different animal
Someone asks "what's been discussed recently?" and the urge is to help. Pick a window. Thirty days? Ninety?
Don't.
"Recently" means something wildly different for a daily user versus someone who vanished for six months. There's no correct number. And if you guess wrong, you silently hide the exact content they were looking for. They get thin results, blame your product, and you never even know the filter existed.
So we don't filter. Vague input gets vague search. Let relevance do its job.
Empty cupboard
User asks about last week's pricing discussion. Nothing was published last week about pricing.
We drop the time filter. Search everything. One shot. The response says so: "Nothing from last week specifically, but here's what I found from the full library."
We almost built graduated widening. Try 14 days, then 30, then everything. It looked smart on the whiteboard. In practice? Multiple search round-trips for a distinction nobody actually makes. People don't care whether the answer came from two weeks ago or two months ago. They just want the answer.
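The single-shot fallback can be sketched like this. The `SearchFn` interface, the `Doc` shape, and the note text are assumptions standing in for Bold's real search layer.

```typescript
interface Doc { id: string; publishedAt: string; }
type SearchFn = (query: string, range?: { start: Date; end: Date }) => Doc[];

function searchWithFallback(
  search: SearchFn,
  query: string,
  range?: { start: Date; end: Date }
): { hits: Doc[]; note: string | null } {
  if (range) {
    const hits = search(query, range);
    if (hits.length > 0) return { hits, note: null };
    // Empty window: drop the filter and search everything, exactly once —
    // no graduated widening, no extra round-trips.
    return {
      hits: search(query),
      note: "Nothing from that period specifically, but here's what I found from the full library.",
    };
  }
  return { hits: search(query), note: null };
}
```

The `note` is what lets the response be honest about the widened scope instead of silently pretending the window matched.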
Recaps want a story
"What happened this month?" is not a search query. It's a request for narrative.
First this happened, then that happened, and it built on this other conversation from the week before. The connective tissue between sessions is the whole point of a recap.
So we detect recap intent and change the shape of the response. Order by date, not relevance. Frame it as a progression through time, not a ranked list of matches.
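A minimal sketch of that reordering, with assumed field names (`score`, `publishedAt`): recap intent flips the sort from relevance to chronology.

```typescript
interface Hit { title: string; publishedAt: string; score: number; }

function orderHits(hits: Hit[], isRecap: boolean): Hit[] {
  return [...hits].sort(
    isRecap
      ? (a, b) => a.publishedAt.localeCompare(b.publishedAt) // oldest first: a story
      : (a, b) => b.score - a.score                          // best match first: a search
  );
}
```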
Bullets would kill it.
One rule under everything
Every decision here came from the same place: don't tangle things that are separate.
Meaning and dates. Filtering and ranking. Precise queries and vague ones. Stories and search results. Keep them apart and each piece stays simple, testable, easy to reason about. Mix them and you build something that looks clever on a whiteboard and falls apart the second real users touch it.
I'm shipping this behind a flag. Watching the data. But the architecture feels right. And honestly, this is the most fun I've had building a feature in months.