From f20a1a30fe90ca8accef3732fde88e6f34c2ee7e Mon Sep 17 00:00:00 2001
From: Igor Lins e Silva <4753812+igorls@users.noreply.github.com>
Date: Tue, 14 Apr 2026 21:37:45 -0300
Subject: [PATCH] docs(website): align mempalaceofficial.com with honest benchmarks
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Part of #875. Bring the VitePress site into line with the new README and
the reproducibility scorecard: drop category-error comparisons, drop
retracted claims, retain only metrics and caveats that survive audit.

website/index.md
- New tagline matches README (local-first, verbatim, pluggable backend,
  96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra 94.87%
  / Mem0 ~85%" comparison table with a single honest table showing
  MemPalace's own retrieval-recall numbers (raw 96.6%, hybrid v4
  held-out 98.4%). Add an explicit sentence explaining why we no longer
  publish a cross-system table on the landing page (retrieval recall vs
  QA accuracy are different metrics).
- Soften the "ChromaDB-powered vector search" feature blurb to be
  backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
- Full rewrite of the retrieval-recall tables. No more "100%" headline;
  honest held-out 98.4% R@5 replaces it. Added the model-agnostic rerank
  result (99.2% R@5 / 100% R@10 with minimax-m2.7 via Ollama) to show
  the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row. With
  per-conversation session counts of 19-32 and top_k=50, the retrieval
  stage returns every session by construction — the number measures an
  LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's
  own research page (Mastra, Mem0, Supermemory) for their published
  numbers and metric definitions.
- Rewrite reproduction commands to use the correct repository and
  demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md
- Remove the "+34%" row / paragraph. Wing/room filtering is standard
  metadata filtering in the vector store, not a novel retrieval
  mechanism — the April-7 note already retracted that framing; this
  finishes the retraction on the website where it had remained.

website/guide/searching.md
- Same treatment for "34% retrieval improvement". Reframe as
  operational scoping, not a novel boost.

website/reference/contributing.md
- Update the "palace structure matters" bullet to reflect the same
  framing: scoping-not-magic.

website/concepts/knowledge-graph.md
- Replace the MemPalace-vs-Zep feature matrix with a short "related
  work" note that links to Zep's own documentation for authoritative
  details on their deployment model. Avoids claims we cannot verify at
  source.
---
 website/concepts/knowledge-graph.md |  15 ++-
 website/concepts/the-palace.md      |  11 +-
 website/guide/searching.md          |  21 ++--
 website/index.md                    |  27 ++---
 website/reference/benchmarks.md     | 152 +++++++++++++++++++---------
 website/reference/contributing.md   |   2 +-
 6 files changed, 133 insertions(+), 95 deletions(-)

diff --git a/website/concepts/knowledge-graph.md b/website/concepts/knowledge-graph.md
index d7969b3..97a318f 100644
--- a/website/concepts/knowledge-graph.md
+++ b/website/concepts/knowledge-graph.md
@@ -80,12 +80,11 @@ The knowledge graph uses SQLite with two tables:
 
 Database location: `~/.mempalace/knowledge_graph.sqlite3`
 
-## Comparison
+## Related Work
 
-| Feature | MemPalace | Zep (Graphiti) |
-|---------|-----------|----------------|
-| Storage | SQLite (local) | Neo4j (cloud) |
-| Cost | Free | $25/mo+ |
-| Temporal validity | Yes | Yes |
-| Self-hosted | Always | Enterprise only |
-| Privacy | Everything local | SOC 2, HIPAA |
+Temporal entity-relationship graphs are a familiar pattern — Zep's
+Graphiti, for example, also exposes a bi-temporal model. MemPalace's
+knowledge graph is local-first (SQLite, everything on disk) and free;
+Zep is a managed service backed by Neo4j with its own pricing, SLAs,
+and compliance surface. See Zep's own [documentation](https://www.getzep.com/)
+for authoritative details on their deployment model.
diff --git a/website/concepts/the-palace.md b/website/concepts/the-palace.md
index 2ef01a4..c2fd114 100644
--- a/website/concepts/the-palace.md
+++ b/website/concepts/the-palace.md
@@ -92,16 +92,9 @@ The original stored text chunks. This is the primary retrieval layer used by the
 
 ## Why Structure Matters
 
-Tested on 22,000+ real conversation memories:
+Wing and room identifiers become metadata filters at query time. Narrowing a search to a specific wing (or wing + room) means the vector store only scores candidates inside that scope, which is useful when you have many unrelated projects or people filed in the same palace.
 
-| Search scope | R@10 | Improvement |
-|-------------|------|-------------|
-| All closets | 60.9% | baseline |
-| Within wing | 73.1% | +12% |
-| Wing + hall | 84.8% | +24% |
-| Wing + room | 94.8% | +34% |
-
-The practical point is that structure improves retrieval. In the project benchmarks, narrowing the search scope by wing and room outperformed searching the entire corpus at once.
+This is standard metadata filtering in the underlying vector store, not a novel retrieval mechanism. The useful property here is operational — clear scoping rules that a human or an agent can apply predictably — not a magic retrieval boost.
 
 ## Navigation
 
diff --git a/website/guide/searching.md b/website/guide/searching.md
index e9c0bbb..ce2ce3b 100644
--- a/website/guide/searching.md
+++ b/website/guide/searching.md
@@ -23,23 +23,16 @@ mempalace search "deploy process" --results 10
 
 ## How Search Works
 
-1. Your query is embedded using ChromaDB's default model (`all-MiniLM-L6-v2`)
-2. The embedding is compared against all drawers using cosine similarity
-3. Optional wing/room filters narrow the search scope
-4. Results are returned with similarity scores and source metadata
+1. Your query is embedded using the vector store's default model (`all-MiniLM-L6-v2` with the default ChromaDB backend).
+2. The embedding is compared against all drawers using cosine similarity.
+3. Optional wing/room filters narrow the search scope — standard metadata filtering in the underlying vector store.
+4. Results are returned with similarity scores and source metadata.
 
-### Why Structure Matters
+### Why Scoping Matters
 
-Tested on 22,000+ real conversation memories:
+Wing/room filtering is useful when a single palace contains many unrelated projects or people. Narrowing the search to a specific wing (or wing + room) means the vector store only scores candidates inside that scope, which keeps retrieval predictable as the palace grows.
 
-```
-Search all closets: 60.9% R@10
-Search within wing: 73.1% (+12%)
-Search wing + hall: 84.8% (+24%)
-Search wing + room: 94.8% (+34%)
-```
-
-Wings and rooms aren't cosmetic — they're a **34% retrieval improvement**.
+This is a metadata-filter feature of the vector store, not a novel retrieval mechanism. Treat it as an operational convenience: clear scoping rules that a human or an agent can apply predictably.
 
 ## Programmatic Search
 
diff --git a/website/index.md b/website/index.md
index a8487fb..b32cc35 100644
--- a/website/index.md
+++ b/website/index.md
@@ -4,7 +4,7 @@ layout: home
 hero:
   name: MemPalace
   text: Give your AI a memory.
-  tagline: "96.6% recall on LongMemEval in raw mode. Local-first, open source, and usable without an API key."
+  tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls."
   image:
     src: /mempalace_logo.png
     alt: MemPalace
@@ -34,7 +34,7 @@ features:
       src: /icons/search.svg
       alt: Semantic Search
     title: Semantic Search
-    details: ChromaDB-powered vector search lets the model retrieve past discussions by topic, project, or room.
+    details: Vector search over verbatim content lets the model retrieve past discussions by topic, project, or room. Backend is pluggable.
   - icon:
       src: /icons/git-merge.svg
       alt: Knowledge Graph
@@ -49,7 +49,7 @@ features:
       src: /icons/shield-check.svg
       alt: Zero Cloud
     title: Zero Cloud
-    details: Core storage and retrieval run locally on ChromaDB and SQLite. Optional reranking features can add an API dependency.
+    details: Core storage and retrieval run locally. Optional reranking features can add an API dependency but are not required for the benchmark path.
 ---