docs(website): align mempalaceofficial.com with honest benchmarks

Part of #875. Bring the VitePress site into line with the new README and the reproducibility scorecard: drop category-error comparisons, drop retracted claims, and retain only metrics and caveats that survive audit.

website/index.md
- New tagline matches the README (local-first, verbatim, pluggable backend, 96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra 94.87% / Mem0 ~85%" comparison table with a single honest table showing MemPalace's own retrieval-recall numbers (raw 96.6%, hybrid v4 held-out 98.4%). Add an explicit sentence explaining why we no longer publish a cross-system table on the landing page: retrieval recall and QA accuracy are different metrics.
- Soften the "ChromaDB-powered vector search" feature blurb to be backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
- Fully rewrite the retrieval-recall tables. No more "100%" headline; the honest held-out 98.4% R@5 replaces it. Add the model-agnostic rerank result (99.2% R@5 / 100% R@10 with minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row. With per-conversation session counts of 19-32 and top_k=50, the retrieval stage returns every session by construction, so the number measures an LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's own research page (Mastra, Mem0, Supermemory) for their published numbers and metric definitions.
- Rewrite the reproduction commands to use the correct repository and demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md
- Remove the "+34%" row / paragraph. Wing/room filtering is standard metadata filtering in the vector store, not a novel retrieval mechanism. The April-7 note already retracted that framing; this finishes the retraction on the website, where it had remained.

website/guide/searching.md
- Same treatment for "34% retrieval improvement": reframe it as operational scoping, not a novel boost.

website/reference/contributing.md
- Update the "palace structure matters" bullet to reflect the same framing: scoping, not magic.

website/concepts/knowledge-graph.md
- Replace the MemPalace-vs-Zep feature matrix with a short "related work" note that links to Zep's own documentation for authoritative details on their deployment model. This avoids claims we cannot verify at source.
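The scoping framing in the the-palace.md, searching.md and contributing.md notes comes down to a metadata pre-filter placed in front of ordinary vector ranking. A minimal sketch, assuming a hypothetical in-memory `Chunk` record with `wing`/`room` fields and a hand-rolled cosine similarity; `Chunk`, `cosine` and `scoped_search` are illustrative names, not MemPalace's API, and a real pluggable backend will express the filter in its own syntax:

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical stored piece of verbatim text plus wing/room metadata."""
    text: str
    wing: str
    room: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(chunks: List[Chunk], query: List[float], k: int = 5,
                  wing: Optional[str] = None,
                  room: Optional[str] = None) -> List[Chunk]:
    """Metadata pre-filter, then plain similarity ranking.

    The wing/room filter only narrows the candidate pool; the ranking step
    is identical with or without it.
    """
    pool = [
        c for c in chunks
        if (wing is None or c.wing == wing) and (room is None or c.room == room)
    ]
    pool.sort(key=lambda c: cosine(c.embedding, query), reverse=True)
    return pool[:k]


chunks = [
    Chunk("auth token refresh bug", "project-a", "backend", [1.0, 0.0]),
    Chunk("auth flow notes", "project-b", "backend", [0.9, 0.1]),
    Chunk("landing page copy", "project-a", "frontend", [0.0, 1.0]),
]
print([c.text for c in scoped_search(chunks, [1.0, 0.0], k=2)])
# → ['auth token refresh bug', 'auth flow notes']
print([c.text for c in scoped_search(chunks, [1.0, 0.0], k=2, wing="project-a")])
# → ['auth token refresh bug', 'landing page copy']
```

Scoping changes which chunks are eligible, which can help or hurt a given query (the second call surfaces an unrelated chunk because the better match sits in another wing), but the ranking function itself never changes.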
website/index.md +14 -13
@@ -4,7 +4,7 @@ layout: home
 hero:
 name: MemPalace
 text: Give your AI a memory.
-tagline: "96.6% recall on LongMemEval in raw mode. Local-first, open source, and usable without an API key."
+tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls."
 image:
 src: /mempalace_logo.png
 alt: MemPalace
@@ -34,7 +34,7 @@ features:
 src: /icons/search.svg
 alt: Semantic Search
 title: Semantic Search
-details: ChromaDB-powered vector search lets the model retrieve past discussions by topic, project, or room.
+details: Vector search over verbatim content lets the model retrieve past discussions by topic, project, or room. Backend is pluggable.
 - icon:
 src: /icons/git-merge.svg
 alt: Knowledge Graph
@@ -49,7 +49,7 @@ features:
 src: /icons/shield-check.svg
 alt: Zero Cloud
 title: Zero Cloud
-details: Core storage and retrieval run locally on ChromaDB and SQLite. Optional reranking features can add an API dependency.
+details: Core storage and retrieval run locally. Optional reranking features can add an API dependency but are not required for the benchmark path.
 ---
 
 <style>
@@ -68,20 +68,21 @@ features:
 
 ## Verbatim Retrieval First
 
-MemPalace starts from a simple premise: **store the source text and retrieve it well**. The benchmarked raw mode does not require an LLM extraction step.
+MemPalace stores source text and retrieves it with semantic search. The benchmarked raw mode does not require an LLM at any stage — no extraction, no rerank, no summarisation.
 
-| System | LongMemEval R@5 | API Required | Cost |
-|--------|----------------|--------------|------|
-| **MemPalace (hybrid)** | **100%** | Optional | Free |
-| Supermemory ASMR | ~99% | Yes | — |
-| **MemPalace (raw)** | **96.6%** | **None** | **Free** |
-| Mastra | 94.87% | Yes | API costs |
-| Mem0 | ~85% | Yes | $19–249/mo |
+**LongMemEval retrieval recall (500 questions):**
 
-The raw 96.6% LongMemEval result is the baseline story: strong recall without requiring an API key or an LLM in the retrieval pipeline.
+| Mode | R@5 | LLM required |
+|---|---|---|
+| Raw (semantic search over verbatim text) | **96.6%** | None |
+| Hybrid v4, held-out 450q | **98.4%** | None |
+
+The raw 96.6% reproduces on any machine with the committed dataset: result JSONLs, the `seed=42` train/held-out split, and the `--mode raw` / `--held-out` runners are all in the `benchmarks/` directory of the repo.
+
+We deliberately do not publish a side-by-side comparison against other memory systems on this page. Retrieval recall (R@5) and end-to-end QA accuracy are different metrics and are not comparable; where MemPalace can be fairly compared on the same metric, we link to the other project's published source.
 
 <div style="text-align: center; padding-top: 16px;">
-<a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark results →</a>
+<a href="./reference/benchmarks" style="color: var(--vp-c-brand-1); font-weight: 500;">Full benchmark methodology →</a>
 </div>
 
 </div>
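The R@5 figures in the new landing-page table, and the LoCoMo reasoning in the commit message, both rest on what recall@k actually counts. A minimal sketch, assuming each question comes with a ranked list of session IDs and a list of gold session IDs; `recall_at_k` and `shuffled_ranking_recall` are illustrative helpers, not code from the repo's `benchmarks/` runners, and some harnesses instead count a question as a hit when any gold session is retrieved rather than using the fraction shown here:

```python
import random
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Sequence[str], k: int) -> float:
    """Fraction of gold IDs found in the first k ranked IDs (one question)."""
    if not gold_ids:
        raise ValueError("recall@k is undefined with no gold IDs")
    top_k = set(ranked_ids[:k])
    return sum(1 for g in gold_ids if g in top_k) / len(gold_ids)


def shuffled_ranking_recall(num_sessions: int, k: int, seed: int = 0) -> float:
    """Score a deliberately useless retriever: a random ordering of all sessions.

    Hypothetical setup: the gold answer lives in the last session.
    """
    sessions = [f"session-{i}" for i in range(num_sessions)]
    ranking = sessions[:]
    random.Random(seed).shuffle(ranking)
    return recall_at_k(ranking, [sessions[-1]], k)


# With top_k=50 and 19-32 sessions per conversation, the "retrieved" set is
# every session, so even a random ranking scores perfect recall.
for n in (19, 32):
    print(n, shuffled_ranking_recall(n, k=50))
# prints "19 1.0" then "32 1.0"
```

A QA-accuracy benchmark would instead grade the answer the system produces, which is why R@5 numbers cannot be lined up against other projects' QA numbers.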