Files
mempalace/docs/HISTORY.md
T
Igor Lins e Silva 65bf1ebda3 docs: slim README and move corrections/notices to docs/HISTORY.md
Addresses #875. The previous README was 755 lines mixing six purposes
(scam alert, hero, two mea-culpa notes, install guide, architecture
explainer, API reference, file map). Rework it as a pure entry point:
what MemPalace is, how to install, honest benchmark numbers, links to
the website for concept/architecture documentation.

Key content changes:
 - Drop the "highest-scoring AI memory system ever benchmarked" framing.
 - New tagline: "Local-first AI memory. Verbatim storage, pluggable
   backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids
   naming a specific vector-store implementation since the backend is
   pluggable (see mempalace/backends/base.py).
 - Remove the cross-system comparison table. Retrieval recall (R@5)
   and end-to-end QA accuracy are different metrics and are not
   comparable; placing MemPalace's R@5 next to competitor QA accuracy
   under a single column header was a category error.
 - The "100%" LongMemEval headline is no longer the lead. The honest
   held-out figure is 98.4% R@5 on 450 unseen questions. The rerank
   pipeline reaches >=99% with any capable LLM (reproduced with
   Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level,
   not model-specific.
 - Benchmark reproduction commands now reference the correct repo
   (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).

New file: docs/HISTORY.md as the canonical home for post-launch
corrections, public notices, and retractions. Contains verbatim:
 - 2026-04-14 note on this rewrite (links to #875)
 - 2026-04-11 impostor-domain notice (moved from README header)
 - 2026-04-07 "A Note from Milla & Ben" (moved from README body)

README keeps a one-line scam-alert callout that links to
docs/HISTORY.md for the full timeline.
2026-04-14 21:37:20 -03:00

7.1 KiB
Raw Blame History

MemPalace — History, Corrections, and Public Notices

This file is the canonical record of post-launch corrections, public notices, and retractions that affect MemPalace's public claims. Newest first.


2026-04-14 — Benchmark table rewrite (issue #875)

A community audit identified a category error in the public benchmark tables on README.md and mempalaceofficial.com: MemPalace's retrieval recall numbers (R@5, R@10) were listed in the same columns as competitors' end-to-end QA accuracy numbers. They are different metrics and are not comparable — a system can have 100% retrieval recall and 40% QA accuracy.

The audit also found that the retracted "+34% palace boost" claim (see the April 7 note below) was still present in multiple surfaces despite that retraction, and that two competitor numbers (Mem0 ~85%, Zep ~85%) had no published source and did not match the metrics those projects actually publish.

What changed in this PR:

  • The headline number on all surfaces is now 96.6% R@5 on LongMemEval in raw mode, independently reproduced on Linux x86_64 against the tagged v3.3.0 release on 2026-04-14. Result JSONLs are committed under benchmarks/results_*.jsonl (see PR description for the scorecard).
  • The "100% with Haiku rerank" claim has been removed from all public comparison tables. It reproduces on our machines and with a different LLM family (minimax-m2.7 via Ollama Cloud: 99.2% R@5 / 100.0% R@10 on the full 500-question LongMemEval set) — but the 99.4% → 100% step was developed by inspecting three specific wrong answers (benchmarks/BENCHMARKS.md has called this "teaching to the test" since February). It belongs in the methodology document, not in a headline.
  • The honest held-out number for the hybrid pipeline — 98.4% R@5 on 450 questions that hybrid_v4 was never tuned on, deterministic seed — is now the comparable figure when an LLM rerank is involved.
  • The retracted "+34% palace boost" has been removed from README.md, website/concepts/the-palace.md, website/guide/searching.md, and website/reference/contributing.md. Wing and room filters remain useful — they're standard metadata filters — but they are not presented as a novel retrieval improvement.
  • Competitor comparison tables mixing retrieval recall with QA accuracy have been removed from README.md and website/reference/benchmarks.md. Where MemPalace can be fairly compared on the same metric, we link to the cited source. Otherwise we report our own numbers and let readers draw their own conclusions.
  • Reproduction instructions in benchmarks/BENCHMARKS.md and benchmarks/README.md were pointing at a defunct branch (aya-thekeeper/mempal); they now point at MemPalace/mempalace.
  • The LoCoMo 100% R@10 with top-50 rerank row has been removed from public comparison surfaces. With per-conversation session counts of 1932 and top_k=50, the retrieval stage returns every session in the conversation by construction, so the number measures an LLM's reading comprehension over the whole conversation, not retrieval.

Thanks to @dial481 for the detailed audit and to @rohitg00 for the parallel write-up in Discussion #747.


2026-04-11 — Impostor domains and malware

Several community members (issues #267, #326, #506) reported fake MemPalace websites distributing malware. The only official surfaces for this project are:

Any other domain — mempalace.tech being the one most commonly reported — is not ours. Never run install scripts from unofficial sites.

Thanks to our community members for flagging the problem.


2026-04-07 — A Note from Milla & Ben

The community caught real problems in this README within hours of launch and we want to address them directly.

What we got wrong:

  • The AAAK token example was incorrect. We used a rough heuristic (len(text)//3) for token counts instead of an actual tokenizer. Real counts via OpenAI's tokenizer: the English example is 66 tokens, the AAAK example is 73. AAAK does not save tokens at small scales — it's designed for repeated entities at scale, and the README example was a bad demonstration of that. We're rewriting it.

  • "30x lossless compression" was overstated. AAAK is a lossy abbreviation system (entity codes, sentence truncation). Independent benchmarks show AAAK mode scores 84.2% R@5 vs raw mode's 96.6% on LongMemEval — a 12.4 point regression. The honest framing is: AAAK is an experimental compression layer that trades fidelity for token density, and the 96.6% headline number is from RAW mode, not AAAK.

  • "+34% palace boost" was misleading. That number compares unfiltered search to wing+room metadata filtering. Metadata filtering is a standard feature of the underlying vector store, not a novel retrieval mechanism. Real and useful, but not a moat.

  • "Contradiction detection" exists as a separate utility (fact_checker.py) but is not currently wired into the knowledge graph operations as the README implied.

  • "100% with Haiku rerank" is real (we have the result files) but the rerank pipeline is not in the public benchmark scripts. We're adding it.

What's still true and reproducible:

  • 96.6% R@5 on LongMemEval in raw mode, on 500 questions, zero API calls — independently reproduced on M2 Ultra in under 5 minutes by @gizmax.
  • Local, free, no subscription, no cloud, no data leaving your machine.
  • The architecture (wings, rooms, closets, drawers) is real and useful, even if it's not a magical retrieval boost.

What we're doing:

  1. Rewriting the AAAK example with real tokenizer counts and a scenario where AAAK actually demonstrates compression
  2. Adding mode raw / aaak / rooms clearly to the benchmark documentation so the trade-offs are visible
  3. Wiring fact_checker.py into the KG ops so the contradiction detection claim becomes true
  4. Pinning the vector store dependency to a tested range (issue #100), fixing the shell injection in hooks (#110), and addressing the macOS ARM64 segfault (#74)

Thank you to everyone who poked holes in this. Brutal honest criticism is exactly what makes open source work, and it's what we asked for. Special thanks to @panuhorsmalahti, @lhl, @gizmax, and everyone who filed an issue or a PR in the first 48 hours. We're listening, we're fixing, and we'd rather be right than impressive.

Milla Jovovich & Ben Sigman