docs: add VitePress documentation site

- 22 content pages across Guide, Concepts, and Reference sections - Custom indigo/cyan theme with Lucide icons and Mermaid diagrams - GitHub Actions workflow for GitHub Pages deployment - Live preview: https://mempalace-docs.netlify.app/
2026-04-09 19:11:23 -03:00
parent 2981433535
commit dfb22f5345
37 changed files with 3673 additions and 0 deletions
@@ -0,0 +1,95 @@
+# Benchmarks
+
+Curated summary of MemPalace benchmark results. For the full 725-line progression with every experiment, see [`benchmarks/BENCHMARKS.md`](https://github.com/milla-jovovich/mempalace/blob/main/benchmarks/BENCHMARKS.md) in the repository.
+
+## The Core Finding
+
+MemPalace's benchmarked raw baseline stores the source text and searches it with ChromaDB's default embeddings. No extraction layer or summarization step is required for that baseline.
+
+**And it scores 96.6% on LongMemEval.**
+
+## LongMemEval Results
+
+| Mode | R@5 | LLM Required | Cost/query |
+|------|-----|-------------|------------|
+| Raw ChromaDB | **96.6%** | None | $0 |
+| Hybrid v3 + rerank | 99.4% | Haiku | ~$0.001 |
+| Palace + rerank | 99.4% | Haiku | ~$0.001 |
+| **Hybrid v4 + rerank** | **100%** | Haiku | ~$0.001 |
+
+The 96.6% raw score requires no API key, no cloud, and no LLM at any stage. The 100% result uses optional Haiku reranking.
+
+### Per-Category Breakdown (Raw, 96.6%)
+
+| Question Type | R@5 | Count |
+|---------------|-----|-------|
+| Knowledge update | 99.0% | 78 |
+| Multi-session | 98.5% | 133 |
+| Temporal reasoning | 96.2% | 133 |
+| Single-session user | 95.7% | 70 |
+| Single-session preference | 93.3% | 30 |
+| Single-session assistant | 92.9% | 56 |
+
+### Held-Out Validation
+
+**98.4% R@5** on 450 questions that hybrid_v4 was never tuned on — confirming the improvements generalize.
+
+## Comparison vs Published Systems
+
+| System | LongMemEval R@5 | API Required | Cost |
+|--------|----------------|--------------|------|
+| **MemPalace (hybrid)** | **100%** | Optional | Free |
+| Supermemory ASMR | ~99% | Yes | — |
+| **MemPalace (raw)** | **96.6%** | **None** | **Free** |
+| Mastra | 94.87% | Yes | API costs |
+| Hindsight | 91.4% | Yes | API costs |
+| Mem0 | ~85% | Yes | $19–249/mo |
+
+## Other Benchmarks
+
+### ConvoMem (Salesforce, 75K+ QA pairs)
+
+| System | Score |
+|--------|-------|
+| **MemPalace** | **92.9%** |
+| Gemini (long context) | 70–82% |
+| Block extraction | 57–71% |
+| Mem0 (RAG) | 30–45% |
+
+On this benchmark, MemPalace materially outperforms the Mem0 result cited in the comparison table.
+
+### LoCoMo (1,986 multi-hop QA pairs)
+
+| Mode | R@10 | LLM |
+|------|------|-----|
+| Hybrid v5 + Sonnet rerank (top-50) | **100%** | Sonnet |
+| bge-large + Haiku rerank (top-15) | 96.3% | Haiku |
+| Hybrid v5 (top-10, no rerank) | **88.9%** | None |
+| Session, no rerank (top-10) | 60.3% | None |
+
+### MemBench (ACL 2025, 8,500 items)
+
+**80.3% R@5** overall. Strongest categories: aggregative (99.3%), comparative (98.4%), lowlevel_rec (99.8%).
+
+## Reproducing Results
+
+All benchmarks are reproducible with public datasets:
+
+```bash
+git clone https://github.com/milla-jovovich/mempalace.git
+cd mempalace
+pip install chromadb pyyaml
+
+# Download LongMemEval data
+curl -fsSL -o /tmp/longmemeval_s_cleaned.json \
+  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
+
+# Run raw baseline (96.6%, no API key needed)
+python benchmarks/longmemeval_bench.py /tmp/longmemeval_s_cleaned.json
+```
+
+::: tip
+Results are deterministic. Same data + same script = same result every time. Every result JSONL file contains every question, every retrieved document, every score.
+:::
+
+For complete reproduction instructions, benchmark integrity notes, and the full score progression, see the [full benchmark documentation](https://github.com/milla-jovovich/mempalace/blob/main/benchmarks/BENCHMARKS.md).