Fixes #195.
When ChromaDB returns no documents (empty palace, or wing/room filter
that excludes everything), it returns the shape:
{"documents": [], "metadatas": [], "distances": []}
Indexing `results["documents"][0]` blindly raises IndexError instead of
the expected 'no results' response. Affected: searcher.search(),
searcher.search_memories() (drawer + closet branches plus the
total_before_filter aggregate), and Layer3.search() / Layer3.search_raw().
Adds a tiny private helper `searcher._first_or_empty(results, key)` that
safely extracts the inner list, returning [] for any of: missing key,
empty outer list, [None], or [[]]. layers.py imports the same helper to
avoid duplicating the guard.
Tests: tests/test_empty_chromadb_results.py covers all observed shapes
plus a documentation-style test that pins the original IndexError so
future readers understand why the helper exists.
tool_status() called _get_collection() with the default create=False,
which raises when the ChromaDB collection does not exist yet (valid
palace, zero drawers). The exception was swallowed and status returned
"No palace found" even though init had completed successfully.
Switching to create=True bootstraps an empty collection on first
status call, matching what the write path already does.
Fix suggested by @hkevinchu in the issue.
* fix: make entity_registry.research() local-only by default
research() previously called _wikipedia_lookup() unconditionally,
sending entity names to en.wikipedia.org on every uncached lookup.
This violates the project's local-first and privacy-by-architecture
principles documented in CLAUDE.md.
Changes:
- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person"
with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling,
and cache-not-persisted-on-local-only behaviour
Refs: MemPalace/mempalace#809
* fix: improve research() cache read path and deduplicate test mocks
- Use .get() instead of .setdefault() for cache reads in research()
so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure
consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON
constant shared by 3 test functions
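The local-only default and the cache read/write split can be sketched roughly as follows. The class name, the injected `lookup` callable (a stand-in for `_wikipedia_lookup`), and the `"type"` key are assumptions made for illustration; only `research()`, `allow_network`, and the `word`/`confirmed` keys come from the notes above:

```python
class EntityRegistrySketch:
    """Hypothetical stand-in for the registry; not the real class."""

    def __init__(self, lookup):
        self._cache = {}
        self._lookup = lookup  # assumed stand-in for _wikipedia_lookup

    def research(self, word, allow_network=False):
        # Read path uses .get() so a local-only call never mutates the cache.
        cached = self._cache.get(word)
        if cached is not None:
            return cached
        if not allow_network:
            # Local-only default: report "unknown" and persist nothing.
            return {"word": word, "type": "unknown", "confirmed": False}
        # Opt-in network path: the only place the cache is written.
        result = dict(self._lookup(word))
        result.setdefault("word", word)
        result.setdefault("confirmed", False)
        self._cache[word] = result
        return result
```

With this shape, a caller has to pass `allow_network=True` explicitly before any entity name leaves the machine.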
Fixes #210.
The CLI requires a positional <dir> argument. Previous docs emphasized
that init 'sets up ~/.mempalace/' which misled users into expecting
no arguments. Now the docs show <dir> is required, offer '.' as the
usage for the current directory, and reword the description so the
project-directory scan is listed first.
The regression-guard tests added in #835 were pinned to the old
README shape (tool table + file-reference table). When #897 slimmed
the README and moved that content to the website, three tests
started failing:
TestReadmeToolsExistInCode.test_every_readme_tool_exists_in_tools_dict
TestNoUnlistedTools.test_no_undocumented_tools
TestReadmeDialectNotLossless.test_readme_dialect_line_not_lossless
Changes in this commit:
1. Update the 3 tests to track the new canonical docs surfaces
- Tool list -> website/reference/mcp-tools.md
(tests parse `### \`mempalace_xxx\`` headings instead of
markdown table rows).
- dialect.py lossless disclaimer -> website/reference/modules.md
(any line mentioning dialect.py must not also say "lossless").
2. Fix the website to make "no undocumented tools" true
Add the 10 tools that existed in TOOLS but were missing from
website/reference/mcp-tools.md (create_tunnel, delete_tunnel,
follow_tunnels, list_tunnels, get_drawer, list_drawers,
update_drawer, hook_settings, memories_filed_away, reconnect).
Page header now correctly says "all 29 MCP tools".
3. Align pre-commit ruff pin to match CI (0.4.x)
.pre-commit-config.yaml was pinning ruff v0.9.0, while
.github/workflows/ci.yml installs ruff>=0.4.0,<0.5. The two
formatters produce incompatible output (e.g. v0.9.0 reformats
`assert (x), msg` -> `assert x, (msg)` in a way v0.4.x rejects),
which would cause the pre-commit hook to modify files that CI
then flags as unformatted. Pinning the hook to v0.4.10 keeps
the dev loop and CI in lock-step.
Full suite: 887 passed, 0 failed.
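For readers unfamiliar with the new test shape, heading-based parsing might look like the sketch below. The regex, the helper names, and the sample tool names are illustrative assumptions, not the code in the test suite:

```python
import re

# Matches headings of the form: ### `mempalace_xxx`
TOOL_HEADING = re.compile(r"^###[ \t]+`(mempalace_[a-z0-9_]+)`[ \t]*$", re.MULTILINE)


def documented_tools(markdown):
    """Collect every tool name that has its own heading in the docs page."""
    return set(TOOL_HEADING.findall(markdown))


def undocumented(tools_in_code, markdown):
    """Tools registered in code that have no heading in the docs page."""
    return sorted(set(tools_in_code) - documented_tools(markdown))


# Placeholder page content and tool names, for illustration only.
page = "# MCP tools\n\n### `mempalace_search`\nSearch.\n\n### `mempalace_status`\n"
missing = undocumented(["mempalace_search", "mempalace_reconnect"], page)
```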
Remaining in-repo surfaces carrying the same retracted or broken
claims as the public pages fixed in the previous two commits.
CONTRIBUTING.md
- "Palace structure matters ... 34% retrieval improvement" → reframed
as scoping (same rewording applied to the website equivalents).
benchmarks/BENCHMARKS.md
- Add a prominent "Important caveat" block at the top of the
"Comparison vs Published Systems" table explaining that R@5
(retrieval recall) and QA accuracy are different metrics, with
citations to Mastra, Mem0, and Supermemory's own published
methodology pages. Annotate the specific competitor rows whose
numbers are QA accuracy, not retrieval recall.
- Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4
→ 100 step was tuned on 3 specific wrong answers (already disclosed
further down in the doc under "Benchmark Integrity"); the honest
hybrid figure is held-out 98.4%.
- Fix the broken clone URL — `aya-thekeeper/mempal` no longer points
at anything; now `MemPalace/mempalace`.
benchmarks/README.md + benchmarks/HYBRID_MODE.md
- Same clone-URL fix applied.
CHANGELOG.md
- Add a ### Documentation entry under [Unreleased] v3.3.0 that names
#875 and summarises the scope of the rewrite.
Part of #875. Bring the VitePress site into line with the new README
and the reproducibility scorecard: drop category-error comparisons,
drop retracted claims, retain only metrics and caveats that survive
audit.
website/index.md
- New tagline matches README (local-first, verbatim, pluggable backend,
96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra
94.87% / Mem0 ~85%" comparison table with a single honest table
showing MemPalace's own retrieval-recall numbers (raw 96.6%,
hybrid v4 held-out 98.4%). Add an explicit sentence explaining why
we no longer publish a cross-system table on the landing page
(retrieval recall vs QA accuracy are different metrics).
- Soften the "ChromaDB-powered vector search" feature blurb to be
backend-agnostic, since the retrieval layer is pluggable.
website/reference/benchmarks.md
- Full rewrite of the retrieval-recall tables. No more "100%"
headline; honest held-out 98.4% R@5 replaces it. Added the
model-agnostic rerank result (99.2% R@5 / 100% R@10 with
minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row.
With per-conversation session counts of 19-32 and top_k=50, the
retrieval stage returns every session by construction — the number
measures an LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's
own research page (Mastra, Mem0, Supermemory) for their published
numbers and metric definitions.
- Rewrite reproduction commands to use the correct repository and
demonstrate the new --llm-backend ollama flag.
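The reason for dropping the top-50 LoCoMo row can be checked with a few lines of arithmetic. This sketch assumes recall@k is the fraction of gold sessions found in the first k results; the session ids are made up:

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold ids that appear among the first k ranked ids."""
    top = set(ranked_ids[:k])
    return sum(1 for g in gold_ids if g in top) / len(gold_ids)


# 32 sessions is the largest per-conversation count cited above.
sessions = [f"s{i}" for i in range(32)]
worst_ranking = sessions      # the gold session is ranked dead last
gold = ["s31"]

at_50 = recall_at_k(worst_ranking, gold, 50)  # k exceeds the pool size
at_10 = recall_at_k(worst_ranking, gold, 10)
```

Even the worst possible ranking scores perfectly at k=50, because the top-k window covers the whole pool; at k=10 the same ranking scores zero.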
website/concepts/the-palace.md
- Remove the "+34%" row / paragraph. Wing/room filtering is standard
metadata filtering in the vector store, not a novel retrieval
mechanism — the April-7 note already retracted that framing; this
finishes the retraction on the website where it had remained.
website/guide/searching.md
- Same treatment for "34% retrieval improvement". Reframe as
operational scoping, not a novel boost.
website/reference/contributing.md
- Update the "palace structure matters" bullet to reflect the same
framing: scoping-not-magic.
website/concepts/knowledge-graph.md
- Replace the MemPalace-vs-Zep feature matrix with a short "related
work" note that links to Zep's own documentation for authoritative
details on their deployment model. Avoids claims we cannot verify
at source.
Addresses #875. The previous README was 755 lines mixing seven purposes
(scam alert, hero, two mea-culpa notes, install guide, architecture
explainer, API reference, file map). Rework it as a pure entry point:
what MemPalace is, how to install, honest benchmark numbers, links to
the website for concept/architecture documentation.
Key content changes:
- Drop the "highest-scoring AI memory system ever benchmarked" framing.
- New tagline: "Local-first AI memory. Verbatim storage, pluggable
backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids
naming a specific vector-store implementation since the backend is
pluggable (see mempalace/backends/base.py).
- Remove the cross-system comparison table. Retrieval recall (R@5)
and end-to-end QA accuracy are different metrics and are not
comparable; placing MemPalace's R@5 next to competitor QA accuracy
under a single column header was a category error.
- The "100%" LongMemEval headline is no longer the lead. The honest
held-out figure is 98.4% R@5 on 450 unseen questions. The rerank
pipeline reaches >=99% with any capable LLM (reproduced with
Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level,
not model-specific.
- Benchmark reproduction commands now reference the correct repo
(MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).
New file: docs/HISTORY.md as the canonical home for post-launch
corrections, public notices, and retractions. Contains verbatim:
- 2026-04-14 note on this rewrite (links to #875)
- 2026-04-11 impostor-domain notice (moved from README header)
- 2026-04-07 "A Note from Milla & Ben" (moved from README body)
README keeps a one-line scam-alert callout that links to
docs/HISTORY.md for the full timeline.
Addresses #875: every internal BENCHMARKS.md claim reproduced
on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings,
seed=42 for the LongMemEval dev/held-out split).
Scorecard — all reproduce exactly:
LongMemEval
raw R@5 96.6% (500/500) ✅
hybrid_v4 held-out 450 R@5 98.4% (442/450) ✅
hybrid_v4 + minimax rerank R@5 99.2% (496/500) *
hybrid_v4 + minimax rerank R@10 100.0% (500/500) *
LoCoMo (session, top-10)
raw 60.3% (1986q) ✅
hybrid v5 88.9% (1986q) ✅
ConvoMem all-categories (250 items) 92.9% ✅
MemBench all-categories (8500) 80.3% ✅
* The minimax-m2.7:cloud rerank run replicates the "100%" claim
with a different LLM family (no Anthropic dependency). R@10 is
a perfect reproduction; R@5 misses 4 questions that the
published Haiku run caught — consistent with BENCHMARKS.md's own
disclosure that hybrid_v4 includes three question-specific fixes
developed by inspecting misses, i.e. teaching to the test.
The committed 50/450 split is the deterministic (seed=42) split
BENCHMARKS.md references but wasn't previously in the repo.
Full result JSONLs include every question, every retrieved id,
and every score — auditable end-to-end.
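One way a seeded 50/450 split can be made reproducible is sketched below. The function name and the sort-then-shuffle procedure are assumptions; the committed split file remains the authoritative artifact, and the benchmark script may derive it differently:

```python
import random


def split_dev_heldout(question_ids, dev_size=50, seed=42):
    """Deterministically split ids into a dev set and a held-out set."""
    rng = random.Random(seed)          # private RNG, independent of global state
    shuffled = sorted(question_ids)    # sort first so input order cannot matter
    rng.shuffle(shuffled)
    return shuffled[:dev_size], shuffled[dev_size:]


ids = [f"q{i:03d}" for i in range(500)]   # placeholder question ids
dev, held_out = split_dev_heldout(ids)
```

Committing the resulting id lists, as done here, removes any dependence on the RNG implementation of a particular Python version.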
The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.
Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.
Both longmemeval_bench.py and locomo_bench.py:
- llm_rerank*() gain backend= / base_url= kwargs
- CLI: --llm-backend {anthropic,ollama}, --llm-base-url
- API key required only when backend=anthropic (diary/palace modes still require it)
- Parse last integer in response (reasoning models emit multi-int output)
- Fallback to message.reasoning when content is empty
- Raise max_tokens to 1024 for reasoning models
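The two parsing changes in the list above (last integer wins, fall back to `message.reasoning`) could be sketched as below. The function name is hypothetical and the message is assumed to be an OpenAI-style dict:

```python
import re


def extract_choice(message):
    """Pick the rerank answer out of a chat-completions message dict."""
    # Reasoning models sometimes leave content empty and answer in `reasoning`.
    text = message.get("content") or message.get("reasoning") or ""
    numbers = re.findall(r"\d+", text)
    # Multi-integer output: treat the last integer as the final answer.
    return int(numbers[-1]) if numbers else None
```

Returning `None` when no integer is present lets the caller decide whether to retry or keep the original ranking.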
TDD: test first, failed, fixed, passed.
Igor fixed query_relationship/timeline/stats in an earlier commit.
close() was the last method touching self._connection without
holding the lock.
Closes #883.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
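The locking pattern behind the close() fix, as a minimal sketch. The class name and the SQLite connection are assumptions for illustration; the point is only that close() takes the same lock as every other method that touches `self._connection`:

```python
import sqlite3
import threading


class GraphStoreSketch:
    """Hypothetical store; only the locking discipline is the point here."""

    def __init__(self, path=":memory:"):
        self._lock = threading.Lock()
        self._connection = sqlite3.connect(path, check_same_thread=False)

    def stats(self):
        with self._lock:
            if self._connection is None:
                raise RuntimeError("store is closed")
            return self._connection.execute("SELECT 1").fetchone()[0]

    def close(self):
        # Hold the lock so close() cannot race with an in-flight query.
        with self._lock:
            if self._connection is not None:
                self._connection.close()
                self._connection = None
```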
Closes #872. The top-level decision field only recognizes "block".
To not block, return empty JSON {}. "allow" was silently ignored
by Claude Code, causing unpredictable behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
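In code, the rule above amounts to something like this sketch. The function name and the `reason` field are assumptions; the key point from the fix is that the non-blocking case emits an empty object instead of "allow":

```python
import json


def hook_response(should_block, reason=""):
    """Build the JSON a hook prints to stdout."""
    if should_block:
        return json.dumps({"decision": "block", "reason": reason})
    # Not blocking: emit {} and never {"decision": "allow"}.
    return json.dumps({})
```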
Contributors now get a one-click dev environment that mirrors CI exactly:
Python 3.11 (middle of the 3.9/3.11/3.13 matrix), ruff pinned to the same
>=0.4.0,<0.5 range CI enforces, and pre-commit hooks auto-installed from
the existing .pre-commit-config.yaml.
Pinning ruff in post-create.sh is the load-bearing piece: pyproject only
sets a floor, so without the pin the ruff extension would install 0.15.x
and phantom-fail lint against CI's 0.4.x.
export MEMPAL_VERBOSE=true → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)
Developers need to see code and diaries being written.
Regular users want zero chat clutter. Now both work.
TDD: tests written first, failed, code fixed, tests pass.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.
Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>
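The general pattern, sketched with a made-up pattern and function name (the actual expressions in `dialect.py` are not shown here):

```python
import re

# Compiled once at import time and reused on every call.
_WORD_RE = re.compile(r"[A-Za-z']+")


def count_words(lines):
    """Count word-like tokens across lines using the precompiled pattern."""
    return sum(len(_WORD_RE.findall(line)) for line in lines)
```

Python's `re` module keeps an internal cache of compiled patterns, so the gain comes mostly from skipping the per-call cache lookup inside hot loops.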
Replace the blanket ban on .tech/.io/.com domains with an allowlist
of real MemPalace surfaces (GitHub repo, PyPI, mempalaceofficial.com)
and call out mempalace.tech as the reported impostor. The blanket
.com ban would have flagged mempalaceofficial.com as fake once DNS
resolves (CNAME shipped in #877).
Also update the April 11 follow-up section to match so the two
notices no longer contradict each other.
Addresses review feedback on #604:
- Warning now goes to stderr instead of stdout so it doesn't mix with
mine progress output when users pipe stdout elsewhere.
- Warning explicitly calls out that directories with the same basename
will share a wing name, and suggests adding mempalace.yaml to
disambiguate. Prevents silent content mixing across projects mined
without yaml.
When no mempalace.yaml or mempal.yaml exists in the source directory,
return a default config (wing = directory name, room = general) instead
of calling sys.exit(1). This lets users mine any directory into their
palace without requiring init first.
Closes #14.
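The fallback and its stderr warning might be sketched as follows. The function names, the warning wording, and the returned dict layout are assumptions; only the file names, the `wing = directory name, room = general` default, and the stderr destination come from the notes above:

```python
import sys
from pathlib import Path

CONFIG_NAMES = ("mempalace.yaml", "mempal.yaml")


def find_config(source_dir):
    """Return the first config file found in source_dir, or None."""
    root = Path(source_dir)
    for name in CONFIG_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


def default_config(source_dir):
    """Fallback used when no config file exists, instead of exiting."""
    wing = Path(source_dir).resolve().name
    # stderr keeps the warning out of piped stdout progress output.
    print(
        f"warning: no mempalace.yaml in {source_dir}; using wing '{wing}'. "
        "Directories with the same basename share a wing; "
        "add mempalace.yaml to disambiguate.",
        file=sys.stderr,
    )
    return {"wing": wing, "room": "general"}
```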
Builds on @Yorji-Porji's draft by fixing three issues before it lands:
- Replace the `< 1.0.0` placeholder table with MemPalace's actual
support policy: current major (3.x) receives fixes, 2.x and earlier
do not.
- Remove the `[Insert Maintainer Email Here]` placeholder and the
email fallback. GitHub Private Vulnerability Reporting is enabled
on this repo; the policy points there exclusively so there is no
risk of a researcher emailing a dead address.
- Drop the meta-note ("Adjust the table above…") that was an
instruction to the maintainer, not policy text.
Structure, triage timelines, and credit language are kept as drafted.
Adds website/public/CNAME containing `mempalaceofficial.com` so the
VitePress build output always includes /CNAME in the Pages artifact.
Without this, the custom-domain setting is only held in the repo's
Pages API config — if it ever drifts (manual edit, org move, workflow
change), the site reverts to <org>.github.io with no record in source.
Note: this does not fix the current site outage. The root cause is DNS
— mempalaceofficial.com has no A/AAAA/CNAME records pointing at GitHub
Pages IPs. That has to be fixed at the registrar. This commit is the
belt-and-suspenders so that once DNS is back, the domain is pinned in
source and the next workflow refactor can't accidentally drop it.
Tags matching `vX.Y.Z-*` (e.g. v3.4.0-rc1, v1.0.0-beta.2) are treated as
internal/staging builds. They skip the tag-vs-manifest check because
pre-releases do not flow to end users via `/plugin update`, which reads
the manifest on the default branch.
Stable tags `vX.Y.Z` still require all five version sources to match
exactly, so the protection against the #874 drift remains intact. The
cross-file consistency check on PRs is unchanged — all manifests must
still agree with mempalace/version.py whenever any version file moves.
Fails a tag push if `vX.Y.Z` does not match `mempalace/version.py` (the
single source of truth per CLAUDE.md), and fails PRs that touch any
version file without keeping all five in sync (pyproject.toml,
version.py, .claude-plugin/marketplace.json, .claude-plugin/plugin.json,
.codex-plugin/plugin.json).
Prevents the class of bug described in #874, where v3.1.0/v3.2.0/v3.3.0
tags all landed pointing at commits that still carried manifest version
3.0.14, blocking `/plugin update` for end users.
Refs #874
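The stable versus pre-release tag rule can be sketched like this. The regexes and the function name are illustrative assumptions and not the workflow's actual script:

```python
import re

STABLE_TAG = re.compile(r"^v(\d+\.\d+\.\d+)$")
PRERELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+-.+$")


def check_tag(tag, versions_by_file):
    """Return the files whose version disagrees with the tag (empty = OK)."""
    if PRERELEASE_TAG.match(tag):
        return []  # pre-releases skip the tag-vs-manifest check
    match = STABLE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a version tag: {tag}")
    expected = match.group(1)
    return sorted(f for f, v in versions_by_file.items() if v != expected)


# Example inputs mirroring the drift described in #874.
versions = {
    "mempalace/version.py": "3.3.0",
    ".claude-plugin/plugin.json": "3.0.14",
}
```

Here `check_tag("v3.3.0", versions)` reports the stale plugin manifest, while `check_tag("v3.4.0-rc1", versions)` passes without comparing anything.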
Aligns marketplace.json and both plugin.json files with version.py /
pyproject.toml (already at 3.3.0) so `/plugin update` reflects the
v3.1.0/v3.2.0/v3.3.0 tags that had been landing without manifest bumps.
Also updates marketplace.json `owner.url` from the stale
github.com/milla-jovovich path to the current github.com/MemPalace org.
Refs #874
sanitize_name rejects commas, colons, parentheses, and slashes — characters
that commonly appear in knowledge graph subject/object values. Adds
sanitize_kg_value for KG entity fields (subject, object, entity) while
keeping sanitize_name for predicates and wing/room names.
Noticed this URL:
```
hXXps://www.mempalace[.]tech/
```
The README already warns about it, but the warning is probably best surfaced as an urgent notice at the top of the README.
- _count_human_messages() now logs a WARNING via _log() when a
non-empty transcript_path is rejected by the validator, making
silent auto-save failures diagnosable via hook.log
- Add test for platform-native paths (backslashes on Windows) to
verify _validate_transcript_path works cross-platform
- Add test verifying the warning log is emitted on rejection
Refs: MemPalace/mempalace#809
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.
Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.
With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.
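To illustrate what depending only on the abstraction layer means, here is a toy contract with an in-memory implementation. The method signatures are assumptions made for this sketch and may not match the real `BaseCollection`; only the class name and the existence of `update()` come from the notes above:

```python
from abc import ABC, abstractmethod


class BaseCollection(ABC):
    """Toy version of the collection contract (signatures assumed)."""

    @abstractmethod
    def add(self, ids, documents, metadatas=None): ...

    @abstractmethod
    def update(self, ids, documents=None, metadatas=None): ...

    @abstractmethod
    def count(self): ...


class InMemoryCollection(BaseCollection):
    """Hypothetical plugin backend: no chromadb import anywhere."""

    def __init__(self):
        self._rows = {}

    def add(self, ids, documents, metadatas=None):
        metadatas = metadatas or [{} for _ in ids]
        for id_, doc, meta in zip(ids, documents, metadatas):
            self._rows[id_] = {"document": doc, "metadata": dict(meta)}

    def update(self, ids, documents=None, metadatas=None):
        for pos, id_ in enumerate(ids):
            row = self._rows[id_]
            if documents is not None:
                row["document"] = documents[pos]
            if metadatas is not None:
                row["metadata"].update(metadatas[pos])

    def count(self):
        return len(self._rows)
```

Core code written against `BaseCollection` runs unchanged on either backend, which is also what makes patching the backend in tests straightforward.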