mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	e334e257bf	fix(mcp): retry _get_collection once on transient failure (#1286 ) A transient chromadb exception inside `_get_collection` was swallowed by the bare `except Exception: return None`, leaving every subsequent tool call hitting the same poisoned cache silently. The fix wraps the body in a `for attempt in range(2)` loop: on attempt 0 failure, log via `logger.exception(...)` and clear `_client_cache` / `_collection_cache` / `_metadata_cache` so the next iteration forces `_get_client()` to rebuild from scratch — that path now re-runs `quarantine_stale_hnsw` (per #1322), so the second attempt heals the common stale-handle case automatically. If both attempts fail, return `None` (matches the prior contract for permanent failures). Two new tests in `tests/test_mcp_server.py::TestCacheInvalidation`: - `test_get_collection_retries_once_on_exception` — first attempt raises via a monkeypatched `_get_client`, second attempt succeeds; assert the caller gets the collection back, not None. - `test_get_collection_returns_none_after_two_failures` — both attempts fail, assert we exhaust the loop and return None (no infinite retry). Surgical extraction from PR #1286, which carried the same fix idea (plus a fork-sync bundle that couldn't be merged); credit to the original author below. Co-authored-by: Jeffrey Hein <jp@jphein.com>	2026-05-06 04:52:18 -03:00
Anthony Clendenen	ca5899e361	refactor: fix ruff bugbear and silent-except findings - B904: chain OSError/collection errors with "raise ... from e" in normalize.py and searcher.py so the original traceback is preserved. - B007: rename unused loop variables to _name in dedup, dialect, layers, and room_detector_local. - S110/S112: replace bare "try/except/pass" and "try/except/continue" with logger.debug(..., exc_info=True) in mcp_server, searcher, palace, palace_graph, miner, convo_miner, and fact_checker so background failures are observable without changing behaviour. A module-level logger ("mempalace_mcp", matching mcp_server/searcher) is added to the five files that didn't already have one. Configured ruff checks (E/F/W/C901) and ruff --select B, S110, S112 all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 04:12:09 -03:00
Igor Lins e Silva	8a9b2bed63	Merge pull request #988 from bobo-xxx/clawoss/fix/978-negative-similarity fix: clamp similarity scores to [0,1] to prevent negative values	2026-05-06 03:34:56 -03:00
Igor Lins e Silva	f3d9801c73	Merge pull request #1293 from hzx945627450-eng/fix/mcp-ensure-ascii fix: MCP server JSON output ensure_ascii=False for non-ASCII support	2026-05-06 03:34:32 -03:00
bobo-xxx	eef053d750	fix(mcp_server): clamp similarity to [0,1] to avoid negative values	2026-05-06 02:20:47 -03:00
Igor Lins e Silva	74288f1cdd	style: ruff format mcp_server.py (CI lint)	2026-05-06 02:20:00 -03:00
黄祖鑫(940219)	7b49478ef7	fix: MCP server JSON output ensure_ascii=False for non-ASCII support Without ensure_ascii=False, non-ASCII characters (e.g. Chinese) in tool results and JSON-RPC responses are escaped as \uXXXX, which causes downstream MCP clients to receive escaped text instead of the original characters. This affects all platforms, not just Windows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 02:20:00 -03:00
Igor Lins e Silva	869ab38095	style: ruff format mcp_server.py + test_mcp_server.py (CI lint)	2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk	a85d432b54	feat: add validation for missing name parameter in tools/call requests	2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk	55d79dc8cd	fix: include null id in JSON-RPC invalid request error responses and add validation tests	2026-05-06 02:19:57 -03:00
Oleksii Pylypchuk	0fdb480e12	fix(mcp): handle null JSON-RPC request payloads safely When the MCP client sends a malformed or null top-level request, prevent the AttributeError on request.get() by explicitly validating that the request is a dictionary. Returns standard JSON-RPC Error -32600 (Invalid Request) instead of crashing the server.	2026-05-06 02:19:57 -03:00
Igor Lins e Silva	67cda9d455	Merge pull request #1030 from eldar702/fix/none-metadata-residual-guards fix: guard None metadata/doc in tool_check_duplicate and Layer1/Layer2	2026-05-06 01:51:24 -03:00
Igor Lins e Silva	0c8314f919	Merge pull request #1060 from alonehobo/fix/stdio-utf8 fix(mcp): force UTF-8 on stdio to fix -32000 on non-ASCII payloads (Windows)	2026-05-06 01:49:56 -03:00
Igor Lins e Silva	642a073305	Merge pull request #1114 from Sathvik-1007/fix/list-drawers-pagination-total fix: add total count to tool_list_drawers pagination response	2026-05-06 01:49:37 -03:00
Igor Lins e Silva	ea6f2c0c4c	Merge pull request #1162 from imtylervo/fix/palace-write-lock-queue-pattern fix: serialize ChromaCollection writes through palace lock	2026-05-06 01:48:51 -03:00
Igor Lins e Silva	53675dd194	Merge pull request #1160 from mvalentsev/fix/mcp-kg-lazy-per-path-cache fix(mcp): lazy per-path KnowledgeGraph cache (#1136)	2026-05-06 01:33:47 -03:00
Igor Lins e Silva	7ede231da9	Merge pull request #1167 from arnoldwender/fix/kg-date-validation fix(kg): validate ISO-8601 date formats at MCP boundary	2026-05-06 01:33:27 -03:00
mvalentsev	b8816e0fe2	fix(mcp): retry KG handlers once on concurrent close race Race scenario: a KG tool handler calls _get_kg() and gets a live KnowledgeGraph; another thread fires tool_reconnect() between that return and the handler's kg.add_triple()/kg.query_entity()/etc call. tool_reconnect drains _kg_by_path and closes the underlying sqlite3.Connection; the handler then raises sqlite3.ProgrammingError: 'Cannot operate on a closed database', which surfaces as a -32000 to the MCP client even though the user just asked for a reconnect. New _call_kg(op) helper wraps each handler's kg call in a one-shot retry: catch exactly sqlite3.ProgrammingError, evict the stale entry (only if the cache slot still points at the closed instance — another thread may have already replaced it), and rerun op against a fresh _get_kg(). Beyond one retry give up so a sustained close-stream surfaces clearly instead of looping. All five KG handlers (tool_kg_query, tool_kg_add, tool_kg_invalidate, tool_kg_timeline, tool_kg_stats) now route through _call_kg. Tests pin the contract: * retries with a fresh KG and returns the second result * non-ProgrammingError exceptions propagate without retry * gives up after exactly one retry on sustained close	2026-05-03 21:43:51 +05:00
mvalentsev	0a62658051	fix(mcp): drain KG cache on tool_reconnect tool_reconnect cleared ChromaDB caches but left _kg_by_path entries intact. After an external replacement of knowledge_graph.sqlite3 the server kept serving the old open sqlite3.Connection, returning stale results. Now iterate _kg_by_path under _kg_cache_lock, call close() best-effort, and clear the dict so the next tool call reopens the KG from disk. Two new tests in TestKGLazyCache verify cache invalidation and that a failing close() does not block the clear.	2026-05-03 17:43:00 +05:00
mvalentsev	19f8a4ff68	style(mcp): drop issue-tracker comments from KG cache block Inline comments referencing #1136 and #540 add no information the identifiers do not already convey. PR description carries the context; code stays quiet.	2026-05-03 17:43:00 +05:00
mvalentsev	beac5d9954	refactor(mcp): replace eager _kg with lazy per-path cache (#1136 ) Swap the module-level KnowledgeGraph singleton for a lazy, per-path cache keyed by the resolved sqlite path. Import no longer creates a sqlite file as a side effect, and MCP servers started with --palace now route KG calls to the correct tenant when MEMPALACE_PALACE_PATH changes between calls, matching the per-call behavior of _get_client() on the ChromaDB side. Default-path behavior is preserved: without --palace at startup, KG stays on DEFAULT_KG_PATH regardless of env var. The "no --palace but env var set" case is #540's scope and is not changed here.	2026-05-03 17:43:00 +05:00
Igor Lins e Silva	5380189f82	Merge pull request #1320 from MemPalace/fix/1314-kg-temporal-params fix(mcp): forward valid_to and source params in kg_add/kg_invalidate (#1314)	2026-05-03 03:51:29 -03:00
Igor Lins e Silva	0e65c54978	docs(mcp): drop §5.5 from kg_add docstring/schema The repo's anti-jargon meta-test bans §N markers outside the sources/backends allowlist. mcp_server.py isn't allowlisted, so the "RFC 002 §5.5" references added in this PR turned the test red. Trim to "RFC 002" — section number isn't load-bearing for the description.	2026-05-03 06:28:12 -03:00
Igor Lins e Silva	3e6f6480c0	Merge pull request #1325 from MemPalace/security/mcp-omit-absolute-paths fix(mcp): omit absolute filesystem paths from MCP tool responses	2026-05-03 03:20:11 -03:00
Igor Lins e Silva	7fc260f752	fix(mcp): basename source_file in tool_get_drawer responses The MCP `mempalace_get_drawer` tool returned the entire raw drawer metadata blob to any connected client, and the `source_file` field in that blob is the absolute filesystem path written by the miners (`miner.py`, `convo_miner.py` — `source_file = str(filepath)`). On a single-user local deployment this is self-disclosure, but in nested-agent or multi-server MCP topologies the client is a separate trust domain and the host's directory layout has no documented client-side use. Mirror the mitigation that `searcher.search_memories()` already applies on its own return path: reduce `source_file` to its basename via `Path(source_file).name` before handing the metadata to the client. Citations still work — the directory layout does not leak. Companion to #1 (omit palace_path from tool_status). Same threat class, different surface: - mempalace_status — palace dir path → fixed in #1 - mempalace_get_drawer — per-drawer source_file path → this PR Other read tools were audited and do not leak host paths: - mempalace_search — already basenames source_file - mempalace_list_drawers — returns wing/room/preview only - mempalace_diary_read — date/timestamp/topic/content only - mempalace_reconnect — success/message/drawers only - mempalace_kg_* — entity/predicate strings, counts - mempalace_check_duplicate — wing/room/preview only Changes: - mempalace/mcp_server.py: tool_get_drawer() now basenames metadata.source_file - tests/test_mcp_server.py: regression test asserting the absolute path and its parent directory do not appear anywhere in the response - website/reference/mcp-tools.md: clarify the documented return shape	2026-05-03 05:58:46 -03:00
icciAaron	b2f259c253	fix(mcp): omit palace_path from tool_status responses (+ docs) The MCP `mempalace_status` tool was returning the server's absolute `_config.palace_path` to any connected client on both the main (ChromaDB-backed) path and the sqlite fallback path that runs when HNSW divergence is detected (#1222). On a single-user local deployment this is self-disclosure, but in nested-agent or multi-server MCP topologies the client is a separate trust domain and the absolute path has no documented client-side use. Clients that legitimately need the palace path continue to have three documented channels: the `MEMPALACE_PALACE_PATH` env var (primary) or its legacy `MEMPAL_PALACE_PATH` alias, the `~/.mempalace/config.json` file, and the `--palace` CLI flag on most subcommands. Also corrects stale docs that claimed `mempalace_reconnect` returned a `palace_path` field; the code returns `{success, message, drawers, vector_disabled[, vector_disabled_reason]}` on success, plus a no-palace shape and an exception shape. - mempalace/mcp_server.py: drop palace_path from tool_status() and _tool_status_via_sqlite() result dicts - website/reference/mcp-tools.md: update documented return shapes for mempalace_status (fix) and mempalace_reconnect (stale-docs correction) Authored-by: Aaron Salsitz (ICCI LLC, @icciaaron). Claude Code was used as an authoring and review-orchestration tool, with human-in-the-loop oversight at every step: Aaron wrote the prompts, reviewed each draft, called for three independent review passes (drafting / post-rebase technical / CISA-aligned disclosure-leak), and verified the final patch behavior before commit.	2026-05-03 05:58:46 -03:00
Igor Lins e Silva	e9222b4c7b	fix(mcp): case-insensitive agent name in diary_write/diary_read (#1243 ) `tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name` (which preserves case), while `tool_diary_read` filtered by exact match — so writing as "Claude" and reading as "claude" silently returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization. The default per-agent wing slug is also stable across casings since it's derived from the same normalized form. Behavior change: entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; documented under v3.3.5 in CHANGELOG with a `mempalace repair` pointer. Adds a regression test (`test_diary_read_case_insensitive_agent`) and updates the existing `test_diary_write_and_read` to assert the new lowercase agent identity. Closes #1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:57:09 -03:00
Igor Lins e Silva	e4e25ed186	fix(mcp): forward valid_to and source params in kg_add/kg_invalidate (#1314 ) `tool_kg_add` previously accepted only `valid_from` and `source_closet`, silently dropping `valid_to`, `source_file`, and `source_drawer_id` at the MCP boundary. Backfilling already-ended historical facts therefore collapsed to "still current," and adapter provenance never reached the SQLite layer even though `KnowledgeGraph.add_triple` already supported every column. `tool_kg_invalidate` returned the literal string `"today"` whenever the caller omitted `ended`, hiding the actual stamped date from anyone trying to verify what got persisted. Changes: - Extend `tool_kg_add` signature + MCP input_schema with `valid_to`, `source_file`, `source_drawer_id`; forward all of them to `_kg.add_triple` and to the WAL log. - Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate` before logging / returning, so the response always reports the actual date stored in `valid_to`. - Add regression tests for valid_to round-trip, source_file / source_drawer_id provenance, and the resolved-ended-date contract. - Leave TODO(#1283) markers so the open ISO-8601 validation PR can drop `validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly. The underlying `KnowledgeGraph.add_triple` already accepted these kwargs (RFC 002 §5.5) — only the MCP edge needed wiring up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:32 -03:00
Igor Lins e Silva	cd98d6674e	fix(mcp_server): address copilot review on #1303 - Resolve the EF inside the two reopen branches that actually call `client.get_collection` / `client.create_collection`, so warm-cache reads stay zero-cost (no `MempalaceConfig()` / `_resolve_providers` on every tool call). - Reuse `ChromaBackend._resolve_embedding_function()` instead of duplicating its try/except + log message + None-fallback. - Reword the inline + CHANGELOG explanation to clarify that ChromaDB 1.x persists the EF identity (its `name()`) but not the instance/ configuration — `mempalace.embedding` documents this and spoofs `name()` to `"default"` precisely so the identity check passes; the bug was the provider list (lazy ONNX selection) silently differing.	2026-05-01 19:46:59 -03:00
Igor Lins e Silva	ac6c2b6af6	fix(mcp_server): pass embedding_function= on collection reopen (#1299 ) `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per #1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to #1262 / #1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.	2026-05-01 19:34:38 -03:00
Igor Lins e Silva	9dd56ecb0a	fix(mcp_server): split get_or_create_collection on reopen (follow-up to #1262 ) #1262 split `get_or_create_collection` into `get_collection` + fallback `create_collection` inside `ChromaBackend.get_collection`, fixing the chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection metadata differs from the call-site's `_HNSW_BLOAT_GUARD` payload. The MCP server's `_get_collection(create=True)` carries the same metadata payload at `mcp_server.py:287` and routes through chromadb's Python client directly, bypassing the backend layer. Both `tool_add_drawer` and `tool_diary_write` reach this site on every invocation, and the Stop hook fires `mempalace_diary_write` at session end — which was exactly the crash path #1089 named. Apply the same try/except split here so legacy palaces whose stored metadata predates the bloat-guard expansion no longer crash on the MCP-server reopen path. Regression test patches `get_or_create_collection` at the chromadb client class level (not the instance — chromadb's mtime-change detection rebuilds the client between calls, so an instance-level spy doesn't survive) and asserts the second `_get_collection(create=True)` call never reaches it.	2026-04-30 22:35:18 -03:00
Arnold Wender	4d98b05240	fix(kg): validate ISO-8601 date formats at MCP boundary tool_kg_query (as_of), tool_kg_add (valid_from), and tool_kg_invalidate (ended) accepted any string and forwarded it to SQLite without format validation. Parameterized queries prevent SQL injection, but invalid date strings silently produce empty result sets — callers cannot distinguish "no fact at this time" from "your date format was unrecognized." This is especially painful for natural-language LLM callers that synthesize dates like "March 2026" or "Jan 2025". Add sanitize_iso_date() in config.py alongside the other input validators. It accepts YYYY, YYYY-MM, and YYYY-MM-DD forms; passes through None/empty; and raises ValueError with a field-named message on anything else. Call it from the three kg MCP tool wrappers before values reach the storage layer so the caller gets a clear error instead of a silent miss. Closes #1164	2026-04-30 15:21:17 +02:00
Fergus Ching	88a53b2ffa	fix: prevent HNSW index bloat via batch_size + sync_threshold metadata Sets `hnsw:batch_size` and `hnsw:sync_threshold` to 50_000 at every collection-creation call site: * `mempalace/backends/chroma.py` — `get_collection(create=True)` and the legacy `create_collection()` path. Preserves existing `hnsw:space`, `hnsw:num_threads=1` (race fix from #976), and `*ef_kwargs` (embedding-function plumbing from `a4868a3`). `mempalace/mcp_server.py` — the direct `client.get_or_create_collection` path used when a palace is first opened by the MCP server. Without this third site, MCP-bootstrapped palaces would skip the guard and could still trigger the original bloat. Without these defaults, mining ~10K+ drawers triggers ~30 HNSW index resizes and hundreds of persistDirty() calls. persistDirty uses relative seek positioning in link_lists.bin; accumulated seek drift across resize cycles causes the OS to extend the sparse file with zero-filled regions, each cycle compounding the next. Result: link_lists.bin grows into hundreds of GB sparse, after which `status`, `search`, and `repair` all segfault and the palace is unrecoverable. Empirical: rebuilt a palace from scratch on 39,792 drawers across 5 wings with this fix applied. Final palace 376 MB, link_lists.bin stays at 0 bytes across both Chroma collection dirs, status and search both return cleanly. Same workload without the fix bloated the palace to 565 GB sparse (30 GB on disk) and segfaulted at ~15K drawers. Migration note: chromadb 1.5.x exposes a `collection.modify(configuration={"hnsw": {...}})` retrofit path for already-created collections (`UpdateHNSWConfiguration`), but this PR doesn't pursue it — by the time link_lists.bin has bloated the index is already corrupt and the only known recovery is a fresh mine. Tests assert both keys land on the persisted collection metadata in both `ChromaBackend` code paths, which also covers the #1161 "config silently dropped" concern at CI time. A separate smoke test was used to verify the metadata round-trips through `chromadb.PersistentClient` reopen on chromadb 1.5.8. Closes #344 Supersedes #346 Co-authored-by: robot-rocket-science <robot-rocket-science@users.noreply.github.com>	2026-04-27 02:54:19 -03:00
imtylervo	f30fdf2672	fix: serialize ChromaCollection writes through palace lock #976 protects `mempalace mine`, but MCP/direct backend writers still call ChromaCollection.add/upsert/update/delete without the palace lock. This moves the lock boundary to the Chroma backend seam so all Chroma writes share the same palace-level serialization, with a re-entrant guard for miner paths that already hold the lock. mine_palace_lock(palace_path) gains a per-thread re-entrant guard (threading.local + pid-tag against fork inheritance) so ChromaCollection write methods can take the lock without self-deadlocking when called from inside miner.mine()'s outer hold. ChromaCollection.__init__ accepts an optional palace_path; when set, add/upsert/update/delete wrap their underlying chromadb call with mine_palace_lock(palace_path). palace_path=None preserves the legacy no-lock behaviour for direct callers and tests. ChromaBackend's get_collection/create_collection pass palace_path through; mcp_server._get_collection forwards _config.palace_path so all MCP write tools inherit the wrapping. Tests: 5 new in tests/test_chroma_collection_lock.py covering opt-in, writer-blocks-during-mine, re-entrant-inside-mine, two-process serialization, and a source-level read-path-not-locked pin. Plus 1 new + 1 rewritten in tests/test_palace_locks.py for the re-entrant semantics. 52 passed in 1.01s including the existing test_backends.py regression suite. Refs #1161.	2026-04-27 14:16:20 +10:00
Igor Lins e Silva	57ac669dbc	fix(repair): address Copilot review on #1227 Five Copilot review issues + the Python 3.9 CI failure rolled into one follow-up: * Replace ``dict \| None`` annotated assignment with a type-comment so module load doesn't evaluate PEP 604 syntax on Python 3.9 (CI red). * Drop ``mempalace repair rebuild`` — the CLI only ships ``mempalace repair`` (rebuild) and ``mempalace repair-status``. Updated all user-facing messages, docstrings, and test assertions. * Replace ``_get_client()`` in ``tool_search`` with the safe ``_refresh_vector_disabled_flag`` probe so the fallback isn't defeated by the very chromadb client load it's trying to avoid. * Short-circuit ``tool_status`` to a pure-sqlite reader (``_tool_status_via_sqlite``) when divergence is detected so wing / room counts come back without ever opening the persistent client. * Wrap the recency-window query in ``_bm25_only_via_sqlite`` with an ``id``-ordered fallback so legacy schemas missing ``created_at`` don't break BM25 search. New test covers the sqlite-status short-circuit. 1409 tests pass.	2026-04-26 21:53:56 -03:00
Igor Lins e Silva	0d349c3d86	fix(repair): detect HNSW capacity divergence and fall back to BM25 (#1222 ) When chromadb's HNSW segment freezes at a stale max_elements while sqlite keeps accumulating embeddings, the next chromadb open segfaults the MCP server on every tool call. Adds a pure-filesystem capacity probe (zero chromadb interaction), a `mempalace repair-status` read-only health check, and a BM25-only sqlite fallback so the palace stays reachable even when vector search is unavailable. * `hnsw_capacity_status` reads sqlite + index_metadata.pickle directly via a tight-allowlist unpickler — no hnswlib import, no segment load. * MCP server runs the probe at startup and after every reconnect; sets `_vector_disabled` and routes search to the sqlite FTS5 + BM25 path. * `tool_status` and `tool_reconnect` surface the fallback state. * Threshold tuned for chromadb 1.5.x async-flush lag (2× sync_threshold).	2026-04-26 19:54:00 -03:00
Igor Lins e Silva	5de5b0923d	Merge pull request #936 from shaun0927/fix/diary-topic-sanitize fix: sanitize topic parameter in tool_diary_write	2026-04-26 16:12:37 -03:00
Felipe Truman	8df944a54d	fix: best-effort HNSW thread-pin retrofit + drop dead attempt-cap constant Addresses remaining PR #976 review items after rebase on develop. `get_collection(create=False)` previously returned existing collections without re-applying `hnsw:num_threads=1`, so palaces created before the fix kept the unsafe parallel-insert path. Add `_pin_hnsw_threads()` helper that calls `collection.modify(configuration=UpdateCollectionConfiguration( hnsw=UpdateHNSWConfiguration(num_threads=1)))` best-effort on every `get_collection` call (including the MCP server's `_get_collection`). In chromadb 1.5.x the runtime config does not persist to disk across `PersistentClient` reopens, so the retrofit is re-applied each process start rather than being a one-shot migration. Fresh palaces keep the metadata-based pin as primary defense; legacy palaces now also get per-session protection without requiring `mempalace nuke` + re-mine. After the rebase on develop, `hook_precompact` delegates to `_mine_sync` and no longer emits `decision: block`, so the attempt-cap constant was orphaned. Grep confirms 0 usages in the repo — remove it. - `_pin_hnsw_threads` retrofits legacy collection (num_threads None -> 1) - `_pin_hnsw_threads` swallows all errors (never raises) - `ChromaBackend.get_collection(create=False)` applies retrofit on legacy palace - 62 tests pass (10 backends + 6 palace locks + 46 hooks_cli)	2026-04-25 04:36:29 -03:00
Felipe Truman	99b820cb42	fix: address PR review — per-palace lock, MCP server path, hook timeout, tests Addresses the six Copilot review comments on the initial commit. 1) #6 (critical) — mcp_server.py `_get_collection` bypassed ChromaBackend The MCP server creates its palace collection directly via `chromadb.PersistentClient.get_or_create_collection` in `_get_collection`, not through `ChromaBackend.get_collection`. That path was missing the `hnsw:num_threads=1` metadata, so the primary crash surface for #974 and #965 was untouched by the original patch. Fixed by passing `hnsw:num_threads=1` at the mcp_server create site too. Documented in a code comment that the setting is only honored at creation time — existing palaces created before this fix still need a `mempalace nuke` + re-mine to gain the protection. 2) #3 — mine_global_lock over-serialized mines across unrelated palaces Replaced the single global lock file `mine_global.lock` with a per-palace lock keyed by `sha256(os.path.abspath(palace_path))` (`mine_palace_<hash>.lock`). Mines against the same palace still collapse to a single runner (the correctness boundary), but mines against different palaces are now free to run in parallel. `mine_global_lock` is kept as a backward-compatible alias for `mine_palace_lock` so any external callers that imported the previous name keep working. 3) #1 — hook_precompact swallowed OSError but not subprocess.TimeoutExpired `subprocess.run(..., timeout=60)` raises `TimeoutExpired` on slow palaces. The previous `except OSError` clause didn't catch it, so the hook could raise and fail to emit any JSON decision — leaving the harness without a block/passthrough signal. Fixed by catching `(OSError, subprocess.TimeoutExpired)` together and always falling through to the block decision so the hook reliably emits a response. 4) #2 + #4 — tests - tests/test_hooks_cli.py: added `test_precompact_first_two_attempts_block`, `test_precompact_passes_through_after_cap`, and `test_precompact_counter_is_per_session` to lock in the #955 deadlock fix. - tests/test_palace_locks.py (new): covers `mine_palace_lock` single-acquire, reuse-after-release, cross-process serialization on the same palace, non-interference across different palaces, path normalization, and the `mine_global_lock` back-compat alias. 5) #5 — known limitation, documented but not auto-fixed Copilot suggested detecting collections missing `hnsw:num_threads=1` and calling `collection.modify(metadata=...)` to retrofit existing palaces. Verified against chromadb 1.5.7: `modify(metadata=...)` replaces metadata rather than merging, and re-passing `hnsw:space="cosine"` then raises `ValueError: Changing the distance function of a collection once it is created is not supported currently.` The HNSW runtime configuration (`configuration_json`) also does not expose `num_threads` in chromadb 1.5.x, so the flag appears to be read only at creation time. Rather than paper over the limitation with a best-effort `modify` that silently drops `hnsw:space`, documented in the mcp_server comment that pre-existing palaces need a `mempalace nuke` + re-mine to gain the protection. Fresh palaces are always protected. Testing - pytest tests/test_palace_locks.py tests/test_hooks_cli.py tests/test_backends.py tests/test_cli.py → 98 passed, 0 failed. - Runtime validation with two concurrent `mempalace mine` calls: - Different palaces → both complete in parallel ✓ - Same palace → one completes, the other exits with "another `mine` is already running against <palace> — exiting cleanly." ✓	2026-04-25 04:34:30 -03:00
Igor Lins e Silva	1fd16daac2	fix(mcp): diary_read(wing='') spans all wings for agent (#1145 ) #1097 fixed mempalace_search to treat empty-string wing/room as no filter, matching how LLM agents default to filling every optional parameter with ''. The same pattern wasn't applied to diary_read: passing wing='' defaulted to wing_<agent_name>, siloing away entries that hooks had written to project-derived wings per #659. When wing is empty/omitted, filter only on agent + room=diary so callers get a unified view of the agent's journal across every wing it has written to. Explicit wing=<name> continues to scope reads to that wing only. Adds test covering empty-wing read after writing to both the default and a non-default wing.	2026-04-23 23:39:34 -03:00
Kunal Garhewal	9947ad06df	fix: treat empty string as no filter in mempalace_search wing/room (#1097 ) * fix: treat empty string as no filter in mempalace_search wing/room * fix: also treat whitespace-only strings as no filter	2026-04-23 15:19:18 -07:00
Jeffrey Hein	df3ee289fc	fix: add wing param to diary_write/diary_read, derive from transcript path (#659 ) * fix: add wing param to diary_write/diary_read, derive from transcript path Without a wing override, all diary entries from the stop hook land in wing_session-hook regardless of which project the session is in, making per-project diary search impossible. - tool_diary_write(): add optional `wing` param; sanitize and use it when provided, fall back to wing_{agent_name} when omitted - tool_diary_read(): add optional `wing` param for filtering by target wing - TOOLS dict: expose `wing` in input_schema for both diary tools - hooks_cli: add _wing_from_transcript_path() helper that extracts the project name from Claude Code paths like ~/.claude/projects/-home-jp-Projects-kiyo-xhci-fix/... → kiyo-xhci-fix - hook_stop: derive project wing and append wing= hint to block reason so Claude writes diary entries to the correct per-project wing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: sanitize wing param, cross-platform paths, tighten test assertions Addresses Copilot review feedback on #659. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wing_ prefix + agent filter on diary_read Addresses bensig's 2-issue review on this PR. 1. _wing_from_transcript_path() was returning bare project names (e.g. "myproject") while all existing wings follow the wing_* convention from AAAK_SPEC. Entries landed in wing="myproject" while diary_read defaulted to wing="wing_<agent_name>" — orphaning every diary entry written by the stop hook. Now returns "wing_<project>" and falls back to "wing_sessions". 2. tool_diary_read() did not include agent_name in the ChromaDB where filter when a custom wing was provided — any caller with a shared wing could read entries written by other agents. Add {"agent": agent_name} to the $and clause. Also flagged by Qudo and left unresolved until now. Tests updated to expect the wing_ prefix (6 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 15:07:25 -07:00
Sathvik-1007	c2e053176c	fix: add total count to tool_list_drawers pagination response The list_drawers response only included count (current page size) with no total field, making it impossible for callers to know when pagination is exhausted. A page returning count == limit is ambiguous — it could be the last exact-fit page or there could be more results. Add a total field that reports the full number of matching drawers. For unfiltered requests this uses col.count(); for filtered requests (wing/room) it uses a lightweight col.get(include=[]) to count matching IDs without fetching documents.	2026-04-23 01:40:38 +05:30
alonehobo	35b033d77f	fix(mcp): force UTF-8 on stdio to fix -32000 on non-ASCII payloads On Windows, Python defaults sys.stdin/sys.stdout to the system codepage (e.g. cp1251 on Russian locales, cp1252 on Western European), while MCP JSON-RPC is always UTF-8. Non-ASCII payloads (Cyrillic, CJK, accented European) get mis-decoded before reaching handlers, causing json.loads to fail or tool handlers to receive garbled strings. Both surface to the client as a generic MCP error -32000. Reproduction: 1. On Windows with a non-Latin locale, call mempalace_add_drawer or mempalace_kg_add with Cyrillic/CJK in content or KG object. 2. Server returns: MCP error -32000: Internal tool error. 3. Calling the handler directly from Python works fine -- the bug is purely in the stdio transport. Fix: Reconfigure stdin/stdout to UTF-8 at the start of main(), after _restore_stdout(). Uses errors="replace" defensively so a lone bad byte cannot take down the server. Guarded by hasattr(reconfigure) for exotic stream replacements. This matches the behaviour of PYTHONUTF8=1 / python -X utf8 without requiring users to set an env var.	2026-04-21 12:33:58 +05:00
Pim Messelink	9f5b8f5fd6	fix: add mempalace-mcp console entry point for pipx/uv compatibility The MCP server config used `python -m mempalace.mcp_server` which fails when mempalace is installed via pipx or uv, since the system python cannot find the module in the isolated venv. Adding a `mempalace-mcp` console_scripts entry point ensures the MCP server works regardless of installation method (pip, pipx, uv, conda). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 01:26:00 -03:00
eldar702	4949aab68b	fix: guard None metadata/doc in tool_check_duplicate and Layer1/Layer2 Chroma 1.5.x can return ``None`` inside the ``metadatas`` / ``documents`` lists of a query/get result for partially-flushed rows. The codebase already has a systemic None-guard pattern (merged #999, #1013, #1019) but three call sites were still unguarded: * ``mcp_server.tool_check_duplicate`` (``mcp_server.py:487-488``) — ``meta = results["metadatas"][0][i]`` followed by ``meta.get(...)`` raises ``AttributeError: 'NoneType' object has no attribute 'get'``. The broad ``except Exception`` wrapper (line 504) swallows it and returns an uninformative ``"Duplicate check failed"``. * ``layers.Layer1.generate`` (``layers.py:126``) — iterates ``zip(docs, metas)`` and calls ``meta.get(key)`` in the importance loop. A single None metadata blows up the entire wake-up render. * ``layers.Layer2.retrieve`` (``layers.py:224``) — same pattern, same crash path for the on-demand render. Apply the same ``meta = meta or {}`` / ``doc = doc or ""`` idiom used by the merged guards in the search path. Three-line additions, no behaviour change on well-formed results. Tests added: * ``test_check_duplicate_handles_none_metadata`` — mocks the collection query to return ``None`` for one metadata and document, asserts the call does not crash and the sentinel-rendered entry has wing/room "?" and empty content. * ``test_layer1_handles_none_metadata`` / ``_handles_none_document`` * ``test_layer2_handles_none_metadata`` Relationship to other open PRs: * #1019 guarded ``searcher.py`` loops. This PR extends the same guard to the three call sites #1019 did not touch. * #979 fixed ``tool_check_duplicate`` negative similarity but left the None-metadata path unguarded. * Does not overlap #1013 (``Layer3.search_raw``) or #999.	2026-04-19 11:13:50 +03:00
jp	3f0cfd5ed4	fix(mcp): guard tool_status/list_wings/list_rooms/get_taxonomy against None metadata Four more MCP handlers iterate a metadata list and call m.get(...) unconditionally. When the cache contains a None entry (drawers with no metadata, common on older mining paths), the try block catches the AttributeError and marks the response "partial: true" with an error message — visible as {"error": "'NoneType' object has no attribute 'get'", "partial": true} returned from mempalace_status even though the palace data is otherwise fetchable. Same m = m or {} guard we applied to searcher.py (d3a2d22, a51c3c2) and miner.status() (66f08a1). None-metadata drawers now roll up under the existing "unknown" fallback bucket instead of poisoning the response with a misleading partial flag. Regression test: mock the metadata cache with a None in the middle, assert tool_status returns clean counts and no error/partial fields. Verified the test fails without the guard. 998 tests pass.	2026-04-18 12:38:23 -07:00
JunghwanNA	5bf826046c	fix: sanitize topic parameter in tool_diary_write agent_name and entry are validated via sanitize_name/sanitize_content, but topic is stored raw into ChromaDB metadata. Apply the same sanitize_name guard to reject null bytes, path traversal, and oversized payloads.	2026-04-16 12:12:17 +09:00
Marcio E. Heiderscheidt	b524b31839	fix: restrict file permissions on sensitive palace data (#814 ) * fix: restrict file permissions on sensitive palace data On Linux with default umask (022), several files and directories containing personal data were created world-readable. This patch applies chmod 0o700 to directories and 0o600 to files immediately after creation, wrapped in try/except for Windows compatibility. Files hardened: - hooks_cli.py: hook_state/ directory and hook.log - entity_registry.py: entity_registry.json (names, relationships) - knowledge_graph.py: knowledge_graph.sqlite3 parent directory - exporter.py: export output directory and wing subdirectories - config.py: people_map.json (name mappings) - mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix) Refs: MemPalace/mempalace#809 * fix: avoid redundant chmod calls on hot paths - hooks_cli.py: chmod STATE_DIR and hook.log only on first creation, not on every _log() call (hooks fire on every Stop event) - exporter.py: track created wing dirs to skip redundant makedirs + chmod on the same directory across batches - mcp_server.py: remove redundant _WAL_FILE.chmod after os.open already set mode=0o600 atomically Refs: MemPalace/mempalace#809	2026-04-15 00:27:03 -07:00
Arnold Wender	b226251ddf	fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225 ) (#864 ) * fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225) Fixes #225. Several transitive dependencies (chromadb, onnxruntime, posthog) print banners and warnings to stdout — sometimes at the C level — during the mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC over stdio, any non-JSON output on stdout corrupted the message stream and broke Claude Desktop's parser with errors like: MCP mempalace: Unexpected token '', "********"... is not valid JSON MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude Desktop 1.1062.0. Fix: at module load, redirect stdout to stderr at both the Python level (sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1)) to catch C-level prints, while preserving the real stdout for later restore. main() calls _restore_stdout() right before entering the protocol loop so JSON-RPC responses still go to the real stdout. Adds tests/test_mcp_stdio_protection.py with three regression tests: - module-level redirect is in place after import - _restore_stdout() restores the original stdout (idempotent) - 'python -m mempalace.mcp_server' with empty stdin emits no stdout style: reformat with ruff 0.4 (CI version) for #225	2026-04-15 00:26:51 -07:00

1 2 3

107 Commits