mempalace

Author	SHA1	Message	Date
eldar702	5347c2c71c	fix(searcher): clamp effective_distance to valid cosine range [0, 2] ``search_memories`` computes ``effective_dist = dist - boost`` where ``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a rank-0 closet hit. When the raw drawer distance is small — any near-exact match — the subtraction goes negative. Two downstream effects: 1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as ``similarity``. With ``effective_dist = -0.30`` that yields ``similarity = 1.30``, outside the documented ``[0, 1]`` range. The ``max(0.0, ...)`` only prevents negative similarities; it does not cap above 1. 2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts ``scored`` ascending by that key. A negative key drops below the rest, so the strongest hybrid matches end up sorting after weaker ones — ranking inversion under the exact conditions hybrid retrieval is supposed to serve best. Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``. The boost still wins (closet-backed hit still ranks first), it just no longer flips the order. Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) + closet_col (rank-0 closet for the 0.08 source) → assert all hits have ``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that the closet-boosted source still ranks first. Relationship to other PRs: * #988 clamps the output ``similarity`` alone. That does not fix the sort-key inversion or the invalid ``effective_distance`` in the returned dict. This PR clamps at the arithmetic source so both downstream users of the value stay in range. * Orthogonal to #979 (``tool_check_duplicate`` negative similarity).	2026-05-06 02:19:54 -03:00
Igor Lins e Silva	f854da779f	fix(lint): hoist hooks_cli_mod import to top of test_hooks_cli (E402) The alias was placed below an explanatory comment block introduced by #1305, which trips ruff E402 (module-level import not at top of file). Moved next to the existing 'from mempalace.hooks_cli import (...)' line. CI lint went red on develop after #1305 merged with the failing check; this re-greens it so subsequent PRs do not inherit the failure.	2026-05-06 01:57:44 -03:00
Igor Lins e Silva	67cda9d455	Merge pull request #1030 from eldar702/fix/none-metadata-residual-guards fix: guard None metadata/doc in tool_check_duplicate and Layer1/Layer2	2026-05-06 01:51:24 -03:00
Igor Lins e Silva	d9ab5b7fd3	Merge pull request #1305 from lcatlett/upstream/respect-absent-palace-dir fix(hooks): treat absent ~/.mempalace as auto-save off	2026-05-06 01:49:22 -03:00
Igor Lins e Silva	ea6f2c0c4c	Merge pull request #1162 from imtylervo/fix/palace-write-lock-queue-pattern fix: serialize ChromaCollection writes through palace lock	2026-05-06 01:48:51 -03:00
Igor Lins e Silva	d1e27b8c42	style: ruff format new test files (CI lint)	2026-05-06 01:47:46 -03:00
Igor Lins e Silva	53675dd194	Merge pull request #1160 from mvalentsev/fix/mcp-kg-lazy-per-path-cache fix(mcp): lazy per-path KnowledgeGraph cache (#1136)	2026-05-06 01:33:47 -03:00
Igor Lins e Silva	7ede231da9	Merge pull request #1167 from arnoldwender/fix/kg-date-validation fix(kg): validate ISO-8601 date formats at MCP boundary	2026-05-06 01:33:27 -03:00
Igor Lins e Silva	3824ea610c	Merge pull request #1282 from mvalentsev/fix/fact-checker-stdio-utf8 fix(cli, fact-checker): reconfigure stdio to UTF-8 on Windows	2026-05-06 01:33:15 -03:00
Igor Lins e Silva	778f830cd0	Merge pull request #1107 from sha2fiddy/fix/1073-closet-llm-paginate fix: paginate closet_llm col.get (#1073)	2026-05-06 01:33:04 -03:00
Igor Lins e Silva	e18981a527	Merge pull request #1215 from arnoldwender/fix/entity-registry-atomic-write fix(entity_registry): atomic write to prevent partial corruption on crash	2026-05-06 01:32:46 -03:00
Igor Lins e Silva	ef0e45ad92	Merge pull request #1105 from mvalentsev/fix/chroma-backend-close-releases-lock fix(backends/chroma): release SQLite file lock on close_palace/close (#1067)	2026-05-06 01:32:30 -03:00
Igor Lins e Silva	0cfb4b3ef1	Merge pull request #1214 from arnoldwender/fix/kg-temporal-inversion-guard fix(kg): reject inverted intervals in add_triple (valid_to < valid_from)	2026-05-06 01:32:16 -03:00
Arnold Wender	4f36145c2e	fix(entity_registry): atomic write to prevent partial corruption on crash EntityRegistry.save() called Path.write_text() directly, which truncates the target file and then writes — so a crash mid-write (power loss, OOM, filesystem-full mid-flush) leaves an empty or half-written entity_registry.json. The whole people/projects map is lost; the system falls back to an empty registry on next load. Switch to the standard atomic-write pattern: serialize to a sibling .tmp file in the same directory (so os.replace stays on one filesystem), fsync, chmod 0o600, then os.replace over the target. The replace is atomic on POSIX and Windows, so any crash leaves the previous registry intact instead of a truncated file. Tests cover: no leftover .tmp on success, and previous content preserved when os.replace itself raises mid-save.	2026-05-04 11:08:14 +02:00
mvalentsev	b8816e0fe2	fix(mcp): retry KG handlers once on concurrent close race Race scenario: a KG tool handler calls _get_kg() and gets a live KnowledgeGraph; another thread fires tool_reconnect() between that return and the handler's kg.add_triple()/kg.query_entity()/etc call. tool_reconnect drains _kg_by_path and closes the underlying sqlite3.Connection; the handler then raises sqlite3.ProgrammingError: 'Cannot operate on a closed database', which surfaces as a -32000 to the MCP client even though the user just asked for a reconnect. New _call_kg(op) helper wraps each handler's kg call in a one-shot retry: catch exactly sqlite3.ProgrammingError, evict the stale entry (only if the cache slot still points at the closed instance — another thread may have already replaced it), and rerun op against a fresh _get_kg(). Beyond one retry give up so a sustained close-stream surfaces clearly instead of looping. All five KG handlers (tool_kg_query, tool_kg_add, tool_kg_invalidate, tool_kg_timeline, tool_kg_stats) now route through _call_kg. Tests pin the contract: * retries with a fresh KG and returns the second result * non-ProgrammingError exceptions propagate without retry * gives up after exactly one retry on sustained close	2026-05-03 21:43:51 +05:00
mvalentsev	03643eb507	fix(cli, fact-checker): per-stream stdio errors policy on Windows Previously all three streams reconfigured to UTF-8 with errors='strict'. That kills 'mempalace search' the moment a drawer carrying a surrogate half (round-tripped from a filename via surrogateescape) hits print(), losing the rest of the result block. Same hazard for warning lines on stderr. Split the policy: stdin -> surrogateescape (malformed bytes from a redirected file survive as lone surrogates instead of crashing the read) stdout -> replace (drawer text with a stray surrogate becomes U+FFFD instead of UnicodeEncodeError mid-print) stderr -> replace (same protection for logger / warning paths) Applied identically in the cli.py and fact_checker.py helpers; the DRY extraction into a shared module is a separate cleanup ask, kept out of this fix to keep the diff narrow. Tests updated for the new per-stream assertion.	2026-05-03 21:37:12 +05:00
mvalentsev	32f4dfa26d	fix(cli): reconfigure stdio to UTF-8 on Windows The primary `mempalace` console_script (`cli.py:main()`) reads non-ASCII arguments via piped stdin and writes verbatim drawer text / wing names through `print()`. On Windows, Python defaults stdio to the system ANSI codepage (cp1252/cp1251/cp950), so: - `mempalace search "..." > out.txt` mojibakes any drawer text containing non-Latin characters - `mempalace ... < input.txt` mojibakes piped non-ASCII input Reconfigure stdin/stdout/stderr to UTF-8 (`errors="strict"`) at the top of `main()`, mirroring the helper added in this PR for fact_checker's `__main__` block. Wrapped in try/except so a replaced stream (Jupyter, test harness) logs a warning and continues rather than crashing the CLI. The reconfigure cascades through every `mempalace` subcommand (`init`/`mine`/`search`/`status`/`hook`/etc.) and through the interactive flows that read non-ASCII names via `input()` (onboarding, entity detector, room detector). With this commit the package's three user-facing entry points (`mempalace`, `mempalace-mcp`, and `python -m mempalace.fact_checker`) all reconfigure stdio identically on Windows.	2026-05-03 21:33:54 +05:00
mvalentsev	7cee74c8c8	fix(fact-checker): reconfigure stdio to UTF-8 on Windows The `python -m mempalace.fact_checker --stdin` entry point reads non-ASCII text through the system ANSI codepage (cp1252/cp1251/cp950) on Windows, which mojibakes characters before claim-extraction sees them. Reconfigure stdin/stdout/stderr to UTF-8 with `errors="strict"`, wrapped in try/except so a replaced stream (Jupyter, test harness) logs a warning rather than crashing the CLI. Mirrors the same fix shipped for `mcp_server.py:main()` (#400) and `hooks_cli.py:run_hook()` (#1280) -- this is the third and last stdin-reading entry point in the package.	2026-05-03 21:33:54 +05:00
mvalentsev	45df1a2657	fix(backends/chroma): release SQLite file lock on close_palace/close (#1067) ChromaBackend.close_palace() and close() evicted cached PersistentClients from self._clients without calling client.close(), so chromadb 1.5.x kept the rust-side SQLite file lock until GC. Reopening the same palace path after shutil.rmtree + re-create within one process then failed with SQLITE_READONLY_DBMOVED (SQLite code 1032). Add _close_client() helper with a try/except fallback for older chromadb, and route close_palace(), close(), and the DB-file-missing invalidation branch of _client() through it. The mtime/inode auto-invalidation branch is left as-is: callers there may still hold a live ChromaCollection handle, and closing out from under them clears the rust bindings mid-use. Regression tests cover close_palace reopen-same-path and whole-backend close for multiple palaces.	2026-05-03 19:16:25 +05:00
mvalentsev	0a62658051	fix(mcp): drain KG cache on tool_reconnect tool_reconnect cleared ChromaDB caches but left _kg_by_path entries intact. After an external replacement of knowledge_graph.sqlite3 the server kept serving the old open sqlite3.Connection, returning stale results. Now iterate _kg_by_path under _kg_cache_lock, call close() best-effort, and clear the dict so the next tool call reopens the KG from disk. Two new tests in TestKGLazyCache verify cache invalidation and that a failing close() does not block the clear.	2026-05-03 17:43:00 +05:00
mvalentsev	84f9726a39	test(mcp): fix Windows subprocess env in KG lazy-init test Passing a stripped env dict without SYSTEMROOT/WINDIR breaks Python bootstrap on Windows (_Py_HashRandomization_Init). Inherit the parent env and strip MEMPAL* vars instead, then override HOME/USERPROFILE to the tmp dir.	2026-05-03 17:43:00 +05:00
mvalentsev	c69a622a18	test(mcp): add multi-tenant and lazy-init tests for KG (#1136) TestKGLazyCache covers the scenarios behind the lazy per-path refactor: - test_lazy_init_no_import_side_effect: a fresh subprocess import does not create ~/.mempalace/knowledge_graph.sqlite3 (what closed PR #167 was aiming at). - test_get_kg_returns_same_instance: two _get_kg() calls under the same resolved path return the same object, cache has one entry. - test_get_kg_different_paths_different_instances: rotating env var produces distinct KGs. - test_multi_tenant_env_switch: the exact scenario from #1136: write under path A, query under path B returns empty, switching back to A sees the fact. - test_cache_thread_safe: 16 threads racing _get_kg() end up with one shared instance and one cache entry.	2026-05-03 17:43:00 +05:00
mvalentsev	9e730098e9	test(mcp): migrate _kg monkeypatches to _get_kg (#1136) Direct module-attribute patching of _kg is obsolete after the lazy cache refactor. Switch test helpers to patch _get_kg instead so the fixture KG replaces the factory rather than a now-missing singleton. - tests/test_mcp_server.py: _patch_mcp_server helper - tests/benchmarks/test_mcp_bench.py: _patch_mcp_config helper - tests/benchmarks/test_memory_profile.py: inline patch in test_tool_status_repeated_calls	2026-05-03 17:43:00 +05:00
Igor Lins e Silva	1888b671e2	Merge pull request #1321 from MemPalace/fix/1313-init-palace-flag fix(cli): honor --palace flag in cmd_init (#1313)	2026-05-03 03:54:06 -03:00
Igor Lins e Silva	a91b7ee5c2	test(cli): prime monkeypatch undo so palace env doesn't leak monkeypatch.delenv(name, raising=False) on a missing key registers no undo entry, so the env var cmd_init writes leaked into test_config_from_file on Python 3.13 / Windows / macOS. Prime the slot with setenv before delenv so teardown rolls back the write.	2026-05-03 06:27:37 -03:00
Igor Lins e Silva	5380189f82	Merge pull request #1320 from MemPalace/fix/1314-kg-temporal-params fix(mcp): forward valid_to and source params in kg_add/kg_invalidate (#1314)	2026-05-03 03:51:29 -03:00
Igor Lins e Silva	2ad379b547	Merge pull request #1306 from MemPalace/feat/hybrid-candidate-union feat(searcher): candidate_strategy="union" — BM25 candidates joined with vector pool before hybrid rerank	2026-05-03 03:40:51 -03:00
Igor Lins e Silva	3eb7980e55	fix(searcher): address Copilot review on #1306 - Dedup union candidates by (full_path, chunk_index), not basename — two files sharing a basename in different dirs no longer collide, and a vector hit on chunk N of a file no longer blocks BM25 from contributing chunk M of the same file. - Validate candidate_strategy at the top of search_memories so invalid values fail consistently, not only when the call routes through the vector path. - Trim hits back to n_results after the union+rerank pool grows; preserves the existing search_memories size contract that the MCP limit parameter is built on. - Skip BM25-only injection when max_distance > 0.0; BM25-only candidates carry distance=None and would silently bypass the caller's strict vector-distance threshold. Adds 4 tests covering: validation under vector_disabled, n_results trim, max_distance honoring, and basename-collision dedup.	2026-05-03 06:09:10 -03:00
Igor Lins e Silva	3e6f6480c0	Merge pull request #1325 from MemPalace/security/mcp-omit-absolute-paths fix(mcp): omit absolute filesystem paths from MCP tool responses	2026-05-03 03:20:11 -03:00
Igor Lins e Silva	7fc260f752	fix(mcp): basename source_file in tool_get_drawer responses The MCP `mempalace_get_drawer` tool returned the entire raw drawer metadata blob to any connected client, and the `source_file` field in that blob is the absolute filesystem path written by the miners (`miner.py`, `convo_miner.py` — `source_file = str(filepath)`). On a single-user local deployment this is self-disclosure, but in nested-agent or multi-server MCP topologies the client is a separate trust domain and the host's directory layout has no documented client-side use. Mirror the mitigation that `searcher.search_memories()` already applies on its own return path: reduce `source_file` to its basename via `Path(source_file).name` before handing the metadata to the client. Citations still work — the directory layout does not leak. Companion to #1 (omit palace_path from tool_status). Same threat class, different surface: - mempalace_status — palace dir path → fixed in #1 - mempalace_get_drawer — per-drawer source_file path → this PR Other read tools were audited and do not leak host paths: - mempalace_search — already basenames source_file - mempalace_list_drawers — returns wing/room/preview only - mempalace_diary_read — date/timestamp/topic/content only - mempalace_reconnect — success/message/drawers only - mempalace_kg_* — entity/predicate strings, counts - mempalace_check_duplicate — wing/room/preview only Changes: - mempalace/mcp_server.py: tool_get_drawer() now basenames metadata.source_file - tests/test_mcp_server.py: regression test asserting the absolute path and its parent directory do not appear anywhere in the response - website/reference/mcp-tools.md: clarify the documented return shape	2026-05-03 05:58:46 -03:00
Igor Lins e Silva	6f88b2a34e	Merge pull request #1322 from MemPalace/fix/1121-1132-1263-client-quarantine fix(backends/chroma): wire quarantine_stale_hnsw into _client() (#1121 #1132 #1263)	2026-05-03 03:18:28 -03:00
Igor Lins e Silva	a690eb398f	Merge pull request #1323 from MemPalace/fix/1243-diary-case-insensitive fix(mcp): case-insensitive agent name in diary read/write (#1243)	2026-05-03 03:18:11 -03:00
igorls	2397481158	style: ruff format tests/test_mcp_server.py (PR #1323)	2026-05-02 23:00:10 -03:00
igorls	f854d86d2f	style: ruff format tests/test_backends.py (PR #1322)	2026-05-02 23:00:08 -03:00
igorls	2857948c1e	style: ruff format tests/test_cli.py (PR #1319)	2026-05-02 23:00:07 -03:00
igorls	6ffbf6ffc3	style: ruff format test_mcp_server.py (PR #1320)	2026-05-02 22:59:50 -03:00
igorls	b4a9f2adf2	style: ruff format touched files (PR #1322) CI requires whole-file format on touched files; pre-existing drift only.	2026-05-02 22:58:57 -03:00
Igor Lins e Silva	e9222b4c7b	fix(mcp): case-insensitive agent name in diary_write/diary_read (#1243) `tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name` (which preserves case), while `tool_diary_read` filtered by exact match, so writing as "Claude" and reading as "claude" silently returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization. The default per-agent wing slug is also stable across casings since it's derived from the same normalized form. Behavior change: entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; documented under v3.3.5 in CHANGELOG with a `mempalace repair` pointer. Adds a regression test (`test_diary_read_case_insensitive_agent`) and updates the existing `test_diary_write_and_read` to assert the new lowercase agent identity. Closes #1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:57:09 -03:00
igorls	10733f1df4	fix(backends/chroma): wire quarantine_stale_hnsw into _client() to prevent SIGSEGV on stale HNSW (#1121, #1132, #1263) PR #1173 wired quarantine_stale_hnsw into the static make_client() helper but not into the instance _client() method. As a result every non-MCP entry point (CLI mining, search, repair, status), which all use get_collection / _get_or_create_collection / _client(), skipped the cold-start quarantine pass and could SIGSEGV on a stale HNSW segment left over from a partial flush, replicated palace, or crashed-mid-write. Refactor: extract the (_fix_blob_seq_ids + gated quarantine_stale_hnsw) pre-open pass into a single private static helper ChromaBackend._prepare_palace_for_open(). Both make_client() and _client() now route through it, so the _quarantined_paths once-per-palace-per-process gate is preserved (no runtime thrash on hot paths) and behaviour stays identical; the fix is purely about extending the existing protection to the path that was missing it. Tests: - test_client_quarantines_corrupt_segment_on_first_open mirrors the existing make_client test and verifies _client() actually renames a corrupt segment on first open. - test_client_quarantines_only_on_first_call_per_palace verifies the cache gate prevents re-running quarantine across repeated _client() calls, important because _client() is hit on every backend op. Closes #1121. Closes #1132. Closes #1263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:56:36 -03:00
Igor Lins e Silva	01b3183e5d	fix(cli): honor --palace flag in cmd_init (#1313) cmd_init was instantiating MempalaceConfig() unconditionally, ignoring args.palace and always writing the palace under ~/.mempalace. Mirror the env-var pattern used by mcp_server.py (and consistent with how cmd_mine / cmd_status / cmd_search resolve --palace) so every downstream read of cfg.palace_path inside cmd_init (Pass 0, cfg.init(), and the post-init mine) routes to the user-specified location. Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the regression: asserts Pass 0 receives the --palace value (not ~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ. Closes #1313.	2026-05-02 22:56:31 -03:00
Igor Lins e Silva	e4e25ed186	fix(mcp): forward valid_to and source params in kg_add/kg_invalidate (#1314) `tool_kg_add` previously accepted only `valid_from` and `source_closet`, silently dropping `valid_to`, `source_file`, and `source_drawer_id` at the MCP boundary. Backfilling already-ended historical facts therefore collapsed to "still current," and adapter provenance never reached the SQLite layer even though `KnowledgeGraph.add_triple` already supported every column. `tool_kg_invalidate` returned the literal string `"today"` whenever the caller omitted `ended`, hiding the actual stamped date from anyone trying to verify what got persisted. Changes: - Extend `tool_kg_add` signature + MCP input_schema with `valid_to`, `source_file`, `source_drawer_id`; forward all of them to `_kg.add_triple` and to the WAL log. - Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate` before logging / returning, so the response always reports the actual date stored in `valid_to`. - Add regression tests for valid_to round-trip, source_file / source_drawer_id provenance, and the resolved-ended-date contract. - Leave TODO(#1283) markers so the open ISO-8601 validation PR can drop `validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly. The underlying `KnowledgeGraph.add_triple` already accepted these kwargs (RFC 002 §5.5); only the MCP edge needed wiring up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:32 -03:00
Igor Lins e Silva	cbd6e5d65d	fix(cli): write compress output to mempalace_closets so palace can read them (#1244) `cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed` collection, but every read path (`palace.get_closets_collection`, `searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for non-mined palaces (or any palace where the user ran `mempalace compress` expecting to backfill the closet/index layer), the compressed output was silently invisible, written to a collection nothing else opens. Fix the writer rather than renaming the readers: "closets" is the user-visible feature name baked into the public API (`get_closets_collection`), the searcher hybrid path, repair/HNSW diagnostics, and docs. Renaming the readers would churn 15+ call sites and the README for no benefit. The compressed AAAK strings are exactly what closets are conceptually (compact pointers scanned by an LLM to locate the right drawer), so they belong in `mempalace_closets`. Tests: - Update `test_cmd_compress_stores_results` to assert the collection name passed to `get_or_create_collection` is `mempalace_closets`. - Add `test_cmd_compress_output_readable_via_get_closets_collection`: end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress, then read back via the same `get_closets_collection` helper that palace.py / searcher use. Regression test for the wrong-collection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:14 -03:00
lcatlett	2d50b214d4	fix(hooks): use is_dir() for palace root check (review feedback) Both @igorls and the Qodo bot flagged that `_palace_root_exists()` used `Path.exists()`, which returns True for a regular file. A stray file at `~/.mempalace` would let the kill-switch be bypassed and crash later in `STATE_DIR.mkdir()` with NotADirectoryError. Switched to `Path.is_dir()`. Also fold `_log()`'s inline check through `_palace_root_exists()` so both kill-switch sites use the same predicate. New test pins the behavior: a regular file at the palace root path is treated as absent (hook short-circuits, _log does not crash, the stray file is left untouched).	2026-05-02 20:37:47 -04:00
lcatlett	8472d553a3	fix(hooks): treat absent ~/.mempalace as auto-save off When the user removes ~/.mempalace/ (a strong "do not auto-capture" signal), the next hook fire would silently recreate the entire dir hierarchy and ingest existing transcripts: 1. _log() at hooks_cli.py:148 unconditionally calls STATE_DIR.mkdir(parents=True, exist_ok=True), so the act of writing the hook log line recreated ~/.mempalace/hook_state/ 2. With no config file present, hook_stop_auto_save and hook_precompact_auto_save defaulted to True (no override to read) 3. The full save path then ran, materializing palace/, wal/, knowledge_graph.sqlite3, and N drawers from existing transcripts in ~/.claude/projects/*.jsonl All four entry points (hook_stop, hook_precompact, hook_session_start, and _log itself) now check a new PALACE_ROOT = Path.home() / ".mempalace" constant first and short-circuit (returning {} on stdout, never logging) when the dir is absent. The user-removable directory is now a kill-switch. Five unit tests in tests/test_hooks_cli.py cover: hook_stop / hook_precompact / hook_session_start do not create the dir when absent; _log() does not create it when absent; existing dir proceeds normally (regression). Caught in the wild on a downstream fork: ~146 drawers materialized in under a second after a deliberate `rm -rf ~/.mempalace/`, into a planning session that was explicitly not meant to be captured.	2026-05-02 20:33:58 -04:00
Igor Lins e Silva	6509071b8e	feat(searcher): add candidate_strategy="union" for vector∪BM25 reranking pool Default search behavior is unchanged. Opt-in candidate_strategy="union" also pulls top-K BM25-only candidates from sqlite FTS5 and merges them into the rerank pool, catching docs with strong BM25 signal that the vector index didn't surface in the over-fetch window. Motivation ---------- The current hybrid path gathers candidates from the vector index only (n_results * 3 over-fetch), then BM25-reranks within them. When the query embeds close to the wrong content semantically, the right doc never enters the rerank pool — no matter how wide the over-fetch. Tested on a ~6K-document mixed corpus (knowledge prose + short structured records): at 30x over-fetch (~5% of the corpus) the target doc still didn't surface for narrative-shaped queries targeting terminology guides. Wider over-fetch isn't the answer; widening the pool's source is. Concrete failure mode: a narrative-shaped query embeds close to records sharing the same operational vocabulary (other narrative entries in the corpus). A terminology / style guide is BM25-strong for the query (rare keywords the guide repeats) but vector-distant. Vector-only candidates don't include it; BM25 never gets to rerank it. The hybrid path produces 0.00 recall on a probe that pure BM25 alone scores 1.00 — the hybrid is worse than its component on the same input. Behavior change --------------- * New parameter ``candidate_strategy: str = "vector"`` on ``search_memories``. - ``"vector"`` (default): historical behavior, no change. - ``"union"``: also fetch top ``n_results * 3`` candidates via the existing ``_bm25_only_via_sqlite`` helper, dedupe by source_file, merge into the rerank pool. BM25-only candidates carry ``distance=None`` so they're scored on BM25 contribution alone (vec_sim coerces to 0). 
* ``_hybrid_rank`` now handles ``distance=None`` explicitly, scoring such candidates as vector-unknown (vec_sim=0) rather than treating it as max-distance via shim. * New strategies register via ``_CANDIDATE_MERGERS``; dispatch is in ``_apply_candidate_strategy`` so ``search_memories`` stays under the C901 complexity ceiling. Bench numbers (~6K-doc internal mixed corpus, recall@10, 5 probes spanning policy-exception lookup, temporal-decay, style retrieval, set-difference, and pattern-recognition): baseline ("vector") "union" policy-exception probe 0.00 0.50 +0.50 temporal-decay probe 0.17 0.50 +0.33 style-retrieval probe 0.00 1.00 +1.00 (PASSES) set-difference probe 0.00–0.06 0.06–0.09 ~ pattern-recog probe 0.64 (stable) 0.50–0.71 variance, typ. +0.07 macro recall 0.16–0.17 0.51–0.56 +0.34 to +0.40 The pattern-recog variance points at a related issue worth a separate PR: ``_hybrid_rank`` computes BM25 IDF over the candidate set. Adding new candidates re-normalizes BM25 for existing candidates non-monotonically. Stable corpus-wide BM25 would remove this. Out of scope here. Tests ----- ``tests/test_hybrid_candidate_union.py`` (6 tests, all pass): - default behavior unchanged (explicit ``"vector"`` matches default) - ``"union"`` surfaces a BM25-strong vector-distant doc - ``"union"`` doesn't drop docs ``"vector"`` would have found - empty-palace handling - invalid ``candidate_strategy`` raises - ``_hybrid_rank`` tolerates ``distance=None`` Existing ``test_hybrid_search.py`` (5) and ``test_searcher.py`` (27) pass. Performance note ---------------- Each ``"union"`` query adds one sqlite open + FTS5 MATCH + metadata fetch (via the existing ``_bm25_only_via_sqlite`` helper, which already runs as the ``vector_disabled`` fallback path so the code is well-trodden). Per-query overhead is small but unmeasured at corpus scale. Default stays ``"vector"`` until a maintainer characterizes the cost.	2026-05-02 00:50:19 -03:00
Igor Lins e Silva	ac6c2b6af6	fix(mcp_server): pass embedding_function= on collection reopen (#1299) `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per #1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to #1262 / #1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.	2026-05-01 19:34:38 -03:00
Igor Lins e Silva	9dd56ecb0a	fix(mcp_server): split get_or_create_collection on reopen (follow-up to #1262) #1262 split `get_or_create_collection` into `get_collection` + fallback `create_collection` inside `ChromaBackend.get_collection`, fixing the chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection metadata differs from the call-site's `_HNSW_BLOAT_GUARD` payload. The MCP server's `_get_collection(create=True)` carries the same metadata payload at `mcp_server.py:287` and routes through chromadb's Python client directly, bypassing the backend layer. Both `tool_add_drawer` and `tool_diary_write` reach this site on every invocation, and the Stop hook fires `mempalace_diary_write` at session end, which was exactly the crash path #1089 named. Apply the same try/except split here so legacy palaces whose stored metadata predates the bloat-guard expansion no longer crash on the MCP-server reopen path. Regression test patches `get_or_create_collection` at the chromadb client class level (not the instance: chromadb's mtime-change detection rebuilds the client between calls, so an instance-level spy doesn't survive) and asserts the second `_get_collection(create=True)` call never reaches it.	2026-04-30 22:35:18 -03:00
Igor Lins e Silva	73541d1606	Merge pull request #1262 from Legion345/fix/stop-hook-crash fix(storage): stop ChromaDB from crashing when reopening an existing …	2026-04-30 22:30:08 -03:00
Igor Lins e Silva	96bb80a356	Merge pull request #1287 from messelink/fix/hnsw-divergence-scales-with-sync-threshold fix(repair): scale HNSW divergence floor with hnsw:sync_threshold	2026-04-30 22:28:07 -03:00
Igor Lins e Silva	3b5ebcc9fc	fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254) `_compute_heuristic_seq_id` ran `int(row[0])` directly on the result of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), that raises `ValueError: invalid literal for int() with base 10: b'...'` before the dry-run can print, leaving users with no path through the recovery feature added in #1135, the only documented un-poison route for palaces hit by the original PR #664 shim bug. Decode BLOB return values via `int.from_bytes(val, "big")` and keep the existing `int(val)` path for INTEGER rows. Regression test seeds a BLOB row in `embeddings.seq_id` and asserts the heuristic surfaces the correct integer.	2026-04-30 22:04:41 -03:00
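
The BLOB decoding described in commit 3b5ebcc9fc can be illustrated with a minimal sketch. The helper name `decode_seq_id` and the one-column `embeddings` table are assumptions made for this example, not the project's actual code or schema; only the `int.from_bytes(val, "big")` call and the `int(val)` fallback come from the commit message.

```python
import sqlite3


def decode_seq_id(val):
    """Return an int from an INTEGER value or an 8-byte big-endian BLOB.

    Hypothetical helper for illustration; not the project's function.
    """
    if isinstance(val, (bytes, bytearray, memoryview)):
        # BLOB path: seq_id stored as a big-endian unsigned integer.
        return int.from_bytes(bytes(val), "big")
    # INTEGER path: the original behaviour.
    return int(val)


# Simplified stand-in for the real table: a single untyped seq_id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (seq_id)")
conn.execute(
    "INSERT INTO embeddings VALUES (?)", ((1234).to_bytes(8, "big"),)
)
row = conn.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()

# int(row[0]) would raise ValueError here because row[0] is bytes.
max_seq_id = decode_seq_id(row[0])
conn.close()
```

Branching on the value's type keeps the INTEGER path untouched, so palaces with either storage format go through the same call.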

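The clamp described in commit 5347c2c71c can be sketched as below. The function names are hypothetical; the arithmetic (`dist - boost`, the `[0, 2]` clamp, and the `round(max(0.0, 1 - effective_dist), 3)` similarity formula) and the sample distances are taken from the commit message.

```python
def clamp_effective_distance(dist: float, boost: float) -> float:
    """Subtract the closet boost, then clamp to the cosine-distance range [0, 2]."""
    return min(2.0, max(0.0, dist - boost))


def similarity_from(effective_dist: float) -> float:
    """Similarity formula quoted in the commit message."""
    return round(max(0.0, 1 - effective_dist), 3)


# Values from the commit's test description: base distances 0.08 and 0.35,
# with a rank-0 closet boost of 0.40 applied only to the 0.08 source.
boosted = clamp_effective_distance(0.08, 0.40)  # raw value would be negative
plain = clamp_effective_distance(0.35, 0.0)

# Ascending sort on the clamped key: the boosted hit still ranks first.
ranked = sorted(
    [("plain", plain), ("boosted", boosted)], key=lambda hit: hit[1]
)
```

Clamping at the point where the subtraction happens means every later use of the value (the similarity and the sort key) sees an in-range number, which is the argument the commit makes against clamping only the output similarity.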

321 Commits
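
The atomic-write pattern described in commit 4f36145c2e (sibling `.tmp` file, fsync, chmod `0o600`, then `os.replace`) can be sketched as follows. The function name `atomic_write_json` and the cleanup of the temporary file on failure are assumptions for this example, not a copy of `EntityRegistry.save()`.

```python
import json
import os
from pathlib import Path


def atomic_write_json(target: Path, data: dict) -> None:
    """Write data as JSON so a crash never leaves a truncated target file.

    Hypothetical helper for illustration only.
    """
    # Sibling temp file: same directory, so os.replace stays on one filesystem.
    tmp = target.with_name(target.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the swap
        os.chmod(tmp, 0o600)
        os.replace(tmp, target)  # swap in the new file in one step
    except BaseException:
        # Assumption: remove the temp file on failure; the old target is untouched.
        tmp.unlink(missing_ok=True)
        raise
```

If anything fails before `os.replace` completes, the previous file is still in place, which is the property the commit's tests check.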