mempalace

Author	SHA1	Message	Date
igorls	2397481158	style: ruff format tests/test_mcp_server.py (PR #1323)	2026-05-02 23:00:10 -03:00
igorls	f854d86d2f	style: ruff format tests/test_backends.py (PR #1322)	2026-05-02 23:00:08 -03:00
igorls	2857948c1e	style: ruff format tests/test_cli.py (PR #1319)	2026-05-02 23:00:07 -03:00
igorls	6ffbf6ffc3	style: ruff format test_mcp_server.py (PR #1320)	2026-05-02 22:59:50 -03:00
igorls	b4a9f2adf2	style: ruff format touched files (PR #1322) CI requires whole-file format on touched files; pre-existing drift only.	2026-05-02 22:58:57 -03:00
igorls	4b0fc44451	style: ruff format cli.py (#1244) CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.	2026-05-02 22:58:45 -03:00
Igor Lins e Silva	e9222b4c7b	fix(mcp): case-insensitive agent name in diary_write/diary_read (#1243 ) `tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name` (which preserves case), while `tool_diary_read` filtered by exact match — so writing as "Claude" and reading as "claude" silently returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization. The default per-agent wing slug is also stable across casings since it's derived from the same normalized form. Behavior change: entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; documented under v3.3.5 in CHANGELOG with a `mempalace repair` pointer. Adds a regression test (`test_diary_read_case_insensitive_agent`) and updates the existing `test_diary_write_and_read` to assert the new lowercase agent identity. Closes #1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:57:09 -03:00
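The normalization described in e9222b4c7b can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `normalize_agent`, the in-memory `entries` list, and the whitespace-stripping stand-in for `sanitize_name` are all hypothetical.

```python
# Hypothetical in-memory stand-in for the diary store.
entries = []


def normalize_agent(raw: str) -> str:
    # Stand-in for sanitize_name (assumed here to only strip whitespace
    # and to preserve case), followed by the lowercase step the commit adds.
    sanitized = raw.strip()
    return sanitized.lower()


def diary_write(agent: str, text: str) -> None:
    # Store the normalized agent name so reads can match on it.
    entries.append({"agent": normalize_agent(agent), "text": text})


def diary_read(agent: str) -> list:
    # Filter by the same normalized form used at write time.
    key = normalize_agent(agent)
    return [e["text"] for e in entries if e["agent"] == key]


diary_write("Claude", "first entry")
print(diary_read("claude"))
```

Because both sides go through one helper, the write-time and read-time keys cannot drift apart by casing alone.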
igorls	10733f1df4	fix(backends/chroma): wire quarantine_stale_hnsw into _client() to prevent SIGSEGV on stale HNSW (#1121 , #1132 , #1263 ) PR #1173 wired quarantine_stale_hnsw into the static make_client() helper but not into the instance _client() method. As a result every non-MCP entry point (CLI mining, search, repair, status) — which all use get_collection / _get_or_create_collection / _client() — skipped the cold-start quarantine pass and could SIGSEGV on a stale HNSW segment left over from a partial flush, replicated palace, or crashed-mid-write. Refactor: extract the (_fix_blob_seq_ids + gated quarantine_stale_hnsw) pre-open pass into a single private static helper ChromaBackend._prepare_palace_for_open(). Both make_client() and _client() now route through it, so the _quarantined_paths once-per- palace-per-process gate is preserved (no runtime thrash on hot paths) and behaviour stays identical — the fix is purely about extending the existing protection to the path that was missing it. Tests: - test_client_quarantines_corrupt_segment_on_first_open mirrors the existing make_client test and verifies _client() actually renames a corrupt segment on first open. - test_client_quarantines_only_on_first_call_per_palace verifies the cache gate prevents re-running quarantine across repeated _client() calls — important because _client() is hit on every backend op. Closes #1121. Closes #1132. Closes #1263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:56:36 -03:00
Igor Lins e Silva	01b3183e5d	fix(cli): honor --palace flag in cmd_init (#1313 ) cmd_init was instantiating MempalaceConfig() unconditionally, ignoring args.palace and always writing the palace under ~/.mempalace. Mirror the env-var pattern used by mcp_server.py (and consistent with how cmd_mine / cmd_status / cmd_search resolve --palace) so every downstream read of cfg.palace_path inside cmd_init — Pass 0, cfg.init(), and the post-init mine — routes to the user-specified location. Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the regression: asserts Pass 0 receives the --palace value (not ~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ. Closes #1313.	2026-05-02 22:56:31 -03:00
Igor Lins e Silva	e4e25ed186	fix(mcp): forward valid_to and source params in kg_add/kg_invalidate (#1314 ) `tool_kg_add` previously accepted only `valid_from` and `source_closet`, silently dropping `valid_to`, `source_file`, and `source_drawer_id` at the MCP boundary. Backfilling already-ended historical facts therefore collapsed to "still current," and adapter provenance never reached the SQLite layer even though `KnowledgeGraph.add_triple` already supported every column. `tool_kg_invalidate` returned the literal string `"today"` whenever the caller omitted `ended`, hiding the actual stamped date from anyone trying to verify what got persisted. Changes: - Extend `tool_kg_add` signature + MCP input_schema with `valid_to`, `source_file`, `source_drawer_id`; forward all of them to `_kg.add_triple` and to the WAL log. - Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate` before logging / returning, so the response always reports the actual date stored in `valid_to`. - Add regression tests for valid_to round-trip, source_file / source_drawer_id provenance, and the resolved-ended-date contract. - Leave TODO(#1283) markers so the open ISO-8601 validation PR can drop `validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly. The underlying `KnowledgeGraph.add_triple` already accepted these kwargs (RFC 002 §5.5) — only the MCP edge needed wiring up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:32 -03:00
Igor Lins e Silva	cbd6e5d65d	fix(cli): write compress output to mempalace_closets so palace can read them (#1244 ) `cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed` collection, but every read path (`palace.get_closets_collection`, `searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for non-mined palaces (or any palace where the user ran `mempalace compress` expecting to backfill the closet/index layer), the compressed output was silently invisible — written to a collection nothing else opens. Fix the writer rather than renaming the readers: "closets" is the user-visible feature name baked into the public API (`get_closets_collection`), the searcher hybrid path, repair/HNSW diagnostics, and docs. Renaming the readers would churn 15+ call sites and the README for no benefit. The compressed AAAK strings are exactly what closets are conceptually — compact pointers scanned by an LLM to locate the right drawer — so they belong in `mempalace_closets`. Tests: - Update `test_cmd_compress_stores_results` to assert the collection name passed to `get_or_create_collection` is `mempalace_closets`. - Add `test_cmd_compress_output_readable_via_get_closets_collection`: end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress, then read back via the same `get_closets_collection` helper that palace.py / searcher use. Regression test for the wrong-collection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:14 -03:00
lcatlett	2d50b214d4	fix(hooks): use is_dir() for palace root check (review feedback) Both @igorls and the Qodo bot flagged that `_palace_root_exists()` used `Path.exists()`, which returns True for a regular file. A stray file at `~/.mempalace` would let the kill-switch be bypassed and crash later in `STATE_DIR.mkdir()` with NotADirectoryError. Switched to `Path.is_dir()`. Also fold `_log()`'s inline check through `_palace_root_exists()` so both kill-switch sites use the same predicate. New test pins the behavior: a regular file at the palace root path is treated as absent (hook short-circuits, _log does not crash, the stray file is left untouched).	2026-05-02 20:37:47 -04:00
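The difference between the two predicates in 2d50b214d4 can be seen in a small sketch. The helper name `palace_root_exists` and the temporary directory are illustrative assumptions; only `Path.exists()` and `Path.is_dir()` come from the commit message.

```python
import tempfile
from pathlib import Path


def palace_root_exists(root: Path) -> bool:
    # is_dir() is False for a regular file, so a stray file at the
    # palace root path is treated as "palace absent".
    return root.is_dir()


with tempfile.TemporaryDirectory() as tmp:
    stray = Path(tmp) / ".mempalace"
    stray.write_text("not a directory")  # regular file where a dir is expected

    exists_result = stray.exists()             # what the old check looked at
    is_dir_result = palace_root_exists(stray)  # what the new check looks at

print(exists_result, is_dir_result)
```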
lcatlett	8472d553a3	fix(hooks): treat absent ~/.mempalace as auto-save off When the user removes ~/.mempalace/ (a strong "do not auto-capture" signal), the next hook fire would silently recreate the entire dir hierarchy and ingest existing transcripts: 1. _log() at hooks_cli.py:148 unconditionally calls STATE_DIR.mkdir(parents=True, exist_ok=True), so the act of writing the hook log line recreated ~/.mempalace/hook_state/ 2. With no config file present, hook_stop_auto_save and hook_precompact_auto_save defaulted to True (no override to read) 3. The full save path then ran, materializing palace/, wal/, knowledge_graph.sqlite3, and N drawers from existing transcripts in ~/.claude/projects/*.jsonl All four entry points (hook_stop, hook_precompact, hook_session_start, and _log itself) now check a new PALACE_ROOT = Path.home() / ".mempalace" constant first and short-circuit (returning {} on stdout, never logging) when the dir is absent. The user-removable directory is now a kill-switch. Five unit tests in tests/test_hooks_cli.py cover: hook_stop / hook_precompact / hook_session_start do not create the dir when absent; _log() does not create it when absent; existing dir proceeds normally (regression). Caught in the wild on a downstream fork: ~146 drawers materialized in under a second after a deliberate `rm -rf ~/.mempalace/`, into a planning session that was explicitly not meant to be captured.	2026-05-02 20:33:58 -04:00
Mikhail Valentsev	d07b730f08	fix(hooks): quote CLAUDE_PLUGIN_ROOT / CODEX_PLUGIN_ROOT in hooks.json (#1076 ) (#1077 ) Shell splits hook command on whitespace after variable expansion, breaking paths with spaces (e.g. C:\Users\Richard M on Windows). Wrapping the path in double quotes preserves the token boundary. Fixes the reported Stop/PreCompact pair in .claude-plugin/hooks/hooks.json and applies the same fix to .codex-plugin/hooks.json (SessionStart/Stop/ PreCompact), which carries the identical bug.	2026-05-02 21:25:11 -03:00
Igor Lins e Silva	6509071b8e	feat(searcher): add candidate_strategy="union" for vector∪BM25 reranking pool Default search behavior is unchanged. Opt-in candidate_strategy="union" also pulls top-K BM25-only candidates from sqlite FTS5 and merges them into the rerank pool, catching docs with strong BM25 signal that the vector index didn't surface in the over-fetch window. Motivation ---------- The current hybrid path gathers candidates from the vector index only (n_results * 3 over-fetch), then BM25-reranks within them. When the query embeds close to the wrong content semantically, the right doc never enters the rerank pool — no matter how wide the over-fetch. Tested on a ~6K-document mixed corpus (knowledge prose + short structured records): at 30x over-fetch (~5% of the corpus) the target doc still didn't surface for narrative-shaped queries targeting terminology guides. Wider over-fetch isn't the answer; widening the pool's source is. Concrete failure mode: a narrative-shaped query embeds close to records sharing the same operational vocabulary (other narrative entries in the corpus). A terminology / style guide is BM25-strong for the query (rare keywords the guide repeats) but vector-distant. Vector-only candidates don't include it; BM25 never gets to rerank it. The hybrid path produces 0.00 recall on a probe that pure BM25 alone scores 1.00 — the hybrid is worse than its component on the same input. Behavior change --------------- * New parameter ``candidate_strategy: str = "vector"`` on ``search_memories``. - ``"vector"`` (default): historical behavior, no change. - ``"union"``: also fetch top ``n_results * 3`` candidates via the existing ``_bm25_only_via_sqlite`` helper, dedupe by source_file, merge into the rerank pool. BM25-only candidates carry ``distance=None`` so they're scored on BM25 contribution alone (vec_sim coerces to 0). 
* ``_hybrid_rank`` now handles ``distance=None`` explicitly, scoring such candidates as vector-unknown (vec_sim=0) rather than treating it as max-distance via shim.
* New strategies register via ``_CANDIDATE_MERGERS``; dispatch is in ``_apply_candidate_strategy`` so ``search_memories`` stays under the C901 complexity ceiling.

Bench numbers (~6K-doc internal mixed corpus, recall@10, 5 probes spanning policy-exception lookup, temporal-decay, style retrieval, set-difference, and pattern-recognition):

                          baseline ("vector")   "union"      delta
policy-exception probe    0.00                  0.50         +0.50
temporal-decay probe      0.17                  0.50         +0.33
style-retrieval probe     0.00                  1.00         +1.00 (PASSES)
set-difference probe      0.00–0.06             0.06–0.09    ~
pattern-recog probe       0.64 (stable)         0.50–0.71    variance, typ. +0.07
macro recall              0.16–0.17             0.51–0.56    +0.34 to +0.40

The pattern-recog variance points at a related issue worth a separate PR: ``_hybrid_rank`` computes BM25 IDF over the candidate set. Adding new candidates re-normalizes BM25 for existing candidates non-monotonically. Stable corpus-wide BM25 would remove this. Out of scope here.

Tests
-----
``tests/test_hybrid_candidate_union.py`` (6 tests, all pass):
- default behavior unchanged (explicit ``"vector"`` matches default)
- ``"union"`` surfaces a BM25-strong vector-distant doc
- ``"union"`` doesn't drop docs ``"vector"`` would have found
- empty-palace handling
- invalid ``candidate_strategy`` raises
- ``_hybrid_rank`` tolerates ``distance=None``

Existing ``test_hybrid_search.py`` (5) and ``test_searcher.py`` (27) pass.

Performance note
----------------
Each ``"union"`` query adds one sqlite open + FTS5 MATCH + metadata fetch (via the existing ``_bm25_only_via_sqlite`` helper, which already runs as the ``vector_disabled`` fallback path so the code is well-trodden). Per-query overhead is small but unmeasured at corpus scale. Default stays ``"vector"`` until a maintainer characterizes the cost.	2026-05-02 00:50:19 -03:00
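The "union" pool described in 6509071b8e can be pictured with a short sketch. It assumes candidates are plain dicts keyed by `source_file`; `merge_union` and the `1 - distance` similarity in `vec_sim` are hypothetical stand-ins, not the functions registered in `_CANDIDATE_MERGERS`.

```python
def merge_union(vector_hits, bm25_hits):
    # Keep every vector candidate, then append BM25-only candidates that
    # the vector index did not surface, deduplicating by source_file.
    seen = {hit["source_file"] for hit in vector_hits}
    merged = list(vector_hits)
    for hit in bm25_hits:
        if hit["source_file"] in seen:
            continue
        seen.add(hit["source_file"])
        # BM25-only candidates carry distance=None (vector-unknown).
        merged.append({**hit, "distance": None})
    return merged


def vec_sim(distance):
    # distance=None scores as 0 instead of being treated as max-distance.
    # ASSUMPTION: the 1 - distance mapping is illustrative only.
    if distance is None:
        return 0.0
    return max(0.0, 1.0 - distance)


vector_hits = [{"source_file": "incident.md", "distance": 0.2}]
bm25_hits = [{"source_file": "incident.md"}, {"source_file": "style_guide.md"}]
pool = merge_union(vector_hits, bm25_hits)
print([(hit["source_file"], vec_sim(hit["distance"])) for hit in pool])
```

The pool only ever grows relative to the vector-only pool, which matches the stated guarantee that "union" does not drop documents "vector" would have found.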
Igor Lins e Silva	5ddaf7abf6	Merge pull request #1303 from MemPalace/fix/mcp-server-missing-embedding-function fix(mcp_server): pass embedding_function= on collection reopen (#1299)	2026-05-01 20:28:05 -03:00
Igor Lins e Silva	cd98d6674e	fix(mcp_server): address copilot review on #1303 - Resolve the EF inside the two reopen branches that actually call `client.get_collection` / `client.create_collection`, so warm-cache reads stay zero-cost (no `MempalaceConfig()` / `_resolve_providers` on every tool call). - Reuse `ChromaBackend._resolve_embedding_function()` instead of duplicating its try/except + log message + None-fallback. - Reword the inline + CHANGELOG explanation to clarify that ChromaDB 1.x persists the EF identity (its `name()`) but not the instance/ configuration — `mempalace.embedding` documents this and spoofs `name()` to `"default"` precisely so the identity check passes; the bug was the provider list (lazy ONNX selection) silently differing.	2026-05-01 19:46:59 -03:00
Igor Lins e Silva	ac6c2b6af6	fix(mcp_server): pass embedding_function= on collection reopen (#1299 ) `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per #1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to #1262 / #1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.	2026-05-01 19:34:38 -03:00
Mika Cohen	0e32b9643c	fix: avoid false hnsw divergence fallback	2026-05-01 12:42:40 -06:00
Mika Cohen	f57f30025f	fix(repair): close active backend before rollback restore Rollback cleanup was instantiating a fresh ChromaBackend, so the live backend that had opened the PersistentClient could keep file handles alive during restore. Close the active backend instance instead so rollback and CLI recovery can release Windows-safe locks before copying the backup back into place.	2026-05-01 12:42:19 -06:00
Mika Cohen	2f509b4789	fix(cli): restore backup on repair failure	2026-05-01 12:42:18 -06:00
Mika Cohen	7fa27bd231	fix(repair): rebuild collections through temp staging	2026-05-01 12:42:18 -06:00
Mika Cohen	c3e1104e75	fix(chroma): harden HNSW startup preflight	2026-05-01 12:42:18 -06:00
Igor Lins e Silva	5e540da06b	Merge pull request #1289 from MemPalace/fix/mcp-server-collection-reopen-crash fix(mcp_server): split get_or_create_collection on reopen (follow-up to #1262)	2026-04-30 22:41:24 -03:00
Igor Lins e Silva	9dd56ecb0a	fix(mcp_server): split get_or_create_collection on reopen (follow-up to #1262 ) #1262 split `get_or_create_collection` into `get_collection` + fallback `create_collection` inside `ChromaBackend.get_collection`, fixing the chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection metadata differs from the call-site's `_HNSW_BLOAT_GUARD` payload. The MCP server's `_get_collection(create=True)` carries the same metadata payload at `mcp_server.py:287` and routes through chromadb's Python client directly, bypassing the backend layer. Both `tool_add_drawer` and `tool_diary_write` reach this site on every invocation, and the Stop hook fires `mempalace_diary_write` at session end — which was exactly the crash path #1089 named. Apply the same try/except split here so legacy palaces whose stored metadata predates the bloat-guard expansion no longer crash on the MCP-server reopen path. Regression test patches `get_or_create_collection` at the chromadb client class level (not the instance — chromadb's mtime-change detection rebuilds the client between calls, so an instance-level spy doesn't survive) and asserts the second `_get_collection(create=True)` call never reaches it.	2026-04-30 22:35:18 -03:00
Igor Lins e Silva	73541d1606	Merge pull request #1262 from Legion345/fix/stop-hook-crash fix(storage): stop ChromaDB from crashing when reopening an existing …	2026-04-30 22:30:08 -03:00
Igor Lins e Silva	96bb80a356	Merge pull request #1287 from messelink/fix/hnsw-divergence-scales-with-sync-threshold fix(repair): scale HNSW divergence floor with hnsw:sync_threshold	2026-04-30 22:28:07 -03:00
Igor Lins e Silva	7bc6090026	Merge pull request #1288 from MemPalace/fix/repair-max-seq-id-blob-heuristic fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254)	2026-04-30 22:23:21 -03:00
Igor Lins e Silva	3b5ebcc9fc	fix(repair): decode BLOB embeddings.seq_id in max-seq-id heuristic (#1254 ) `_compute_heuristic_seq_id` ran `int(row[0])` directly on the result of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing seq_ids natively (8-byte big-endian uint64 BLOB), that raises `ValueError: invalid literal for int() with base 10: b'...'` before the dry-run can print, leaving users with no path through the recovery feature added in #1135 — the only documented un-poison route for palaces hit by the original PR #664 shim bug. Decode BLOB return values via `int.from_bytes(val, "big")` and keep the existing `int(val)` path for INTEGER rows. Regression test seeds a BLOB row in `embeddings.seq_id` and asserts the heuristic surfaces the correct integer.	2026-04-30 22:04:41 -03:00
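The decode step in 3b5ebcc9fc can be reproduced against an in-memory SQLite database. This is a sketch under assumptions: the single-column `embeddings` table and the helper name `decode_seq_id` are simplified stand-ins for the real schema and for `_compute_heuristic_seq_id`.

```python
import sqlite3


def decode_seq_id(val):
    # BLOB rows hold an 8-byte big-endian unsigned integer;
    # INTEGER rows keep the original int() path.
    if isinstance(val, (bytes, bytearray)):
        return int.from_bytes(val, "big")
    return int(val)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (seq_id)")
conn.executemany(
    "INSERT INTO embeddings (seq_id) VALUES (?)",
    [((5).to_bytes(8, "big"),), ((256).to_bytes(8, "big"),)],
)
# MAX() over BLOBs returns bytes, which int() alone cannot parse.
row = conn.execute("SELECT MAX(seq_id) FROM embeddings").fetchone()
max_seq_id = decode_seq_id(row[0])
conn.close()
print(max_seq_id)
```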
Pim Messelink	4a0f330cc1	fix(repair): scale HNSW divergence floor with hnsw:sync_threshold The capacity probe added in #1227 hardcoded a 2,000-row floor for the "diverged" decision. The comment justifying that number explicitly tied it to chromadb's default sync_threshold of 1,000 — "Two synchronization windows worth (2 × sync_threshold = 2000) is a safe steady-state ceiling". #1191 then bumped sync_threshold to 50,000 via _HNSW_BLOAT_GUARD without updating the floor. Result: any palace created with the bloat guard flips between OK and DIVERGED on every flush cycle. Steady-state divergence sits at 0–50K (the natural queue depth), and the 2,000 floor trips the guardrail the moment the queue exceeds 10% of sqlite_count. The MCP server then routes search to BM25-only and disables duplicate detection for ~80% of the write cycle on actively-mined ≥100K palaces, even though chromadb is behaving correctly. This change reads the configured `hnsw:sync_threshold` from `collection_metadata` per palace and scales the floor to 2 × that value. The 10% relative term and the original #1222 detection capability are unchanged — a 91%-missing-of-192K palace (the actual #1222 reproducer) still trips, regardless of whether the collection was created with sync_threshold=1000 or 50000. 
Behavior summary:

| Collection's sync_threshold | New floor | Old floor |
|---|---|---|
| Missing (legacy palace) | 2000 | 2000 (unchanged) |
| 1000 (chromadb default) | 2000 | 2000 (unchanged) |
| 50000 (#1191 bloat guard) | 100000 | 2000 (the bug) |

Tests:
- test_capacity_status_tolerates_lag_under_large_sync_threshold (regression for the #1191/#1227 conflict: 100K sqlite + 50K HNSW + sync=50K → OK)
- test_capacity_status_still_flags_real_corruption_under_large_sync (#1222 shape with bloat-guard collection; still detects corruption)
- test_capacity_status_default_threshold_when_no_sync_metadata (legacy palaces without the metadata row use the 2000 fallback floor)
- test_unflushed_path_also_uses_dynamic_floor (the never-flushed branch scales too; 30K under sync_threshold=50000 is no longer flagged)

All 18 pre-existing tests in tests/test_hnsw_capacity.py and 45 tests in tests/test_backends.py still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 00:31:47 +00:00
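The floor scaling in 4a0f330cc1 can be sketched as below. Two things are assumptions rather than facts from the commit: the metadata is modeled as a plain dict, and `is_diverged` combines the floor with the 10% relative term by taking the larger of the two, which is one reading of the message and may not match the actual probe.

```python
DEFAULT_SYNC_THRESHOLD = 1000  # chromadb default named in the commit message


def divergence_floor(collection_metadata):
    # Floor is two synchronization windows: 2 x hnsw:sync_threshold.
    # A missing key (legacy palace) falls back to 2 x 1000 = 2000.
    metadata = collection_metadata or {}
    sync = metadata.get("hnsw:sync_threshold", DEFAULT_SYNC_THRESHOLD)
    return 2 * int(sync)


def is_diverged(sqlite_count, hnsw_count, collection_metadata):
    # ASSUMPTION: flagged only when the gap exceeds both the absolute
    # floor and 10% of sqlite_count.
    gap = sqlite_count - hnsw_count
    return gap > max(divergence_floor(collection_metadata), 0.10 * sqlite_count)


bloat_guard = {"hnsw:sync_threshold": 50000}
print(divergence_floor({}), divergence_floor(bloat_guard))
print(is_diverged(100_000, 50_000, bloat_guard))  # steady-state lag
print(is_diverged(192_000, 17_280, bloat_guard))  # roughly 91% missing
```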
Arnold Wender	abe85763d4	fix(kg): reject partial ISO dates to avoid silent empty result sets Per qodo-ai review on PR #1167: sanitize_iso_date() previously accepted YYYY and YYYY-MM, but KnowledgeGraph.query_entity() compares valid_from/ valid_to TEXT columns lexicographically against as_of. Lexicographic comparison treats '2026-01-01' as greater than '2026' (because '-' > end-of-string), so partial as_of values silently excluded valid facts — re-introducing the silent-empty-results problem this PR was meant to fix. Tighten _ISO_DATE_RE to require YYYY-MM-DD only. Update docstring and error message accordingly. Invert the two test cases that asserted partials were accepted.	2026-04-30 15:21:18 +02:00
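The lexicographic pitfall behind abe85763d4 is easy to reproduce. The sketch below borrows the names `sanitize_iso_date` and `_ISO_DATE_RE` from the commit message, but the body, the `field` parameter, and the error wording are assumptions for illustration.

```python
import re

# Full dates only; YYYY and YYYY-MM are rejected.
_ISO_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def sanitize_iso_date(value, field="date"):
    # None and empty string pass through untouched.
    if value is None or value == "":
        return value
    if not _ISO_DATE_RE.fullmatch(value):
        raise ValueError(f"{field} must be YYYY-MM-DD, got {value!r}")
    return value


# A shorter prefix sorts before any longer string it prefixes, so a fact
# with valid_from='2026-01-01' fails "valid_from <= as_of" for as_of='2026'.
partial_sorts_first = "2026" < "2026-01-01"
print(partial_sorts_first)
print(sanitize_iso_date("2026-04-30", "as_of"))
```

The pattern checks shape only; it does not reject impossible calendar dates such as month 13.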
Arnold Wender	4d98b05240	fix(kg): validate ISO-8601 date formats at MCP boundary tool_kg_query (as_of), tool_kg_add (valid_from), and tool_kg_invalidate (ended) accepted any string and forwarded it to SQLite without format validation. Parameterized queries prevent SQL injection, but invalid date strings silently produce empty result sets — callers cannot distinguish "no fact at this time" from "your date format was unrecognized." This is especially painful for natural-language LLM callers that synthesize dates like "March 2026" or "Jan 2025". Add sanitize_iso_date() in config.py alongside the other input validators. It accepts YYYY, YYYY-MM, and YYYY-MM-DD forms; passes through None/empty; and raises ValueError with a field-named message on anything else. Call it from the three kg MCP tool wrappers before values reach the storage layer so the caller gets a clear error instead of a silent miss. Closes #1164	2026-04-30 15:21:17 +02:00
sha2fiddy	db28bf1e84	fix: paginate closet_llm col.get (#1073 ) Mirror the pagination pattern PR #851 landed in miner.py:status(). A single drawers_col.get(limit=total, ...) on palaces larger than SQLite's SQLITE_MAX_VARIABLE_NUMBER (32766) crashes inside chromadb. Fetch drawers in batch_size=5000 chunks, stepping offset until the collection is drained. by_source aggregation semantics are preserved exactly — grouping, wing filter, meta capture all unchanged. Closes #1073. Related: #802, #850, #1016.	2026-04-29 19:01:54 -04:00
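The offset-stepping loop described in db28bf1e84 might look roughly like this. `FakeCollection` is a hypothetical in-memory stand-in so the sketch runs without chromadb, and `iter_drawers` is an illustrative name, not the function in closet_llm.

```python
class FakeCollection:
    """Hypothetical stand-in exposing a get(limit, offset) call."""

    def __init__(self, n):
        self._ids = [f"drawer-{i}" for i in range(n)]
        self.calls = 0

    def get(self, limit, offset=0, include=None):
        self.calls += 1
        chunk = self._ids[offset:offset + limit]
        return {"ids": chunk, "metadatas": [{"source_file": f"{i}.md"} for i in chunk]}


def iter_drawers(col, batch_size=5000):
    # Fetch in fixed-size chunks instead of one get(limit=total), so no
    # single query exceeds SQLite's bound-variable limit.
    offset = 0
    while True:
        batch = col.get(limit=batch_size, offset=offset, include=["metadatas"])
        ids = batch["ids"]
        if not ids:
            break
        yield from zip(ids, batch["metadatas"])
        if len(ids) < batch_size:
            break  # short page: collection drained
        offset += len(ids)


col = FakeCollection(12)
items = list(iter_drawers(col, batch_size=5))
print(len(items), col.calls)
```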
Legion345	d7f4638157	fix(storage): stop ChromaDB from crashing when reopening an existing palace	2026-04-28 13:08:04 -07:00
Igor Lins e Silva	fdfaf017ab	Merge pull request #1234 from MemPalace/feat/normalize-gemini-cli feat(normalize): Gemini CLI session JSONL adapter	2026-04-27 20:42:06 -03:00
copilot-swe-agent[bot]	e7fe6cae14	fix(normalize): discard user/gemini turns before session_metadata sentinel Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/4511e9aa-38e7-440e-a6f8-eda91e576f0f Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-27 21:41:48 +00:00
copilot-swe-agent[bot]	a3e3691e86	docs(normalize): add Gemini CLI JSONL to module-level supported formats list Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/a32f48bb-2a78-494a-9698-e69304732d3f Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-27 19:00:18 +00:00
Igor Lins e Silva	4ffd0bd57a	Merge pull request #1233 from MemPalace/feat/privacy-consent-prompt feat(privacy): blocking consent gate for env-fallback LLM API keys	2026-04-27 15:54:11 -03:00
MSL	f4440f1ce0	feat(normalize): Gemini CLI session JSONL adapter Adds a fifth format adapter to mempalace.normalize alongside the existing Claude Code, Codex, Claude.ai, ChatGPT, and Slack parsers. After this lands, mempalace mine --mode convos ingests Gemini CLI session history without manual export. Why now: Claude Code and Codex CLI are already supported by convo_miner; adding Gemini closes the major-CLI-tool coverage gap. After this lands, the README's "verbatim conversation history" promise is honestly delivered for all three top-tier API-keyed coding CLIs (Claude Code, Codex CLI, Gemini CLI), not just two of them. This is the third leg of the trio Aya pushed for so the public claim matches the actual ingest pipeline. Gemini CLI stores sessions at ~/.gemini/tmp/<project_hash>/chats/ as JSONL. The on-disk schema (per google-gemini/gemini-cli#15292): {"type":"session_metadata","sessionId":"...","projectHash":"...",...} {"type":"user","id":"msg1","content":[{"text":"Hello"}]} {"type":"gemini","id":"msg2","content":[{"text":"Hi"}]} {"type":"message_update","id":"msg2","tokens":{"input":10,"output":5}} The new _try_gemini_jsonl parser: - requires a session_metadata record so it does not false-positive against Claude Code or Codex JSONL passing through the dispatch chain in _try_normalize_json - extracts user/gemini message text from each entry's content array of {"text": "..."} blocks, joining multiple blocks per message in order - skips message_update entries (token-count deltas with no message text) and any other unknown record types - returns None when fewer than two conversational messages are present, mirroring the codex parser's >=2-message guard Test coverage: 9 new unit tests in tests/test_normalize.py mirroring the codex test pattern - happy path, multi-turn, missing session metadata, message_update skip, single-message rejection, multi-block content concatenation, empty content skip, malformed-line resilience, and explicit no-match against codex JSONL 
fixtures. Schema-level only; real Gemini CLI session fixtures are a follow-up once a real user file is available. Closes part of #59 (the Gemini CLI portion of the umbrella request).	2026-04-27 01:25:03 -07:00
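The parsing rules listed in f4440f1ce0 can be sketched as follows. This is not `_try_gemini_jsonl` itself: the function name, the `(role, text)` return shape, the newline used to join content blocks, and the sample field values are assumptions made for illustration.

```python
import json


def try_gemini_jsonl(text):
    saw_metadata = False
    messages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines
        if not isinstance(record, dict):
            continue
        kind = record.get("type")
        if kind == "session_metadata":
            saw_metadata = True
            continue
        if kind not in ("user", "gemini"):
            continue  # skips message_update and unknown record types
        blocks = record.get("content")
        if not isinstance(blocks, list):
            continue
        parts = [
            b["text"]
            for b in blocks
            if isinstance(b, dict) and isinstance(b.get("text"), str) and b["text"]
        ]
        if parts:
            # ASSUMPTION: blocks are joined with a newline, in order.
            messages.append((kind, "\n".join(parts)))
    # Require the metadata record and at least two conversational messages.
    if not saw_metadata or len(messages) < 2:
        return None
    return messages


sample = "\n".join([
    '{"type":"session_metadata","sessionId":"s1","projectHash":"p1"}',
    '{"type":"user","id":"msg1","content":[{"text":"Hello"}]}',
    '{"type":"gemini","id":"msg2","content":[{"text":"Hi"}]}',
    '{"type":"message_update","id":"msg2","tokens":{"input":10,"output":5}}',
    "not json at all",
])
parsed = try_gemini_jsonl(sample)
print(parsed)
```

This sketch does not discard turns that appear before the `session_metadata` record, which the later commit e7fe6cae14 adds.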
MSL	72cbfb5967	feat(privacy): blocking consent gate for env-fallback LLM API keys Adds api_key_source provenance ('flag' | 'env' | None) to LLMProvider so cmd_init can distinguish a key passed via --llm-api-key (explicit opt-in) from one silently picked up via OPENAI_API_KEY / ANTHROPIC_API_KEY shell env (stray credential). When the endpoint is external AND api_key_source == 'env', init now prints a blocking [y/N] prompt before any data is sent. Anything other than 'y' drops the LLM and falls back to heuristics-only. Adds --accept-external-llm flag for CI / non-interactive bypass. Completes the UX gap in #1224: the URL-based warning was informational and init kept running, so a user who didn't notice the line had already leaked. The consent prompt is the actual gate; explicit flag-passed keys remain treated as already-consented.	2026-04-27 00:44:57 -07:00
Igor Lins e Silva	de7801ecff	Merge pull request #1191 from funguf/fix/hnsw-index-bloat-rebased fix: prevent HNSW index bloat from resize+persist cycles	2026-04-27 03:37:57 -03:00
Igor Lins e Silva	003e569b39	Merge pull request #1135 from sha2fiddy/feature/max-seq-id-shim-fix fix: narrow `_fix_blob_seq_ids` + add `repair --mode max-seq-id`	2026-04-27 03:21:49 -03:00
Igor Lins e Silva	c3ec708b12	Merge pull request #1197 from wahajahmed010/fix/1194-hyphenated-wing-tunnels fix(tunnels): normalize wing names in topic tunnel lookup for hyphenated dirs	2026-04-27 03:21:16 -03:00
Igor Lins e Silva	f80c9ffa56	Merge pull request #1195 from MemPalace/fix/wing-name-normalization-tunnels fix(graph): normalize wing slug at init so topic tunnels fire for hyphenated dirs (#1194)	2026-04-27 03:20:46 -03:00
igorls	342270d6e5	fix(palace_graph): defer annotation eval for Python 3.9 compat ``def _normalize_wing(wing: str | None) -> str | None`` uses PEP 604 union syntax which requires Python 3.10+ at runtime. The project still declares ``python_requires=">=3.9"`` and CI runs the test-linux (3.9) matrix, where every test in ``tests/test_palace_graph*`` errors out before collection with ``TypeError: unsupported operand type(s) for |``. Added ``from __future__ import annotations`` so all annotations in this module are evaluated lazily as strings; the union syntax is then accepted on 3.9 without needing to rewrite to ``Optional[str]``. Surfaced after rebasing this PR onto current develop.	2026-04-27 03:15:09 -03:00
igorls	cfca40c5ec	test(cli): mock _run_pass_zero so wing-name test survives corpus-origin cmd_init now invokes ``_run_pass_zero`` unconditionally (#1221, #1223 landed on develop after this PR's branch point). The pass reads sample content via ``builtins.open``; with that mocked to MagicMock, the downstream ``"\\n\\n".join(samples)`` in ``corpus_origin.detect_origin_heuristic`` raises ``TypeError: expected str instance, MagicMock found``. This test only cares about the wing-slug write to the registry, so stub the pass-zero call directly rather than try to satisfy its full sample-gathering contract.	2026-04-27 03:14:02 -03:00
Igor Lins e Silva	3bebef1503	fix(miner,convo_miner): close remaining wing-name normalization gaps (#1194 ) Two follow-ups against the review on this PR: 1. ``miner.load_config`` no-yaml fallback was returning the raw dirname as the wing, while ``cmd_init`` writes ``topics_by_wing`` under the normalized slug. A hyphenated project mined without a ``mempalace.yaml`` file silently lost every topic tunnel — same key-miss class as #1194, just down the no-yaml branch (raised by Qodo on this PR). 2. ``convo_miner`` was applying the lower/replace rule inline at one call site. Now folded through ``normalize_wing_name`` so all wing-slug producers — ``cmd_init``, ``room_detector_local``, ``miner.load_config`` fallback, ``convo_miner`` — share a single source of truth. No behavior change for any input; pure consolidation. Added ``test_load_config_no_yaml_normalizes_hyphenated_wing`` to lock the fallback path to the normalized slug — fails on develop without the miner change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:12:06 -03:00
bensig	b7f0a8af01	fix(graph): normalize wing slug at init so topic tunnels fire for hyphenated dirs (#1194 ) `init` was recording `topics_by_wing[<raw-dirname>]` while `mempalace.yaml` got the lower-cased separator-collapsed slug. At mine time the miner read the slug from the yaml and missed the registry key, so `_compute_topic_tunnels_for_wing` returned 0 silently for every project whose folder contained a `-` or a space — the most common shape in the wild. Extracted the rule into `config.normalize_wing_name()` and routed both `cli.cmd_init` (registry write) and `room_detector_local.detect_rooms_local` (yaml write) through it. Added a regression test in `test_cli.py` asserting the registry call uses the normalized slug, plus four direct unit tests for the helper. Refs #1180. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:12:06 -03:00
Wahaj Ahmed	347464146d	fix(tunnels): normalize wing names in topic tunnel lookup for hyphenated dirs (#1194 )	2026-04-27 03:05:14 -03:00
igorls	04c48dd0fe	fix(chroma): write blob-fix marker even when narrowing skips all rows The narrowed _fix_blob_seq_ids returned early when safe_rows was empty, but #1177's marker contract requires the marker to be written on every successful pass — even no-op — so subsequent opens skip the sqlite3 connection entirely. Without this, palaces that have no genuine 0.6.x BLOBs but DO have sysdb-10-prefixed rows would re-open sqlite3 on every call, defeating the #1090 corruption guard. Restructured the conditional so the marker write is unconditional after a successful sqlite scan, regardless of whether any rows were updated. Surfaced by test_fix_blob_seq_ids_writes_marker_when_already_integer during the develop-rebase of this PR. The author's branch predates the marker contract from #1177 (merged 2026-04-26), so this is a rebase-edge fix-up rather than a logic change to their narrowing behaviour.	2026-04-27 03:01:41 -03:00

814 Commits