Inline comments referencing #1136 and #540 add no information the
identifiers do not already convey. PR description carries the context;
code stays quiet.
Passing a stripped env dict without SYSTEMROOT/WINDIR breaks Python
bootstrap on Windows (_Py_HashRandomization_Init). Inherit the parent
env and strip MEMPAL* vars instead, then override HOME/USERPROFILE to
the tmp dir.
TestKGLazyCache covers the scenarios behind the lazy per-path refactor:
- test_lazy_init_no_import_side_effect: a fresh subprocess import does
not create ~/.mempalace/knowledge_graph.sqlite3 (what closed PR #167
was aiming at).
- test_get_kg_returns_same_instance: two _get_kg() calls under the same
resolved path return the same object, cache has one entry.
- test_get_kg_different_paths_different_instances: rotating env var
produces distinct KGs.
- test_multi_tenant_env_switch: the exact scenario from #1136 — write
under path A, query under path B returns empty, switching back to A
sees the fact.
- test_cache_thread_safe: 16 threads racing _get_kg() end up with one
shared instance and one cache entry.
Direct module-attribute patching of _kg is obsolete after the lazy
cache refactor. Switch test helpers to patch _get_kg instead so the
fixture KG replaces the factory rather than a now-missing singleton.
- tests/test_mcp_server.py: _patch_mcp_server helper
- tests/benchmarks/test_mcp_bench.py: _patch_mcp_config helper
- tests/benchmarks/test_memory_profile.py: inline patch in test_tool_status_repeated_calls
Swap the module-level KnowledgeGraph singleton for a lazy, per-path
cache keyed by the resolved sqlite path. Import no longer creates a
sqlite file as a side effect, and MCP servers started with --palace
now route KG calls to the correct tenant when MEMPALACE_PALACE_PATH
changes between calls, matching the per-call behavior of _get_client()
on the ChromaDB side.
Default-path behavior is preserved: without --palace at startup, KG
stays on DEFAULT_KG_PATH regardless of env var. The "no --palace but
env var set" case is #540's scope and is not changed here.
monkeypatch.delenv(name, raising=False) on a missing key registers no
undo entry, so the env var cmd_init writes leaked into test_config_from_file
on Python 3.13 / Windows / macOS.
Prime the slot with setenv before delenv so teardown rolls back the write.
The repo's anti-jargon meta-test bans §N markers outside the
sources/backends allowlist. mcp_server.py isn't allowlisted, so the
"RFC 002 §5.5" references added in this PR turned the test red.
Trim to "RFC 002" — section number isn't load-bearing for the description.
- Dedup union candidates by (full_path, chunk_index), not basename —
two files sharing a basename in different dirs no longer collide,
and a vector hit on chunk N of a file no longer blocks BM25 from
contributing chunk M of the same file.
- Validate candidate_strategy at the top of search_memories so invalid
values fail consistently, not only when the call routes through the
vector path.
- Trim hits back to n_results after the union+rerank pool grows;
preserves the existing search_memories size contract that the MCP
limit parameter is built on.
- Skip BM25-only injection when max_distance > 0.0; BM25-only
candidates carry distance=None and would silently bypass the
caller's strict vector-distance threshold.
Adds 4 tests covering: validation under vector_disabled, n_results
trim, max_distance honoring, and basename-collision dedup.
The MCP `mempalace_get_drawer` tool returned the entire raw drawer
metadata blob to any connected client, and the `source_file` field
in that blob is the absolute filesystem path written by the miners
(`miner.py`, `convo_miner.py` — `source_file = str(filepath)`). On
a single-user local deployment this is self-disclosure, but in
nested-agent or multi-server MCP topologies the client is a separate
trust domain and the host's directory layout has no documented
client-side use.
Mirror the mitigation that `searcher.search_memories()` already applies
on its own return path: reduce `source_file` to its basename via
`Path(source_file).name` before handing the metadata to the client.
Citations still work — the directory layout does not leak.
Companion to #1 (omit palace_path from tool_status). Same threat class,
different surface:
- mempalace_status — palace dir path → fixed in #1
- mempalace_get_drawer — per-drawer source_file path → this PR
Other read tools were audited and do not leak host paths:
- mempalace_search — already basenames source_file
- mempalace_list_drawers — returns wing/room/preview only
- mempalace_diary_read — date/timestamp/topic/content only
- mempalace_reconnect — success/message/drawers only
- mempalace_kg_* — entity/predicate strings, counts
- mempalace_check_duplicate — wing/room/preview only
Changes:
- mempalace/mcp_server.py: tool_get_drawer() now basenames metadata.source_file
- tests/test_mcp_server.py: regression test asserting the absolute path
and its parent directory do not appear anywhere in the response
- website/reference/mcp-tools.md: clarify the documented return shape
The MCP `mempalace_status` tool was returning the server's absolute
`_config.palace_path` to any connected client on both the main
(ChromaDB-backed) path and the sqlite fallback path that runs when
HNSW divergence is detected (#1222). On a single-user local deployment
this is self-disclosure, but in nested-agent or multi-server MCP
topologies the client is a separate trust domain and the absolute
path has no documented client-side use.
Clients that legitimately need the palace path continue to have three
documented channels: the `MEMPALACE_PALACE_PATH` env var (primary) or
its legacy `MEMPAL_PALACE_PATH` alias, the `~/.mempalace/config.json`
file, and the `--palace` CLI flag on most subcommands.
Also corrects stale docs that claimed `mempalace_reconnect` returned a
`palace_path` field; the code returns `{success, message, drawers,
vector_disabled[, vector_disabled_reason]}` on success, plus a no-palace
shape and an exception shape.
- mempalace/mcp_server.py: drop palace_path from tool_status() and
_tool_status_via_sqlite() result dicts
- website/reference/mcp-tools.md: update documented return shapes for
mempalace_status (fix) and mempalace_reconnect (stale-docs correction)
Authored-by: Aaron Salsitz (ICCI LLC, @icciaaron). Claude Code was used
as an authoring and review-orchestration tool, with human-in-the-loop
oversight at every step: Aaron wrote the prompts, reviewed each draft,
called for three independent review passes (drafting / post-rebase
technical / CISA-aligned disclosure-leak), and verified the final patch
behavior before commit.
* fix(cli): write compress output to mempalace_closets so palace can read them (#1244)
`cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed`
collection, but every read path (`palace.get_closets_collection`,
`searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for
non-mined palaces (or any palace where the user ran `mempalace compress`
expecting to backfill the closet/index layer), the compressed output was
silently invisible — written to a collection nothing else opens.
Fix the writer rather than renaming the readers: "closets" is the
user-visible feature name baked into the public API
(`get_closets_collection`), the searcher hybrid path, repair/HNSW
diagnostics, and docs. Renaming the readers would churn 15+ call sites
and the README for no benefit. The compressed AAAK strings are exactly
what closets are conceptually — compact pointers scanned by an LLM to
locate the right drawer — so they belong in `mempalace_closets`.
Tests:
- Update `test_cmd_compress_stores_results` to assert the collection
name passed to `get_or_create_collection` is `mempalace_closets`.
- Add `test_cmd_compress_output_readable_via_get_closets_collection`:
end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress,
then read back via the same `get_closets_collection` helper that
palace.py / searcher use. Regression test for the wrong-collection
bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* style: ruff format cli.py (#1244)
CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.
* style: ruff format tests/test_cli.py (PR #1319)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name`
(which preserves case), while `tool_diary_read` filtered by exact match —
so writing as "Claude" and reading as "claude" silently returned zero rows.
Both endpoints now lowercase `agent_name` immediately after sanitization.
The default per-agent wing slug is also stable across casings since it's
derived from the same normalized form.
Behavior change: entries written prior to this fix under mixed-case agent
names will not match the new lowercase filter; documented under v3.3.5
in CHANGELOG with a `mempalace repair` pointer.
Adds a regression test (`test_diary_read_case_insensitive_agent`) and
updates the existing `test_diary_write_and_read` to assert the new
lowercase agent identity.
Closes#1243
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1173 wired quarantine_stale_hnsw into the static make_client() helper
but not into the instance _client() method. As a result every non-MCP
entry point (CLI mining, search, repair, status) — which all use
get_collection / _get_or_create_collection / _client() — skipped the
cold-start quarantine pass and could SIGSEGV on a stale HNSW segment
left over from a partial flush, replicated palace, or crashed-mid-write.
Refactor: extract the (_fix_blob_seq_ids + gated quarantine_stale_hnsw)
pre-open pass into a single private static helper
ChromaBackend._prepare_palace_for_open(). Both make_client() and
_client() now route through it, so the _quarantined_paths once-per-
palace-per-process gate is preserved (no runtime thrash on hot paths)
and behaviour stays identical — the fix is purely about extending the
existing protection to the path that was missing it.
Tests:
- test_client_quarantines_corrupt_segment_on_first_open mirrors the
existing make_client test and verifies _client() actually renames a
corrupt segment on first open.
- test_client_quarantines_only_on_first_call_per_palace verifies the
cache gate prevents re-running quarantine across repeated _client()
calls — important because _client() is hit on every backend op.
Closes#1121. Closes#1132. Closes#1263.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cmd_init was instantiating MempalaceConfig() unconditionally, ignoring
args.palace and always writing the palace under ~/.mempalace. Mirror
the env-var pattern used by mcp_server.py (and consistent with how
cmd_mine / cmd_status / cmd_search resolve --palace) so every
downstream read of cfg.palace_path inside cmd_init — Pass 0,
cfg.init(), and the post-init mine — routes to the user-specified
location.
Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the
regression: asserts Pass 0 receives the --palace value (not
~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ.
Closes#1313.
`tool_kg_add` previously accepted only `valid_from` and `source_closet`,
silently dropping `valid_to`, `source_file`, and `source_drawer_id` at
the MCP boundary. Backfilling already-ended historical facts therefore
collapsed to "still current," and adapter provenance never reached
the SQLite layer even though `KnowledgeGraph.add_triple` already
supported every column.
`tool_kg_invalidate` returned the literal string `"today"` whenever the
caller omitted `ended`, hiding the actual stamped date from anyone trying
to verify what got persisted.
Changes:
- Extend `tool_kg_add` signature + MCP input_schema with `valid_to`,
`source_file`, `source_drawer_id`; forward all of them to
`_kg.add_triple` and to the WAL log.
- Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate`
before logging / returning, so the response always reports the actual
date stored in `valid_to`.
- Add regression tests for valid_to round-trip, source_file /
source_drawer_id provenance, and the resolved-ended-date contract.
- Leave TODO(#1283) markers so the open ISO-8601 validation PR can drop
`validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly.
The underlying `KnowledgeGraph.add_triple` already accepted these
kwargs (RFC 002 §5.5) — only the MCP edge needed wiring up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed`
collection, but every read path (`palace.get_closets_collection`,
`searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for
non-mined palaces (or any palace where the user ran `mempalace compress`
expecting to backfill the closet/index layer), the compressed output was
silently invisible — written to a collection nothing else opens.
Fix the writer rather than renaming the readers: "closets" is the
user-visible feature name baked into the public API
(`get_closets_collection`), the searcher hybrid path, repair/HNSW
diagnostics, and docs. Renaming the readers would churn 15+ call sites
and the README for no benefit. The compressed AAAK strings are exactly
what closets are conceptually — compact pointers scanned by an LLM to
locate the right drawer — so they belong in `mempalace_closets`.
Tests:
- Update `test_cmd_compress_stores_results` to assert the collection
name passed to `get_or_create_collection` is `mempalace_closets`.
- Add `test_cmd_compress_output_readable_via_get_closets_collection`:
end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress,
then read back via the same `get_closets_collection` helper that
palace.py / searcher use. Regression test for the wrong-collection
bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shell splits hook command on whitespace after variable expansion, breaking
paths with spaces (e.g. C:\Users\Richard M on Windows). Wrapping the path
in double quotes preserves the token boundary.
Fixes the reported Stop/PreCompact pair in .claude-plugin/hooks/hooks.json
and applies the same fix to .codex-plugin/hooks.json (SessionStart/Stop/
PreCompact), which carries the identical bug.
Default search behavior is unchanged. Opt-in candidate_strategy="union"
also pulls top-K BM25-only candidates from sqlite FTS5 and merges them
into the rerank pool, catching docs with strong BM25 signal that the
vector index didn't surface in the over-fetch window.
Motivation
----------
The current hybrid path gathers candidates from the vector index only
(n_results * 3 over-fetch), then BM25-reranks within them. When the
query embeds close to the wrong content semantically, the right doc
never enters the rerank pool — *no matter how wide the over-fetch*.
Tested on a ~6K-document mixed corpus (knowledge prose + short structured
records): at *30x* over-fetch (~5% of the corpus) the target doc still
didn't surface for narrative-shaped queries targeting terminology guides.
Wider over-fetch isn't the answer; widening the pool's *source* is.
Concrete failure mode: a narrative-shaped query embeds close to records
sharing the same operational vocabulary (other narrative entries in the
corpus). A terminology / style guide is BM25-strong for the query
(rare keywords the guide repeats) but vector-distant. Vector-only
candidates don't include it; BM25 never gets to rerank it. The hybrid
path produces 0.00 recall on a probe that pure BM25 alone scores 1.00 —
the hybrid is worse than its component on the same input.
Behavior change
---------------
* New parameter ``candidate_strategy: str = "vector"`` on ``search_memories``.
- ``"vector"`` (default): historical behavior, no change.
- ``"union"``: also fetch top ``n_results * 3`` candidates via the
existing ``_bm25_only_via_sqlite`` helper, dedupe by source_file,
merge into the rerank pool. BM25-only candidates carry
``distance=None`` so they're scored on BM25 contribution alone
(vec_sim coerces to 0).
* ``_hybrid_rank`` now handles ``distance=None`` explicitly, scoring
such candidates as vector-unknown (vec_sim=0) rather than treating
it as max-distance via shim.
* New strategies register via ``_CANDIDATE_MERGERS``; dispatch is in
``_apply_candidate_strategy`` so ``search_memories`` stays under the
C901 complexity ceiling.
Bench numbers (~6K-doc internal mixed corpus, recall@10, 5 probes spanning
policy-exception lookup, temporal-decay, style retrieval, set-difference,
and pattern-recognition):
baseline ("vector") "union"
policy-exception probe 0.00 0.50 +0.50
temporal-decay probe 0.17 0.50 +0.33
style-retrieval probe 0.00 1.00 +1.00 (PASSES)
set-difference probe 0.00–0.06 0.06–0.09 ~
pattern-recog probe 0.64 (stable) 0.50–0.71 variance, typ. +0.07
macro recall 0.16–0.17 0.51–0.56 +0.34 to +0.40
The pattern-recog variance points at a related issue worth a separate PR:
``_hybrid_rank`` computes BM25 IDF over the candidate set. Adding new
candidates re-normalizes BM25 for *existing* candidates non-monotonically.
Stable corpus-wide BM25 would remove this. Out of scope here.
Tests
-----
``tests/test_hybrid_candidate_union.py`` (6 tests, all pass):
- default behavior unchanged (explicit ``"vector"`` matches default)
- ``"union"`` surfaces a BM25-strong vector-distant doc
- ``"union"`` doesn't drop docs ``"vector"`` would have found
- empty-palace handling
- invalid ``candidate_strategy`` raises
- ``_hybrid_rank`` tolerates ``distance=None``
Existing ``test_hybrid_search.py`` (5) and ``test_searcher.py`` (27) pass.
Performance note
----------------
Each ``"union"`` query adds one sqlite open + FTS5 MATCH + metadata
fetch (via the existing ``_bm25_only_via_sqlite`` helper, which already
runs as the ``vector_disabled`` fallback path so the code is
well-trodden). Per-query overhead is small but unmeasured at corpus
scale. Default stays ``"vector"`` until a maintainer characterizes the
cost.
- Resolve the EF inside the two reopen branches that actually call
`client.get_collection` / `client.create_collection`, so warm-cache
reads stay zero-cost (no `MempalaceConfig()` / `_resolve_providers`
on every tool call).
- Reuse `ChromaBackend._resolve_embedding_function()` instead of
duplicating its try/except + log message + None-fallback.
- Reword the inline + CHANGELOG explanation to clarify that ChromaDB 1.x
persists the EF *identity* (its `name()`) but not the *instance/
configuration* — `mempalace.embedding` documents this and spoofs
`name()` to `"default"` precisely so the identity check passes; the
bug was the *provider list* (lazy ONNX selection) silently differing.
`mcp_server._get_collection` bypassed `ChromaBackend.get_collection`
and called `client.get_collection` / `client.create_collection` without
`embedding_function=`. ChromaDB 1.x does not persist the EF identity
with the collection, so the MCP server's reopen silently bound
chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop
hook ingest path bound `mempalace.embedding.get_embedding_function()`.
On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple
Silicon, per #1299) the default EF's lazy ONNX provider selection could
SIGSEGV the host process on first `col.add()`, killing the MCP stdio
server and leaving every subsequent tool call returning
`Connection closed` until Claude Code was relaunched. Reads worked
because `col.get(ids=...)` and metadata fetches don't invoke the EF;
the auto-ingest path worked because mining routes through the backend
abstraction. Diary writes were the consistent failure surface.
Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`)
and pass it into both reopen branches. Falls back to the chromadb default
only if `mempalace.embedding.get_embedding_function` itself raises.
Regression test patches the chromadb client class to capture
`embedding_function=` on every `get_collection` / `create_collection`
call from `_get_collection(create=True)` and `_get_collection()`, and
fails if any call omits it.
Follow-up to #1262 / #1289 (which fixed the metadata-mismatch SIGSEGV
path); this addresses the EF-mismatch SIGSEGV path on the same surface.
#1262 split `get_or_create_collection` into `get_collection` + fallback
`create_collection` inside `ChromaBackend.get_collection`, fixing the
chromadb 1.5.x Rust-binding SIGSEGV that fires when stored collection
metadata differs from the call-site's `_HNSW_BLOAT_GUARD` payload.
The MCP server's `_get_collection(create=True)` carries the same metadata
payload at `mcp_server.py:287` and routes through chromadb's Python
client directly, bypassing the backend layer. Both `tool_add_drawer`
and `tool_diary_write` reach this site on every invocation, and the
Stop hook fires `mempalace_diary_write` at session end — which was
exactly the crash path #1089 named.
Apply the same try/except split here so legacy palaces whose stored
metadata predates the bloat-guard expansion no longer crash on the
MCP-server reopen path. Regression test patches
`get_or_create_collection` at the chromadb client class level (not the
instance — chromadb's mtime-change detection rebuilds the client between
calls, so an instance-level spy doesn't survive) and asserts the second
`_get_collection(create=True)` call never reaches it.
`_compute_heuristic_seq_id` ran `int(row[0])` directly on the result
of `MAX(e.seq_id)`. On palaces where chromadb 1.5.x has been writing
seq_ids natively (8-byte big-endian uint64 BLOB), that raises
`ValueError: invalid literal for int() with base 10: b'...'` before
the dry-run can print, leaving users with no path through the
recovery feature added in #1135 — the only documented un-poison
route for palaces hit by the original PR #664 shim bug.
Decode BLOB return values via `int.from_bytes(val, "big")` and
keep the existing `int(val)` path for INTEGER rows. Regression
test seeds a BLOB row in `embeddings.seq_id` and asserts the
heuristic surfaces the correct integer.
The capacity probe added in #1227 hardcoded a 2,000-row floor for the
"diverged" decision. The comment justifying that number explicitly tied
it to chromadb's *default* sync_threshold of 1,000 — "Two synchronization
windows worth (2 × sync_threshold = 2000) is a safe steady-state ceiling".
#1191 then bumped sync_threshold to 50,000 via _HNSW_BLOAT_GUARD without
updating the floor. Result: any palace created with the bloat guard
flips between OK and DIVERGED on every flush cycle. Steady-state
divergence sits at 0–50K (the natural queue depth), and the 2,000 floor
trips the guardrail the moment the queue exceeds 10% of sqlite_count.
The MCP server then routes search to BM25-only and disables duplicate
detection for ~80% of the write cycle on actively-mined ≥100K palaces,
even though chromadb is behaving correctly.
This change reads the configured `hnsw:sync_threshold` from
`collection_metadata` per palace and scales the floor to 2 × that value.
The 10% relative term and the original #1222 detection capability are
unchanged — a 91%-missing-of-192K palace (the actual #1222 reproducer)
still trips, regardless of whether the collection was created with
sync_threshold=1000 or 50000.
Behavior summary:
| Collection's sync_threshold | New floor | Old floor |
|---|---|---|
| Missing (legacy palace) | 2000 | 2000 (unchanged)
| 1000 (chromadb default) | 2000 | 2000 (unchanged)
| 50000 (#1191 bloat guard) | 100000 | 2000 (the bug)
Tests:
- test_capacity_status_tolerates_lag_under_large_sync_threshold (regression
for the #1191/#1227 conflict — 100K sqlite + 50K HNSW + sync=50K → OK)
- test_capacity_status_still_flags_real_corruption_under_large_sync (#1222
shape with bloat-guard collection — still detects corruption)
- test_capacity_status_default_threshold_when_no_sync_metadata (legacy
palaces without the metadata row use the 2000 fallback floor)
- test_unflushed_path_also_uses_dynamic_floor (the never-flushed branch
scales too — 30K under sync_threshold=50000 is no longer flagged)
All 18 pre-existing tests in tests/test_hnsw_capacity.py and 45 tests
in tests/test_backends.py still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a fifth format adapter to mempalace.normalize alongside the
existing Claude Code, Codex, Claude.ai, ChatGPT, and Slack parsers.
After this lands, mempalace mine --mode convos ingests Gemini CLI
session history without manual export.
Why now: Claude Code and Codex CLI are already supported by convo_miner;
adding Gemini closes the major-CLI-tool coverage gap. After this lands,
the README's "verbatim conversation history" promise is honestly
delivered for all three top-tier API-keyed coding CLIs (Claude Code,
Codex CLI, Gemini CLI), not just two of them. This is the third leg
of the trio Aya pushed for so the public claim matches the actual
ingest pipeline.
Gemini CLI stores sessions at ~/.gemini/tmp/<project_hash>/chats/ as
JSONL. The on-disk schema (per google-gemini/gemini-cli#15292):
{"type":"session_metadata","sessionId":"...","projectHash":"...",...}
{"type":"user","id":"msg1","content":[{"text":"Hello"}]}
{"type":"gemini","id":"msg2","content":[{"text":"Hi"}]}
{"type":"message_update","id":"msg2","tokens":{"input":10,"output":5}}
The new _try_gemini_jsonl parser:
- requires a session_metadata record so it does not false-positive
against Claude Code or Codex JSONL passing through the dispatch
chain in _try_normalize_json
- extracts user/gemini message text from each entry's content array
of {"text": "..."} blocks, joining multiple blocks per message
in order
- skips message_update entries (token-count deltas with no message
text) and any other unknown record types
- returns None when fewer than two conversational messages are
present, mirroring the codex parser's >=2-message guard
Test coverage: 9 new unit tests in tests/test_normalize.py mirroring
the codex test pattern - happy path, multi-turn, missing session
metadata, message_update skip, single-message rejection, multi-block
content concatenation, empty content skip, malformed-line resilience,
and explicit no-match against codex JSONL fixtures. Schema-level only;
real Gemini CLI session fixtures are a follow-up once a real user file
is available.
Closes part of #59 (the Gemini CLI portion of the umbrella request).
Adds api_key_source provenance ('flag' | 'env' | None) to LLMProvider
so cmd_init can distinguish a key passed via --llm-api-key (explicit
opt-in) from one silently picked up via OPENAI_API_KEY / ANTHROPIC_API_KEY
shell env (stray credential).
When the endpoint is external AND api_key_source == 'env', init now
prints a blocking [y/N] prompt before any data is sent. Anything other
than 'y' drops the LLM and falls back to heuristics-only.
Adds --accept-external-llm flag for CI / non-interactive bypass.
Completes the UX gap in #1224: the URL-based warning was informational
and init kept running, so a user who didn't notice the line had already
leaked. The consent prompt is the actual gate; explicit flag-passed keys
remain treated as already-consented.