Files

Igor Lins e Silva 6b7dcc53d4 merge: pr/closet-llm-generic + harden LLM regen path for production

Brings in PR #793 (optional LLM-based closet regeneration via
user-configured OpenAI-compatible endpoint) and PR #795 (hybrid
closet+drawer search — closets boost, never gate). Stack: #784 → #788
→ #789 → #790 → #791 → #792 → #793 (+ #795).

Findings hardened on our side
─────────────────────────────

1) closet_llm.regenerate_closets didn't use the blessed palace helpers.

   Before:
     * manual closets_col.get(where=...) + .delete(ids=...) with a
       silent ``except Exception: pass`` around both — if the purge
       failed, pre-existing regex closets survived alongside fresh LLM
       closets, giving the searcher double hits for the same source.
     * ``source.split('/')[-1][:30]`` to build the closet_id — quietly
       wrong on Windows paths (``C:\\proj\\a.md`` has no ``/``, so the
       whole string ends up in the ID).
     * no mine_lock around purge+upsert — a concurrent regex rebuild of
       the same source could interleave with our purge and leave a mix
       of regex and LLM pointers.
     * no ``normalize_version`` stamp on the LLM closets — the miner's
       stale-version gate would treat them as leftovers from an older
       schema and rebuild over them on the next mine.

   After: routes through ``purge_file_closets`` + ``mine_lock`` +
   ``os.path.basename`` + ``NORMALIZE_VERSION`` stamp. Regression tests
   cover each.
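
The Windows-path failure described in finding 1 can be reproduced on any OS by calling `ntpath` (the Windows implementation of `os.path`) directly. This is a minimal sketch: the helper names `fragment_by_split` and `fragment_by_basename` are hypothetical, and keeping the 30-character truncation in the hardened version is an assumption, since the commit message does not show the final ID format.

```python
import ntpath      # Windows implementation of os.path, importable on any OS
import posixpath   # POSIX implementation of os.path

win_source = "C:\\proj\\a.md"
posix_source = "/home/me/proj/a.md"


def fragment_by_split(source: str) -> str:
    """Old approach: split on '/' only, then truncate to 30 characters."""
    return source.split("/")[-1][:30]


def fragment_by_basename(source: str, pathmod=ntpath) -> str:
    """Hardened approach: let the platform's path module find the basename.

    The [:30] truncation is carried over as an assumption.
    """
    return pathmod.basename(source)[:30]


old_win = fragment_by_split(win_source)       # no '/', so the whole path survives
new_win = fragment_by_basename(win_source)    # only the file name
old_posix = fragment_by_split(posix_source)
new_posix = fragment_by_basename(posix_source, posixpath)

print(repr(old_win), repr(new_win), repr(old_posix), repr(new_posix))
```

On a real Windows host `os.path` resolves to `ntpath`, which is why `os.path.basename` is enough there. The sketch pins the module only so the behaviour is visible from Linux or macOS too.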

2) searcher.search_memories was still closet-first.

   PR #795 merged into #793's head to fix the recall regression
   documented in that PR (R@1 0.25 on narrative content vs. 0.42
   baseline). The hybrid design makes closets a ranking boost rather
   than a gate: drawers are always queried at the floor, and matching
   closet hits (rank 0-4 within CLOSET_DISTANCE_CAP=1.5) add a boost
   of 0.40/0.25/0.15/0.08/0.04 to the effective distance.

   Merged to take the incoming hybrid design, with three cleanups:
   * kept the ``_expand_with_neighbors`` / ``_extract_drawer_ids_from_closet``
     helpers as separately-tested utilities (still imported by tests
     and future callers);
   * replaced the fragile ``source_file.endswith(basename)`` reverse-
     lookup in the enrichment step with internal ``_source_file_full``
     / ``_chunk_index`` fields stripped before return, so enrichment
     doesn't silently pick the wrong path when two sources share a
     basename across directories;
   * drawer-grep enrichment now sorts by ``chunk_index`` before
     neighbor expansion, so ``best_idx ± 1`` corresponds to actual
     document order rather than whatever order Chroma returned.
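
The "closets boost, never gate" ranking from finding 2 can be sketched as below. The boost table and `CLOSET_DISTANCE_CAP` are taken from the text above. Everything else is an illustrative assumption and not the code in `searcher.py`: the function name `rank_hybrid`, the tuple shapes, the inclusive cap comparison, the `"drawer"` label for unboosted hits, and the choice to subtract the boost from the drawer distance so that lower still means better.

```python
from typing import Dict, List, Tuple

CLOSET_DISTANCE_CAP = 1.5                          # from the commit message
CLOSET_RANK_BOOSTS = [0.40, 0.25, 0.15, 0.08, 0.04]  # closet ranks 0-4


def rank_hybrid(
    drawer_hits: List[Tuple[str, str, float]],  # (drawer_id, source_file, distance)
    closet_hits: List[Tuple[str, float]],       # (source_file, distance), best first
) -> List[Dict]:
    # Map each source file to the boost earned by its best-ranked closet hit.
    boosts: Dict[str, float] = {}
    for rank, (source, dist) in enumerate(closet_hits[: len(CLOSET_RANK_BOOSTS)]):
        if dist <= CLOSET_DISTANCE_CAP and source not in boosts:
            boosts[source] = CLOSET_RANK_BOOSTS[rank]

    # Every drawer hit is kept (the floor); closets only adjust its score.
    results = []
    for drawer_id, source, dist in drawer_hits:
        boost = boosts.get(source, 0.0)
        results.append(
            {
                "drawer_id": drawer_id,
                "source_file": source,
                # Assumption: the boost lowers the distance, so lower is better.
                "effective_distance": dist - boost,
                "matched_via": "drawer+closet" if boost else "drawer",
            }
        )
    return sorted(results, key=lambda r: r["effective_distance"])


drawers = [("d1", "a.md", 0.50), ("d2", "b.md", 0.70), ("d3", "c.md", 0.60)]
closets = [("b.md", 0.9), ("c.md", 1.8)]  # c.md is beyond the cap: no boost
ranked = rank_hybrid(drawers, closets)
print([r["drawer_id"] for r in ranked])
```

In this toy input the closet hit for `b.md` lifts `d2` above `d1`, while `d3` stays in the result list even though its closet was rejected by the cap.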

3) Closet-first tests in test_closets.py (``TestSearchMemoriesClosetFirst``,
   end-to-end ``test_closet_first_search_includes_drawer_index_and_total``)
   pinned contracts that the hybrid path now violates (``matched_via``
   went from ``"closet"`` to ``"drawer+closet"``). Rewrote them around
   the new invariant: direct drawers are always the floor, closet
   agreement flips the hit's matched_via and exposes closet_preview.

Verification
────────────

* 805/805 pass under ``uv run pytest tests/ -v --ignore=tests/benchmarks``
  (13 new tests from PR #793 + 5 from PR #795 + 2 new regressions for
  the closet_llm hardening + the rewritten hybrid assertions in
  test_closets.py).
* CI-pinned ruff 0.4.x clean on ``mempalace/`` + ``tests/`` (check +
  format both pass).
* No new deps — closet_llm.py still uses stdlib ``urllib.request`` per
  the PR's "zero new dependencies" promise.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

2026-04-13 18:40:36 -03:00

backends

fix: auto-repair BLOB seq_ids from chromadb 0.6→1.5 migration (#664)

2026-04-11 23:06:01 -07:00

i18n

fix(ci): resolve ruff lint + format failures

2026-04-12 17:14:06 -03:00

instructions

fix: add --yes flag to init instructions for non-interactive use (#534) (#682)

2026-04-12 14:23:29 -07:00

__init__.py

fix: remove no-op ORT_DISABLE_COREML env var (#397) (#653)

2026-04-11 23:05:56 -07:00

__main__.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

cli.py

fix: address Copilot review comments on PR #739

2026-04-12 23:07:46 -03:00

closet_llm.py

merge: pr/closet-llm-generic + harden LLM regen path for production

2026-04-13 18:40:36 -03:00

config.py

fix: allow Unicode in sanitize_name() — Latvian, CJK, Cyrillic (#637) (#683)

2026-04-12 14:23:34 -07:00

convo_miner.py

merge: develop (#784 file-locking, #820 version sync)

2026-04-13 16:29:50 -03:00

dedup.py

style: ruff format

2026-04-10 08:49:35 -07:00

dialect.py

fix(ci): resolve ruff lint + format failures

2026-04-12 17:14:06 -03:00

diary_ingest.py

merge: develop + harden entity metadata, BM25, and diary ingest for production

2026-04-13 17:37:45 -03:00

entity_detector.py

fix: correct typo in entity_detector interactive classification prompt (#755)

2026-04-13 01:43:57 -03:00

entity_registry.py

test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11

2026-04-08 21:38:12 +03:00

exporter.py

style: ruff format all Python files (#675)

2026-04-11 22:59:34 -07:00

fact_checker.py

merge: full hardened stack + rewrite fact_checker around actual KG API

2026-04-13 18:20:11 -03:00

general_extractor.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

hooks_cli.py

Merge branch 'main' into fix/issue-347-codex-hook-message-counting

2026-04-10 09:23:37 -07:00

instructions_cli.py

feat: add MemPalace Claude Code plugin with hooks and instructions

2026-04-08 14:55:46 +03:00

knowledge_graph.py

Security hardening: consistent input validation, argument whitelisting, concurrency safety, and WAL fixes (#647)

2026-04-11 20:44:17 -07:00

layers.py

style: ruff format all Python files (#675)

2026-04-11 22:59:34 -07:00

mcp_server.py

merge: full hardened stack + rewrite fact_checker around actual KG API

2026-04-13 18:20:11 -03:00

migrate.py

fix: address Copilot review comments on PR #739

2026-04-12 23:07:46 -03:00

miner.py

merge: full hardened stack + rewrite fact_checker around actual KG API

2026-04-13 18:20:11 -03:00

normalize.py

fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL

2026-04-13 16:11:03 -03:00

onboarding.py

test: add comprehensive test coverage (35% → 58%, threshold 50%)

2026-04-08 20:54:56 +03:00

palace_graph.py

merge: develop + harden cross-wing tunnels for production

2026-04-13 17:50:43 -03:00

palace.py

merge: develop + harden closet layer for production

2026-04-13 17:00:55 -03:00

py.typed

chore: tighten chromadb version range and add py.typed marker

2026-04-07 18:51:42 -03:00

query_sanitizer.py

fix: address Copilot review comments on PR #739

2026-04-12 23:07:46 -03:00

README.md

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

repair.py

style: ruff format

2026-04-10 08:49:35 -07:00

room_detector_local.py

fix: skip unreachable reparse points in detect_rooms_from_folders (#558)

2026-04-11 16:16:06 -07:00

searcher.py

merge: pr/closet-llm-generic + harden LLM regen path for production

2026-04-13 18:40:36 -03:00

spellcheck.py

MemPalace: palace architecture, AAAK compression, knowledge graph

2026-04-04 18:16:04 -07:00

split_mega_files.py

fix: expand ~ in split command directory argument (#361)

2026-04-11 23:14:28 -07:00

version.py

fix: sync version.py to 3.2.0

2026-04-13 15:46:27 -03:00

README.md

mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

Modules

| Module | What it does |
|--------|--------------|
| `cli.py` | CLI entry point — routes to mine, search, init, compress, wake-up |
| `config.py` | Configuration loading — `~/.mempalace/config.json`, env vars, defaults |
| `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
| `miner.py` | Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB |
| `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
| `searcher.py` | Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores |
| `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
| `dialect.py` | AAAK compression — entity codes, emotion markers, 30x lossless ratio |
| `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
| `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
| `mcp_server.py` | MCP server — 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
| `onboarding.py` | Guided first-run setup — asks about people/projects, generates AAAK bootstrap + wing config |
| `entity_registry.py` | Entity code registry — maps names to AAAK codes, handles ambiguous names |
| `entity_detector.py` | Auto-detect people and projects from file content |
| `general_extractor.py` | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
| `room_detector_local.py` | Maps folders to room names using 70+ patterns — no API |
| `spellcheck.py` | Name-aware spellcheck — won't "correct" proper nouns in your entity registry |
| `split_mega_files.py` | Splits concatenated transcript files into per-session files |

Architecture

User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
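
To make "structured relationships" with time-filtered queries and fact invalidation more concrete, here is a minimal sketch using only the stdlib `sqlite3` module. The `triples` table, its columns, and the helpers `add_fact`, `invalidate`, and `facts_as_of` are hypothetical names chosen for illustration. They are not the schema or API of `knowledge_graph.py`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date the fact became true
        valid_to   TEXT             -- NULL while the fact is still current
    )
    """
)


def add_fact(subject, predicate, obj, valid_from):
    """Record a fact that is current from valid_from onward."""
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def invalidate(subject, predicate, obj, ended):
    """Close a current fact instead of deleting it, so history is kept."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, subject, predicate, obj),
    )


def facts_as_of(subject, as_of):
    """Return the (predicate, object) pairs that were true on the given date."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY predicate, object",
        (subject, as_of, as_of),
    )
    return rows.fetchall()


add_fact("Alice", "works_on", "ProjectA", "2025-01-01")
invalidate("Alice", "works_on", "ProjectA", "2025-06-01")
add_fact("Alice", "works_on", "ProjectB", "2025-06-01")

march = facts_as_of("Alice", "2025-03-01")
july = facts_as_of("Alice", "2025-07-01")
print(march, july)
```

Storing dates as ISO strings keeps the comparison lexical, which is sufficient for a sketch. A real store would likely also need indexes and stricter validation of the date format.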