Igor Lins e Silva 9e7fa1ceb5 feat(init): scan manifests and git authors for real entity signal
`mempalace init` previously leaned entirely on regex-based entity
extraction from prose. That path works for text-only folders but wastes
signal in any codebase: the project's own name is already in
`package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod`, and the
people who worked on it are in `git log`.

This adds `project_scanner.py`, which becomes the primary signal source
when real signal is available, with the regex detector preserved as the
fallback for prose-only folders (diaries, research notes, writing).

What it does:
- Walks the target directory, parses manifests for canonical project
  names, and detects git repos by the presence of a `.git` directory.
- For each repo, reads `git log` for authors and filters obvious bots
  (`[bot]`, `dependabot`, `renovate`, `github-actions`, names ending in
  `bot`, `-autoroll`). Importantly, it does NOT filter
  `@users.noreply.github.com` addresses: that is GitHub's privacy-protected
  email, used by real human contributors.
- Resolves author aliases with a union-find: commits that share a name
  OR an email collapse into one person. Picks the most-frequent
  real-name variant as display, ignoring handles and single-token
  usernames.
- Flags "mine" projects: the user is a top-5 committer, OR has >=10% of
  commits, OR has >=20 commits. Projects are ordered by `user_commits` in the UX.
- `discover_entities()` merges scanner results with the regex detector
  case-insensitively (so `mempalace` from pyproject absorbs `MemPalace`
  from docs), and suppresses the regex `uncertain` bucket when real
  signal is already found: the user doesn't need to adjudicate prose
  noise when the answer is already in git.
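The author-side steps in the list above can be sketched as follows: bot filtering, union-find alias resolution, and the "mine" thresholds. This is a hedged illustration, not the scanner's actual code. The function names are hypothetical. Commits are assumed to arrive as `(name, email)` pairs already read from `git log`. Bot markers are matched against the author name only. "The user" is identified by display name for simplicity.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# Markers taken from the commit message; matching on the author *name* is an assumption.
BOT_MARKERS = ("[bot]", "dependabot", "renovate", "github-actions")


def is_bot(name: str) -> bool:
    """True for obvious automation accounts. noreply emails are never checked."""
    n = name.strip().lower()
    return (any(marker in n for marker in BOT_MARKERS)
            or n.endswith("bot") or n.endswith("-autoroll"))


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_authors(commits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Collapse (name, email) commit authors into {display_name: commit_count}."""
    humans = [(n.strip(), e.strip().lower()) for n, e in commits if not is_bot(n)]
    uf = UnionFind()
    for name, email in humans:
        # A shared name OR a shared email links two identities into one person.
        uf.union(("name", name.lower()), ("email", email))
    variants: Dict[Hashable, Counter] = {}
    for name, _ in humans:
        variants.setdefault(uf.find(("name", name.lower())), Counter())[name] += 1
    people: Dict[str, int] = {}
    for counts in variants.values():
        # Prefer multi-word "real names" over handles and single-token usernames.
        real = Counter({v: c for v, c in counts.items() if len(v.split()) >= 2})
        display = (real or counts).most_common(1)[0][0]
        people[display] = sum(counts.values())
    return people


def is_mine(people: Dict[str, int], user: str) -> bool:
    """Top-5 committer, or >=10% of all commits, or >=20 commits."""
    mine = people.get(user, 0)
    if mine == 0:
        return False
    top5 = sorted(people, key=people.__getitem__, reverse=True)[:5]
    return user in top5 or mine >= 0.10 * sum(people.values()) or mine >= 20
```

Union-find fits here because aliasing is transitive. If "ada" shares an email with "Ada Lovelace" and also commits from a noreply address, all three identities end up in one set without pairwise comparisons. Two caveats apply to this sketch. A bare `endswith("bot")` also catches human surnames such as "Talbot". A generic shared email would wrongly merge different people.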

Integration: `cmd_init` now calls `discover_entities` instead of
running the regex detector directly. Same output shape, so
`confirm_entities` works unchanged.
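Here is a minimal sketch of the merge rule described in the `discover_entities()` bullet. It assumes, hypothetically, that both sources return a dict of `people`, `projects` and `uncertain` name lists. The real output shape is not shown here, and `merge_entities` is an illustrative name, not the function in the commit.

```python
from typing import Dict, List

# Assumed shape: {"people": [...], "projects": [...], "uncertain": [...]}
Entities = Dict[str, List[str]]


def merge_entities(scanned: Entities, detected: Entities) -> Entities:
    """Merge manifest/git results with regex results, case-insensitively."""
    merged: Entities = {"people": [], "projects": [], "uncertain": []}
    seen = set()  # casefolded names already placed in some bucket

    def add(bucket: str, name: str) -> None:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            merged[bucket].append(name)

    # Scanner results go first, so "mempalace" from pyproject.toml absorbs
    # "MemPalace" found in prose by the regex detector.
    for source in (scanned, detected):
        for bucket in ("people", "projects"):
            for name in source.get(bucket, []):
                add(bucket, name)

    # Only a prose-only folder (no scanner signal) surfaces the regex "uncertain" bucket.
    if not any(scanned.get(bucket) for bucket in ("people", "projects")):
        for name in detected.get("uncertain", []):
            add("uncertain", name)
    return merged
```

Because the merged dict keeps the same keys as the detector's output in this sketch, a downstream confirmation step would not need to change. That mirrors the commit's point about `confirm_entities`.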

Ships with 39 new tests covering manifest parsing, bot filtering,
union-find dedup, git repo discovery, scan integration, and
merge/fallback behavior. All 56 existing regex-detector tests still pass.
2026-04-24 00:20:53 -03:00

mempalace/ — Core Package

The Python package that powers MemPalace. All modules, all logic.

Modules

| Module | What it does |
|---|---|
| cli.py | CLI entry point: routes to the mine, search, init, compress, and wake-up commands |
| config.py | Configuration loading: ~/.mempalace/config.json, environment variables, defaults |
| normalize.py | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to a standard transcript format |
| miner.py | Project file ingest: scans directories, chunks by paragraph, stores to ChromaDB |
| convo_miner.py | Conversation ingest: chunks by exchange pair (Q+A), detects rooms from content |
| searcher.py | Semantic search via ChromaDB vectors: filters by wing/room, returns verbatim text plus scores |
| layers.py | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
| dialect.py | AAAK compression: entity codes, emotion markers, 30x lossless ratio |
| knowledge_graph.py | Temporal entity-relationship graph: SQLite, time-filtered queries, fact invalidation |
| palace_graph.py | Room-based navigation graph: BFS traversal, tunnel detection across wings |
| mcp_server.py | MCP server: 19 tools, AAAK auto-teach, Palace Protocol, agent diary |
| onboarding.py | Guided first-run setup: asks about people/projects, generates AAAK bootstrap + wing config |
| entity_registry.py | Entity code registry: maps names to AAAK codes, handles ambiguous names |
| entity_detector.py | Auto-detects people and projects from file content |
| general_extractor.py | Classifies text into 5 memory types (decision, preference, milestone, problem, emotional) |
| room_detector_local.py | Maps folders to room names using 70+ patterns, with no API calls |
| spellcheck.py | Name-aware spellcheck: won't "correct" proper nouns in your entity registry |
| split_mega_files.py | Splits concatenated transcript files into per-session files |
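The one-line module descriptions leave the graph mechanics implicit. As a rough illustration of "BFS traversal" and "tunnel detection across wings" in `palace_graph.py`, the sketch below makes three hypothetical assumptions. The palace is given as `(wing, room)` pairs. It is modeled as a bipartite wing/room graph. A tunnel is a room name that appears in more than one wing. The function names are illustrative, not the module's API.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # ("wing", name) or ("room", name)


def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[Node, Set[Node]]:
    """Bipartite adjacency: each wing links to its rooms and vice versa."""
    adj: Dict[Node, Set[Node]] = defaultdict(set)
    for wing, room in pairs:
        adj[("wing", wing)].add(("room", room))
        adj[("room", room)].add(("wing", wing))
    return adj


def find_tunnels(adj: Dict[Node, Set[Node]]) -> Dict[str, Set[str]]:
    """Rooms shared by two or more wings (assumed meaning of 'tunnel')."""
    return {
        name: {wing for _, wing in nbrs}
        for (kind, name), nbrs in adj.items()
        if kind == "room" and len(nbrs) > 1
    }


def reachable_rooms(adj: Dict[Node, Set[Node]], start_room: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search from a room; returns {room: hop_count} within max_hops edges."""
    start = ("room", start_room)
    if start not in adj:
        return {}
    dist: Dict[Node, int] = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {name: hops for (kind, name), hops in dist.items() if kind == "room"}
```

In this model, rooms in the same wing are 2 hops apart (room, wing, room). A tunnel room lets BFS cross into a second wing, so rooms there are reached 2 hops later.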

Architecture

User → CLI → miner/convo_miner → ChromaDB (palace)
                                     ↕
                              knowledge_graph (SQLite)
                                     ↕
User → MCP Server → searcher → results
                  → kg_query → entity facts
                  → diary    → agent journal

The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any MCP-compatible AI tool.
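The knowledge-graph half of that split is described in the module table as a temporal entity-relationship graph with time-filtered queries and fact invalidation. A small SQLite sketch illustrates the idea. The `triples` schema and the `add_fact`, `invalidate` and `facts_about` helpers are hypothetical, not the real schema or API of `knowledge_graph.py`. The sketch assumes facts carry ISO-8601 `valid_from`/`valid_to` dates, so that plain string comparison orders them correctly.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS triples (
    subject    TEXT NOT NULL,
    predicate  TEXT NOT NULL,
    object     TEXT NOT NULL,
    valid_from TEXT NOT NULL,  -- ISO-8601 date the fact became true
    valid_to   TEXT            -- NULL while the fact is still current
)
"""


def add_fact(db: sqlite3.Connection, s: str, p: str, o: str, valid_from: str) -> None:
    db.execute("INSERT INTO triples VALUES (?, ?, ?, ?, NULL)", (s, p, o, valid_from))


def invalidate(db: sqlite3.Connection, s: str, p: str, o: str, ended: str) -> None:
    """Close an open fact instead of deleting it, so history stays queryable."""
    db.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND object = ? AND valid_to IS NULL",
        (ended, s, p, o),
    )


def facts_about(db: sqlite3.Connection, subject: str, as_of: str) -> List[Tuple[str, str]]:
    """(predicate, object) pairs that were true for `subject` on date `as_of`."""
    rows = db.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY valid_from",
        (subject, as_of, as_of),
    )
    return rows.fetchall()
```

Invalidation closes a fact's validity window rather than deleting the row. A query "as of" an earlier date therefore still sees what was true then, while current queries see only open facts.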