mempalace

Author	SHA1	Message	Date
mvalentsev	03643eb507	fix(cli, fact-checker): per-stream stdio errors policy on Windows Previously all three streams reconfigured to UTF-8 with errors='strict'. That kills 'mempalace search' the moment a drawer carrying a surrogate half (round-tripped from a filename via surrogateescape) hits print(), losing the rest of the result block. Same hazard for warning lines on stderr. Split the policy: stdin -> surrogateescape (malformed bytes from a redirected file survive as lone surrogates instead of crashing the read) stdout -> replace (drawer text with a stray surrogate becomes U+FFFD instead of UnicodeEncodeError mid-print) stderr -> replace (same protection for logger / warning paths) Applied identically in the cli.py and fact_checker.py helpers; the DRY extraction into a shared module is a separate cleanup ask, kept out of this fix to keep the diff narrow. Tests updated for the new per-stream assertion.	2026-05-03 21:37:12 +05:00
mvalentsev	32f4dfa26d	fix(cli): reconfigure stdio to UTF-8 on Windows The primary `mempalace` console_script (`cli.py:main()`) reads non-ASCII arguments via piped stdin and writes verbatim drawer text / wing names through `print()`. On Windows, Python defaults stdio to the system ANSI codepage (cp1252/cp1251/cp950), so: - `mempalace search "..." > out.txt` mojibakes any drawer text containing non-Latin characters - `mempalace ... < input.txt` mojibakes piped non-ASCII input Reconfigure stdin/stdout/stderr to UTF-8 (`errors="strict"`) at the top of `main()`, mirroring the helper added in this PR for fact_checker's `__main__` block. Wrapped in try/except so a replaced stream (Jupyter, test harness) logs a warning and continues rather than crashing the CLI. The reconfigure cascades through every `mempalace` subcommand (`init`/`mine`/`search`/`status`/`hook`/etc.) and through the interactive flows that read non-ASCII names via `input()` (onboarding, entity detector, room detector). With this commit the package's three user-facing entry points (`mempalace`, `mempalace-mcp`, and `python -m mempalace.fact_checker`) all reconfigure stdio identically on Windows.	2026-05-03 21:33:54 +05:00
Igor Lins e Silva	1888b671e2	Merge pull request #1321 from MemPalace/fix/1313-init-palace-flag fix(cli): honor --palace flag in cmd_init (#1313)	2026-05-03 03:54:06 -03:00
Igor Lins e Silva	a91b7ee5c2	test(cli): prime monkeypatch undo so palace env doesn't leak monkeypatch.delenv(name, raising=False) on a missing key registers no undo entry, so the env var cmd_init writes leaked into test_config_from_file on Python 3.13 / Windows / macOS. Prime the slot with setenv before delenv so teardown rolls back the write.	2026-05-03 06:27:37 -03:00
igorls	2857948c1e	style: ruff format tests/test_cli.py (PR #1319 )	2026-05-02 23:00:07 -03:00
Igor Lins e Silva	01b3183e5d	fix(cli): honor --palace flag in cmd_init (#1313 ) cmd_init was instantiating MempalaceConfig() unconditionally, ignoring args.palace and always writing the palace under ~/.mempalace. Mirror the env-var pattern used by mcp_server.py (and consistent with how cmd_mine / cmd_status / cmd_search resolve --palace) so every downstream read of cfg.palace_path inside cmd_init — Pass 0, cfg.init(), and the post-init mine — routes to the user-specified location. Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the regression: asserts Pass 0 receives the --palace value (not ~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ. Closes #1313.	2026-05-02 22:56:31 -03:00
Igor Lins e Silva	cbd6e5d65d	fix(cli): write compress output to mempalace_closets so palace can read them (#1244 ) `cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed` collection, but every read path (`palace.get_closets_collection`, `searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for non-mined palaces (or any palace where the user ran `mempalace compress` expecting to backfill the closet/index layer), the compressed output was silently invisible — written to a collection nothing else opens. Fix the writer rather than renaming the readers: "closets" is the user-visible feature name baked into the public API (`get_closets_collection`), the searcher hybrid path, repair/HNSW diagnostics, and docs. Renaming the readers would churn 15+ call sites and the README for no benefit. The compressed AAAK strings are exactly what closets are conceptually — compact pointers scanned by an LLM to locate the right drawer — so they belong in `mempalace_closets`. Tests: - Update `test_cmd_compress_stores_results` to assert the collection name passed to `get_or_create_collection` is `mempalace_closets`. - Add `test_cmd_compress_output_readable_via_get_closets_collection`: end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress, then read back via the same `get_closets_collection` helper that palace.py / searcher use. Regression test for the wrong-collection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 22:54:14 -03:00
igorls	cfca40c5ec	test(cli): mock _run_pass_zero so wing-name test survives corpus-origin cmd_init now invokes ``_run_pass_zero`` unconditionally (#1221, #1223 landed on develop after this PR's branch point). The pass reads sample content via ``builtins.open``; with that mocked to MagicMock, the downstream ``"\\n\\n".join(samples)`` in ``corpus_origin.detect_origin_heuristic`` raises ``TypeError: expected str instance, MagicMock found``. This test only cares about the wing-slug write to the registry, so stub the pass-zero call directly rather than try to satisfy its full sample-gathering contract.	2026-04-27 03:14:02 -03:00
bensig	b7f0a8af01	fix(graph): normalize wing slug at init so topic tunnels fire for hyphenated dirs (#1194 ) `init` was recording `topics_by_wing[<raw-dirname>]` while `mempalace.yaml` got the lower-cased separator-collapsed slug. At mine time the miner read the slug from the yaml and missed the registry key, so `_compute_topic_tunnels_for_wing` returned 0 silently for every project whose folder contained a `-` or a space — the most common shape in the wild. Extracted the rule into `config.normalize_wing_name()` and routed both `cli.cmd_init` (registry write) and `room_detector_local.detect_rooms_local` (yaml write) through it. Added a regression test in `test_cli.py` asserting the registry call uses the normalized slug, plus four direct unit tests for the helper. Refs #1180. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 03:12:06 -03:00
MSL	b99e54546b	feat(init): context-aware corpus detection 10 files changed. 2,563 insertions, 30 deletions. 48 new tests, including end-to-end coverage live-tested with Anthropic Haiku 4.5. This PR overhauls the first-run experience of `mempalace init` end-to-end, ships a new corpus-origin detection module from scratch, wires it into entity classification and LLM refinement, adds a graceful-fallback path that means `init` never crashes on a missing LLM, and ships a meta-test that prevents internal-coordination jargon from leaking into source or tests. The headline change is that `mempalace init` now understands what kind of folder you're pointing it at — AI conversations, regular writing, code, narrative — and adapts how it classifies entities accordingly. The same folder containing `Echo`, `Sparrow`, and `Cipher` (names you've assigned to AI agents) used to dump those into your "people" list alongside biological humans. Now they go into a separate `agent_personas` bucket, and your `people` list stays clean. But the broader change is that `mempalace init` got upgraded across the board — smarter defaults, smarter degradation, smarter classification, smarter persistence, and a new way to refresh as your folder grows. Built and live-verified with Anthropic Haiku 4.5; runs unmodified on the local LLM runtimes mempalace already supports. ## What changes for users (in order, from `pip install` onwards) Install — `pip install mempalace` is unchanged. The package itself didn't shift. First run — `mempalace init <folder>`: 1. `init` examines your folder before classifying anything. A free regex heuristic decides in milliseconds: AI conversations, regular writing, narrative, or code? If an LLM is reachable, a second pass extracts the corpus author's name and any agent persona names from the dialogue. v3.3.3 had no such step — it dove straight into entity detection with no corpus context. 2. LLM-assisted classification is now ON by default. v3.3.3 made `--llm` opt-in. The LLM-assisted path is qualitatively better (extracts persona names, refines ambiguous classifications, gives the model corpus context) so it now runs by default. The provider abstraction is unchanged from v3.3.3 — three buckets are supported by `mempalace.llm_client`: - Anthropic (`--llm-provider anthropic` + `ANTHROPIC_API_KEY`) — the official Messages API. This is the path live-verified end-to-end in this PR with Haiku 4.5. Cost: ~\$0.01 per `init`. - Ollama (`--llm-provider ollama` — the default) — local models via `http://localhost:11434`. Fully offline. Honors the "zero-API required" promise. - OpenAI-compatible (`--llm-provider openai-compat` + `--llm-endpoint`) — per the v3.3.3 `mempalace/llm_client.py` docstring, this covers "OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Fireworks, Together, and most self-hosted setups." We did not test each of those individually as part of this PR; the abstraction has been stable since v3.3.3. If you try this PR with a specific provider and hit a quirk, please file an issue or comment here. 3. `init` never blocks on a missing LLM. No Ollama running, no API key set? `init` prints a one-line message pointing at `--no-llm` and falls through to the heuristic-only path. New default behavior, new graceful fallback to support it. `--no-llm` is the new explicit opt-out. 4. `init` shows you what it detected. A one-line banner — `Detected: Claude (Anthropic) (user: Jordan, agents: Echo, Sparrow, Cipher)` or `Corpus origin: not AI-dialogue (confidence: 0.98)` — tells you at a glance whether mempalace understood your folder. 5. Entity classification gets smarter across the board. Even non-persona candidates benefit: the LLM has corpus context (this is AI-dialogue, this is the user's name, these are agent names) and uses it to disambiguate ambiguous candidates that aren't personas at all. 6. Agent personas live in their own bucket. Names you've assigned to AI agents (Echo, Sparrow, Cipher) go into a new `agent_personas` bucket instead of your `people` list. Your real-person entity list stays clean. 7. Detection result persists to `<palace>/.mempalace/origin.json` with a `schema_version: 1` envelope, so downstream tools can read it. 8. Re-running `init` is now idempotent. Bug fix — running `init` twice on the same folder used to give different classification results because the detection step was sampling its own `entities.json` output. Caught by integration testing during this PR. Later — when your folder grows: 9. `mempalace mine --redetect-origin` is a new flag for refreshing the stored detection without redoing the whole `init`. Heuristic-only by design (the flag is meant to be cheap). If you want the full LLM-extracted detection refreshed (persona names, user name, etc.), run `mempalace init <yourfolder>` again — `init` is now idempotent (item 8), so re-running it on the same folder is safe. ## Behind the changes - New module `mempalace/corpus_origin.py` (422 lines) with two-tier detection: regex heuristic with co-occurrence rule (suppresses ambiguous terms like `Claude` / `Gemini` / `Haiku` when no unambiguous AI signal is present, so French novels, astrology forums, poetry corpora, llama-rancher journals don't false-positive), and LLM tier that extracts `user_name` and `agent_persona_names` from dialogue structure with belt-and-suspenders user-vs-agent disambiguation. - Entity-classification consumer wiring. `entity_detector.detect_entities` and `project_scanner.discover_entities` accept an optional `corpus_origin` kwarg. When present and the corpus is identified as AI-dialogue, candidates whose name case-insensitively matches an `agent_persona_name` are routed into the `agent_personas` bucket instead of `people`. Per-entity `type` is rewritten to `"agent_persona"`. - LLM-refine consumer wiring. `llm_refine.refine_entities` accepts the same `corpus_origin` kwarg and prepends a `CORPUS CONTEXT` preamble to its system prompt giving the LLM the platform / user / persona context. Existing `TOPIC` / `PERSON` / `PROJECT` / `COMMON_WORD` / `AMBIGUOUS` labels are unchanged. - `init` overhaul. Pass 0 (corpus-origin detection) inserted before existing Pass 1 (entity discovery). `--llm` flipped to default-on. `--no-llm` added. Graceful-fallback path replaces the previous hard-error on missing LLM. Provider precedence unchanged from the existing `llm_client` module. - `mine` flag. `mempalace mine --redetect-origin` re-runs corpus-origin detection on the current corpus state and overwrites `<palace>/.mempalace/origin.json`. - `CLAUDE.md` design principle reworded — "Local-first, zero external API by default." Local LLMs running on `localhost` (Ollama, LM Studio, llama.cpp, vLLM, unsloth studio) are part of the user's machine, not external APIs. External BYOK providers (Anthropic, OpenAI, Google) are supported but always opt-in, never default, never silent fallback. ## Cost story - Anthropic (verified path): ~\$0.01 per `init` via Haiku 4.5 with `ANTHROPIC_API_KEY`. - Ollama / local LLM runtime: zero cost. Fully offline. - OpenAI-compatible service: depends entirely on the service. The abstraction supports any service speaking the standard `/v1/chat/completions` API; specific quirks vary per provider. Try it and tell us how it goes. - No LLM at all: graceful fallback to heuristic-only. Zero cost. `init` never blocks. ## Backwards compatibility - All public function signatures gained the `corpus_origin` kwarg as optional (default `None`). Callers that don't pass it see the v3.3.3 return shape unchanged — no `agent_personas` key, no behavioral change. - The `--llm` CLI flag is preserved as a deprecated alias of the default. Existing scripts that pass it continue to work. - `corpus_origin=None` keeps `llm_refine.SYSTEM_PROMPT` byte-identical to v3.3.3. ## Test coverage - 19 unit tests in `tests/test_corpus_origin.py` covering both tiers, the co-occurrence rule, ambiguous-term suppression, word-boundary brand matching, and user/persona disambiguation. - 29 integration tests in `tests/test_corpus_origin_integration.py` covering end-to-end through `mempalace init`, persona reclassification, the `--redetect-origin` flag, the `--llm` default flip, graceful fallback paths, and re-init idempotency. Of those 29, five specifically cover the intersection with develop's other in-flight work (Pass 0 ↔ auto-mine ordering, topics + agent_personas bucket coexistence, entities.json shape, the `wing=` kwarg threading, llm_refine TOPIC label + corpus_origin preamble composition). - 1354 total mempalace tests pass. 2 pre-existing environmental failures (`test_mcp_stdio_protection` — chromadb optional dep) unrelated to this change; they fail on plain `develop` too. - Live-smoke-tested with real Anthropic Haiku 4.5 on AI-dialogue and narrative fixtures. ## Hygiene guardrail This PR also adds a meta-test (`test_no_internal_coordination_jargon_in_source_or_tests`) that walks the source tree and asserts no internal-coordination jargon (e.g. development-phase markers, internal review-section references) leaks into runtime code, comments, docstrings, or LLM prompts. RED if anything slips in. Allowlist for legitimate RFC/spec section citations in `sources/`, `backends/`, `knowledge_graph.py`, and `i18n/`.	2026-04-26 12:37:26 -07:00
Igor Lins e Silva	c4eeec8642	test: use shlex.quote in resume-hint assertions for Windows The pre-existing test_maybe_run_mine_prompt_declined_prints_hint asserted the bare unquoted form `mempalace mine {tmp_path}`. After the production code switched to shlex.quote on the resume hint, this passed on Linux/macOS (POSIX paths have no characters that trigger quoting) but failed on Windows where backslashes always get wrapped in single quotes. Mirror the production code in the assertion via shlex.quote so it's portable across platforms; do the same for the two new spaces-in-path tests for consistency.	2026-04-25 01:18:31 -03:00
Igor Lins e Silva	8faf0042b5	fix(cli,mine): shell-quote project_dir in resume hints The "Skipped. Run mempalace mine <dir>" hint after declining the init prompt and the "Re-run mempalace mine <dir> to resume" hint after a Ctrl-C interruption both interpolated project_dir without shell-quoting. A path containing spaces or metacharacters produced a copy-paste-broken command. Both spots now use shlex.quote(project_dir). Adds regression tests covering each hint with a path that contains a space.	2026-04-25 01:10:17 -03:00
Igor Lins e Silva	23d534f8f3	fix(init): split --auto-mine from --yes; show file-count estimate before mine prompt Reviewer feedback on the previous commit flagged two real problems: 1. Overloading --yes to also auto-mine was a silent behaviour change for scripted callers. Today --yes only auto-accepts entities — making it ALSO trigger a multi-minute ChromaDB write breaks every script that currently runs `mempalace init --yes <dir>` for the fast non-interactive entity path. Add a separate `--auto-mine` flag instead. Combinations: mempalace init --yes <dir> # entities auto, STILL prompt mine mempalace init --auto-mine <dir> # prompt entities, skip mine prompt mempalace init --yes --auto-mine <dir> # fully non-interactive --yes behaviour is now identical to pre-PR. 2. The mine prompt was firing without telling the user how big the job was. On a real corpus mine takes minutes-to-tens-of-minutes; hitting Enter on default-Y with no size cue is a footgun. Show a one-line estimate computed from scan_project (the same walk we hand into mine) BEFORE the prompt: ~423 files (~12 MB) would be mined into this palace. Mine this directory now? [Y/n] The estimate uses a single corpus walk: scan_project's output is passed into mine() via a new optional files= kwarg, so we never walk the tree twice. Tests: replaced the old "--yes auto-mines" assertion with a regression guard that --yes alone STILL prompts; added coverage for --auto-mine alone, --yes --auto-mine together, and the pre-prompt estimate line.	2026-04-25 01:02:09 -03:00
Igor Lins e Silva	f13b9a46a2	feat(cli): init prompts to mine, mine handles Ctrl-C gracefully `mempalace init` now ends with a `Mine this directory now? [Y/n]` prompt and runs `mine()` in-process when accepted; `--yes` skips the prompt and auto-mines for non-interactive callers. Declining prints the resume command. Removes the "remember to type the next command" friction since rooms + entities just got set up. `mempalace mine` now wraps its main loop in `try / except KeyboardInterrupt` and prints `files_processed`, `drawers_filed`, and `last_file` before exiting with code 130 on Ctrl-C. Re-mining is safe because deterministic drawer IDs make the upsert idempotent. The hooks PID lock at `~/.mempalace/hook_state/mine.pid` is now actively removed in a `finally` when its entry points at us, on clean exit, error, or interrupt — preventing the next hook fire from briefly waiting on a stale PID. Closes #1181, #1182.	2026-04-25 01:01:24 -03:00
Pim Messelink	9e53228ea3	test: update test_cli assertions for mempalace-mcp entry point Three assertions in test_mcp_command_* were still checking for the old `python -m mempalace.mcp_server` output string. Update to match the new `mempalace-mcp` command printed by cmd_mcp(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 01:26:47 -03:00
Igor Lins e Silva	267a644f4f	refactor: route all chromadb access through ChromaBackend Prerequisite for RFC 001 (plugin spec, #743). Removes every direct `import chromadb` outside the ChromaDB backend itself so the core modules depend only on the backend abstraction layer. Extends ChromaBackend with make_client, get_or_create_collection, delete_collection, create_collection, and backend_version. Adds update() to the BaseCollection contract. Non-backend callers (mcp_server, dedup, repair, migrate, cli) now go through the abstraction; tests patch ChromaBackend instead of chromadb. With this landed, the RFC 001 spec can be enforced and PalaceStore (#643) can ship as a plugin without touching core modules.	2026-04-14 00:31:16 -03:00
copilot-swe-agent[bot]	c478dfa173	fix: harden palace security checks Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/775f2fc4-3051-462e-8586-6d694b55da0d Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>	2026-04-12 22:19:58 -03:00
Arnold Wender	89c0a58271	fix: align cmd_compress dict keys with compression_stats() return values (#569 ) * fix: align cmd_compress dict keys with compression_stats() return values * test: align compress test mocks with actual compression_stats() keys * fix: address review — add Total: assertion, move stats key test to test_dialect.py	2026-04-11 16:16:31 -07:00
Kevin Pulikkottil	2981433535	fix: add mcp command with setup guidance (#315 ) * fix: add mcp command with setup guidance * fix: include --palace guidance in mcp command output * fix: make mcp guidance commands copy-pastable --------- Co-authored-by: Milla J <millaj1217@gmail.com>	2026-04-09 11:21:18 -07:00
bensig	b1adc047e6	fix: address Octocode review — move size check, add tests for all 3 fixes - Move file size check before try block so IOError propagates cleanly (not caught by the except OSError handler below it) - Wrap os.path.getsize in its own try/except to preserve existing test_normalize_io_error behavior on missing files - Add test_normalize_rejects_large_file (mocked getsize) - Add test_null_arguments_does_not_hang (#394) - Add test_cmd_repair_trailing_slash_does_not_recurse (#395) 532 tests pass locally, 0 regressions.	2026-04-09 10:40:53 -07:00
Tal Muskal	abd52534bb	test: bring coverage to 85%, set threshold to 85, reset version to 3.0.11 - Add tests for config, convo_miner, spellcheck, knowledge_graph - Fix Windows PermissionError in test cleanup (chromadb file locks) - Add UTF-8 encoding to split_mega_files, entity_registry, hooks_cli - Fix mcp_server parse_known_args logging for unknown args - Set coverage threshold to 85 in pyproject.toml and CI - Reset all version files to 3.0.11 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 21:38:12 +03:00

21 Commits