mempalace

Author	SHA1	Message	Date
jason	9f18e556d5	docs and usage	2026-05-09 10:59:58 -05:00
jason	40e5e5e3cc	cleanup and remote only	2026-05-09 10:52:25 -05:00
bensig	41d45d9336	docs: RFC 002 — source adapter plugin specification Draft plugin specification for source adapters, mirroring RFC 001's role for storage backends. Formalizes the contract six community ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's metadata-only mode have been reinventing ad-hoc, so adapter authors can build to a stable surface. Key decisions: - Single ingest() method; lazy adapters yield SourceItemMetadata ahead of drawers, eager adapters interleave - Declared-transformation model (§1.4) replaces informal verbatim promise with a verifiable one; byte_preserving adapters declare the empty set, declared_lossy adapters enumerate. Existing miner.py and the convo_miner+normalize pipeline map cleanly - Palace is the incremental cursor via is_current(item, metadata); no sidecar persistence - Routing is adapter-owned; detect_room/detect_hall move into the filesystem adapter - Flat metadata per ChromaDB (RFC 001 §1.4) — entity hints as json_string field, KG triples route to SQLite knowledge graph - Closets stay core-built as a post-step; adapters may emit flat closet_hints. Closes existing gap where convo drawers get no closets - No per-drawer field renames: source_file, filed_at, source_mtime, added_by, normalize_version, entities, ingest_mode all preserved. Spec adds adapter_name, adapter_version, privacy_class §9 enumerates the cleanup PR prerequisites (mempalace/sources/ module, PalaceContext facade, KnowledgeGraph.add_triple gaining backwards-compatible source_drawer_id + adapter_name params). Tracking issue: #989	2026-04-17 23:42:46 -07:00
Igor Lins e Silva	65bf1ebda3	docs: slim README and move corrections/notices to docs/HISTORY.md Addresses #875. The previous README was 755 lines mixing six purposes (scam alert, hero, two mea-culpa notes, install guide, architecture explainer, API reference, file map). Rework it as a pure entry point: what MemPalace is, how to install, honest benchmark numbers, links to the website for concept/architecture documentation. Key content changes: - Drop the "highest-scoring AI memory system ever benchmarked" framing. - New tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids naming a specific vector-store implementation since the backend is pluggable (see mempalace/backends/base.py). - Remove the cross-system comparison table. Retrieval recall (R@5) and end-to-end QA accuracy are different metrics and are not comparable; placing MemPalace's R@5 next to competitor QA accuracy under a single column header was a category error. - The "100%" LongMemEval headline is no longer the lead. The honest held-out figure is 98.4% R@5 on 450 unseen questions. The rerank pipeline reaches >=99% with any capable LLM (reproduced with Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level, not model-specific. - Benchmark reproduction commands now reference the correct repo (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch). New file: docs/HISTORY.md as the canonical home for post-launch corrections, public notices, and retractions. Contains verbatim: - 2026-04-14 note on this rewrite (links to #875) - 2026-04-11 impostor-domain notice (moved from README header) - 2026-04-07 "A Note from Milla & Ben" (moved from README body) README keeps a one-line scam-alert callout that links to docs/HISTORY.md for the full timeline.	2026-04-14 21:37:20 -03:00
Igor Lins e Silva	21d4a23430	merge: develop + harden closet layer for production Merges develop (#820 version sync, #785 strip_noise + NORMALIZE_VERSION, #784 file locking) and addresses six concerns surfaced during PR review of the closet feature: 1. Closet append-on-rebuild bug — upsert_closet_lines used to APPEND to existing closets (mismatched the doc's "fully replaced" promise). With NORMALIZE_VERSION rebuilds on develop, this would have stacked stale v1 topics on top of fresh v2 content forever. Fix: - Drop the read-and-append branch from upsert_closet_lines (now a pure numbered-id overwrite). - Add purge_file_closets(closets_col, source_file) helper that wipes every closet for a source file by where-filter. - process_file calls purge_file_closets before upsert on every mine, mirroring the existing drawer purge. 2. Searcher returned whole-file blobs from the closet path while the direct path returned chunk-level drawers. Refactored: - _extract_drawer_ids_from_closet parses the `→drawer_a,drawer_b` pointers out of closet documents. - _closet_first_hits hydrates exactly those drawer IDs (chunk-level), not collection.get(where=source_file) (which returned everything). - Same hit shape as direct-search path; both now carry matched_via. 3. max_distance was bypassed on the closet path. Now applied per-hit; when every closet candidate gets filtered, _closet_first_hits returns None and the caller falls through to direct drawer search. 4. Entity extraction caught sentence-starters like "When", "The", "After" as proper nouns. Added _ENTITY_STOPLIST (~40 common false positives + day/month names + role words). Real names like Igor / Milla still survive — covered by tests. 5. CLOSETS.md drifted from the code (claimed "replaced via upsert" but code appended; claimed BM25 hybrid that doesn't exist; claimed a 10K char hydration cap that wasn't enforced). Rewritten to describe what actually ships, with explicit notes on the BM25 / convo-closet follow-ups. 6. Zero tests for ~250 lines. Added tests/test_closets.py with 17 cases: - build_closet_lines: pointer shape, header extraction, stoplist filtering (with regression case for "When/After/The"), real-name survival, fallback-line guarantee, drawer-ref slicing. - upsert_closet_lines: pure overwrite semantics (regression for the append bug), char-limit packing without splitting lines. - purge_file_closets: scoped to source_file, doesn't touch others. - End-to-end miner rebuild: re-mining a file with fewer topics fully purges leftover numbered closets from the larger first run. - _extract_drawer_ids_from_closet: parsing + dedup edge cases. - search_memories closet-first: fallback when empty, chunk-level hits with matched_via, no whole-file glue, max_distance enforced. Merge resolutions: miner.py imports combined NORMALIZE_VERSION/mine_lock from develop with the closet helpers from this branch. process_file auto-merged cleanly (closet block sits inside develop's lock body). 724/724 tests pass. ruff + format clean under CI-pinned 0.4.x.	2026-04-13 17:00:55 -03:00
Igor Lins e Silva	ee60cad652	docs: add CLOSETS.md — closet layer overview Cherry-picked the docs portion of 67e4ac6 to accompany the closet feature. Test coverage for closets is omnibus with tests for entity metadata and BM25 (see PR targeting those features) and will land together in a follow-up. Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>	2026-04-13 07:38:43 -03:00
bensig	06963ddaed	chore: improve agent readiness — AGENTS.md, dependabot, CODEOWNERS, labels - Add AGENTS.md with build commands, project structure, conventions - Add .github/dependabot.yml for automated pip + actions updates - Add .github/CODEOWNERS for review routing - Expand .gitignore (.env, .DS_Store, IDE configs, coverage, venvs) - Add C901 complexity rule to ruff (max-complexity=25, benchmarks excluded) - Add --durations=10 to pytest CI for test performance tracking - Add docs/schema.sql for knowledge graph schema documentation - Created P0-P3 priority + area/* + security/performance/docs labels	2026-04-09 23:29:26 -07:00

7 Commits