ae5196bc8d
* refactor: add stage-1 backend abstraction seam

  Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts.

  This change adds a small backend package with:
  - BaseCollection as the minimal collection contract
  - ChromaBackend/ChromaCollection as the default implementation

  It then routes the main runtime collection consumers through that seam:
  - palace.py
  - searcher.py
  - layers.py
  - palace_graph.py
  - mcp_server.py
  - miner.status()

  Behavioral constraints kept for stage 1:
  - ChromaDB remains the only backend and the default path
  - no config/env backend selection yet
  - no PostgreSQL code
  - no benchmark or research files
  - existing tests stay unchanged

  Important compatibility details:
  - read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections
  - write paths keep create=True semantics through palace.get_collection()
  - layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged
  - ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths

  Verification:
  - python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
  - pytest -q  # 529 passed, 106 deselected

* refactor: clean up stage-1 seam compatibility shims

  Tighten the stage-1 backend abstraction branch after review.

  This follow-up does three small things:
  - keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import
  - fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper)
  - preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact

  Why this matters:
  - the direct form looked like a dead import in review, even though it was intentionally preserving the existing test seam ( and )
  - palace_graph.py had a real lint issue ( redefinition) that was small but worth fixing before a public PR

  Verification:
  - /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py
  - pytest -q tests/test_layers.py tests/test_searcher.py
  - pytest -q  # 529 passed, 106 deselected

* docs: explain backend shim imports in search paths

  Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor.

  The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction.

  This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it.

  Verification:
  - /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py
  - pytest -q tests/test_layers.py tests/test_searcher.py

* refactor: reuse a default backend instance in palace helper

  Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`.

  This keeps the stage-1 scope unchanged (Chroma is still the only backend wired in this branch) but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one.

  Why this helps:
  - makes `palace.get_collection()` read like a real default factory instead of an inline constructor call
  - keeps the stage-1 branch a little cleaner before opening the public PR
  - does not widen the backend surface or change any config/runtime behavior

  Verification:
  - python3 -m py_compile mempalace/palace.py
  - pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py
  - pytest -q  # 529 passed, 106 deselected

* fix: harden read-only seam behavior and update seam tests

  Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review.

  What changed:
  - make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect
  - update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior
  - remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly
  - add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior
  - retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules
  - add a regression test that status() does not create an empty palace when the target path is missing

  Verification:
  - ruff check .
  - uv run pytest -q
  - uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py

  Notes:
  - the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch

* refactor: drop duplicate mcp collection cache declaration

  Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor.

  This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass.

  Verification:
  - ruff check mempalace/mcp_server.py
  - uv run pytest -q tests/test_mcp_server.py

---------

Co-authored-by: Sergey Kuznetsov <sergey@iterudit.com>
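The create=True/create=False contract from the read-path hardening notes can be illustrated with a small stand-in that does not depend on ChromaDB. This is only a sketch under stated assumptions: `InMemoryCollection` and `SketchBackend` are hypothetical names, the `add`/`count` method set on `BaseCollection` is a guess at a minimal contract, and raising `FileNotFoundError` is an illustrative choice. The real `mempalace.backends` signatures are not shown in this file and may differ.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BaseCollection(ABC):
    """Hypothetical minimal collection contract (method set is an assumption)."""

    @abstractmethod
    def add(self, ids, documents, metadatas=None):
        ...

    @abstractmethod
    def count(self):
        ...


class InMemoryCollection(BaseCollection):
    """Stand-in collection that keeps documents in a dict."""

    def __init__(self):
        self._docs = {}

    def add(self, ids, documents, metadatas=None):
        metadatas = metadatas or [{} for _ in ids]
        for doc_id, doc, meta in zip(ids, documents, metadatas):
            self._docs[doc_id] = (doc, meta)

    def count(self):
        return len(self._docs)


class SketchBackend:
    """Hypothetical backend: reads fail fast, only writes create directories."""

    def __init__(self):
        self._collections = {}

    def get_collection(self, palace_path, name, create=False):
        key = (os.path.abspath(palace_path), name)
        if not create:
            # Read path: never create the directory or the collection.
            if not os.path.isdir(palace_path) or key not in self._collections:
                raise FileNotFoundError(f"No palace found at {palace_path}")
            return self._collections[key]
        # Write path: create the directory and the collection on demand.
        os.makedirs(palace_path, exist_ok=True)
        return self._collections.setdefault(key, InMemoryCollection())


backend = SketchBackend()
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing-palace")

    try:
        backend.get_collection(missing, "mempalace_drawers", create=False)
    except FileNotFoundError:
        read_failed = True
    else:
        read_failed = False
    # The failed read must not have created the directory as a side effect.
    dir_created_by_read = os.path.exists(missing)

    col = backend.get_collection(missing, "mempalace_drawers", create=True)
    col.add(ids=["d1"], documents=["hello world"])
    count_after_write = col.count()
    dir_created_by_write = os.path.isdir(missing)
```

The point of the design is that a status-style read of a missing path reports the problem and leaves the filesystem untouched, while the write path is the only place where directories come into existence.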
273 lines
9.0 KiB
Python
import os
import shutil
import tempfile
from pathlib import Path

import chromadb
import yaml

from mempalace.miner import mine, scan_project, status
from mempalace.palace import file_already_mined


def write_file(path: Path, content: str):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")


def scanned_files(project_root: Path, **kwargs):
    files = scan_project(str(project_root), **kwargs)
    return sorted(path.relative_to(project_root).as_posix() for path in files)


def test_project_mining():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()
        os.makedirs(project_root / "backend")

        write_file(
            project_root / "backend" / "app.py", "def main():\n print('hello world')\n" * 20
        )
        with open(project_root / "mempalace.yaml", "w") as f:
            yaml.dump(
                {
                    "wing": "test_project",
                    "rooms": [
                        {"name": "backend", "description": "Backend code"},
                        {"name": "general", "description": "General"},
                    ],
                },
                f,
            )

        palace_path = project_root / "palace"
        mine(str(project_root), str(palace_path))

        client = chromadb.PersistentClient(path=str(palace_path))
        col = client.get_collection("mempalace_drawers")
        assert col.count() > 0
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)


def test_scan_project_respects_gitignore():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "ignored.py\ngenerated/\n")
        write_file(project_root / "src" / "app.py", "print('hello')\n" * 20)
        write_file(project_root / "ignored.py", "print('ignore me')\n" * 20)
        write_file(project_root / "generated" / "artifact.py", "print('artifact')\n" * 20)

        assert scanned_files(project_root) == ["src/app.py"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_respects_nested_gitignore():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "*.log\n")
        write_file(project_root / "subrepo" / ".gitignore", "tasks/\n")
        write_file(project_root / "subrepo" / "src" / "main.py", "print('main')\n" * 20)
        write_file(project_root / "subrepo" / "tasks" / "task.py", "print('task')\n" * 20)
        write_file(project_root / "subrepo" / "debug.log", "debug\n" * 20)

        assert scanned_files(project_root) == ["subrepo/src/main.py"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_allows_nested_gitignore_override():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "*.csv\n")
        write_file(project_root / "subrepo" / ".gitignore", "!keep.csv\n")
        write_file(project_root / "drop.csv", "a,b,c\n" * 20)
        write_file(project_root / "subrepo" / "keep.csv", "a,b,c\n" * 20)

        assert scanned_files(project_root) == ["subrepo/keep.csv"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_allows_gitignore_negation_when_parent_dir_is_visible():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "generated/*\n!generated/keep.py\n")
        write_file(project_root / "generated" / "drop.py", "print('drop')\n" * 20)
        write_file(project_root / "generated" / "keep.py", "print('keep')\n" * 20)

        assert scanned_files(project_root) == ["generated/keep.py"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_does_not_reinclude_file_from_ignored_directory():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "generated/\n!generated/keep.py\n")
        write_file(project_root / "generated" / "drop.py", "print('drop')\n" * 20)
        write_file(project_root / "generated" / "keep.py", "print('keep')\n" * 20)

        assert scanned_files(project_root) == []
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_can_disable_gitignore():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "data/\n")
        write_file(project_root / "data" / "stuff.csv", "a,b,c\n" * 20)

        assert scanned_files(project_root, respect_gitignore=False) == ["data/stuff.csv"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_can_include_ignored_directory():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "docs/\n")
        write_file(project_root / "docs" / "guide.md", "# Guide\n" * 20)

        assert scanned_files(project_root, include_ignored=["docs"]) == ["docs/guide.md"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_can_include_specific_ignored_file():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "generated/\n")
        write_file(project_root / "generated" / "drop.py", "print('drop')\n" * 20)
        write_file(project_root / "generated" / "keep.py", "print('keep')\n" * 20)

        assert scanned_files(project_root, include_ignored=["generated/keep.py"]) == [
            "generated/keep.py"
        ]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_can_include_exact_file_without_known_extension():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".gitignore", "README\n")
        write_file(project_root / "README", "hello\n" * 20)

        assert scanned_files(project_root, include_ignored=["README"]) == ["README"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_include_override_beats_skip_dirs():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".pytest_cache" / "cache.py", "print('cache')\n" * 20)

        assert scanned_files(
            project_root,
            respect_gitignore=False,
            include_ignored=[".pytest_cache"],
        ) == [".pytest_cache/cache.py"]
    finally:
        shutil.rmtree(tmpdir)


def test_scan_project_skip_dirs_still_apply_without_override():
    tmpdir = tempfile.mkdtemp()
    try:
        project_root = Path(tmpdir).resolve()

        write_file(project_root / ".pytest_cache" / "cache.py", "print('cache')\n" * 20)
        write_file(project_root / "main.py", "print('main')\n" * 20)

        assert scanned_files(project_root, respect_gitignore=False) == ["main.py"]
    finally:
        shutil.rmtree(tmpdir)


def test_file_already_mined_check_mtime():
    tmpdir = tempfile.mkdtemp()
    try:
        palace_path = os.path.join(tmpdir, "palace")
        os.makedirs(palace_path)
        client = chromadb.PersistentClient(path=palace_path)
        col = client.get_or_create_collection("mempalace_drawers")

        test_file = os.path.join(tmpdir, "test.txt")
        with open(test_file, "w") as f:
            f.write("hello world")

        mtime = os.path.getmtime(test_file)

        # Not mined yet
        assert file_already_mined(col, test_file) is False
        assert file_already_mined(col, test_file, check_mtime=True) is False

        # Add it with mtime
        col.add(
            ids=["d1"],
            documents=["hello world"],
            metadatas=[{"source_file": test_file, "source_mtime": str(mtime)}],
        )

        # Already mined (no mtime check)
        assert file_already_mined(col, test_file) is True
        # Already mined (mtime matches)
        assert file_already_mined(col, test_file, check_mtime=True) is True

        # Modify file and force a different mtime (Windows has low mtime resolution)
        with open(test_file, "w") as f:
            f.write("modified content")
        os.utime(test_file, (mtime + 10, mtime + 10))

        # Still mined without mtime check
        assert file_already_mined(col, test_file) is True
        # Needs re-mining with mtime check
        assert file_already_mined(col, test_file, check_mtime=True) is False

        # Record with no mtime stored should return False for check_mtime
        col.add(
            ids=["d2"],
            documents=["other"],
            metadatas=[{"source_file": "/fake/no_mtime.txt"}],
        )
        assert file_already_mined(col, "/fake/no_mtime.txt", check_mtime=True) is False
    finally:
        # Release ChromaDB file handles before cleanup (required on Windows)
        del col, client
        shutil.rmtree(tmpdir, ignore_errors=True)


def test_status_missing_palace_does_not_create_empty_collection(tmp_path, capsys):
    palace_path = tmp_path / "missing-palace"

    status(str(palace_path))

    out = capsys.readouterr().out
    assert "No palace found" in out
    assert not palace_path.exists()