fix(search): BM25 hybrid rerank, legacy-metric warning, invariant tests
Three tightly-coupled search-quality fixes for v3.3.3:

1. CLI `mempalace search` now routes through the same `_hybrid_rank` the MCP path already used. Drawers whose text contains every query term but embed as file-tree noise (directory listings, diffs, log fragments) were scoring cosine distance >= 1.0. The display formula `max(0, 1 - dist)` then floored every result to `Match: 0.0`, with no way for the user to tell a lexical match from a total miss. BM25 catches these cleanly; the display surfaces both `cosine=` and `bm25=` so users see which component is firing.

2. Legacy-palace distance-metric warning. Palaces created before `hnsw:space=cosine` was consistently set silently use ChromaDB's default L2 metric, which breaks the cosine-similarity formula (L2 distances routinely exceed 1.0 on normalized 384-dim vectors). The search path now detects this at query time and prints a one-line notice pointing at `mempalace repair`. Only fires for legacy palaces; new palaces already set cosine correctly.

3. Invariant tests pinning `hnsw:space=cosine` on every collection-creation path: legacy `get_or_create_collection`, legacy `create_collection`, RFC 001 `get_collection(create=True)`, the public `palace.get_collection`, and a round-trip through reopen. Locks down the correctness that new-user palaces already have so a future refactor can't silently regress it.

Also adds a `metadata` property to `ChromaCollection` so callers can read the underlying `hnsw:space` without reaching into `_collection`.

Tests:
- New regression: simulate three candidates at distance 1.5 (cosine=0), one containing query terms; it must rank first with non-zero bm25.
- New: legacy metric (empty or non-cosine) produces stderr warning.
- New: correctly-configured palace produces no warning.
- New: all five creation paths pin cosine metadata.

All existing tests still pass.
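The failure mode in item 1 can be illustrated with a small, self-contained sketch. This is not the project's `_hybrid_rank`: the function names, the tokenizer, the 0.6/0.4 weights, and the BM25 parameters below are all assumptions chosen for illustration. It only shows why `max(0, 1 - dist)` cannot separate candidates once every distance is >= 1.0, and how a lexical BM25 score can break the tie.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query (illustrative)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def display_similarity(distance):
    # The display formula quoted in the commit message.
    return max(0.0, 1.0 - distance)


def hybrid_rank(query, docs, distances, vector_weight=0.6, bm25_weight=0.4):
    """Blend floored cosine similarity with max-normalized BM25 (assumed weights)."""
    bm25 = bm25_scores(query, docs)
    top = max(bm25) if bm25 and max(bm25) > 0 else 1.0
    ranked = []
    for doc, dist, raw in zip(docs, distances, bm25):
        cosine = display_similarity(dist)
        combined = vector_weight * cosine + bm25_weight * (raw / top)
        ranked.append({"text": doc, "cosine": cosine, "bm25": raw, "score": combined})
    ranked.sort(key=lambda r: r["score"], reverse=True)
    return ranked


docs = [
    "src/ lib/ tests/ README.md",
    "error: hnsw index rebuild failed in src/index.py",
    "drwxr-xr-x 2 user staff 64 Jan 1 tmp",
]
# All three candidates sit at distance 1.5, so cosine alone shows 0.0 for each.
results = hybrid_rank("hnsw index", docs, [1.5, 1.5, 1.5])
```

With these inputs every `cosine` value is 0.0, yet only the log-fragment document has a non-zero `bm25`, so it sorts first. This mirrors the shape of the regression test described above.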
@@ -368,6 +368,18 @@ class ChromaCollection(BaseCollection):
     def count(self):
         return self._collection.count()
+
+    @property
+    def metadata(self) -> dict:
+        """Pass-through to the underlying ChromaDB collection's metadata.
+
+        Used by the searcher to detect legacy palaces that were created
+        without ``hnsw:space=cosine`` and therefore silently use L2
+        distance, which breaks cosine-based similarity interpretation.
+        Returns ``{}`` when metadata is absent so callers can do a plain
+        ``.get("hnsw:space")`` without None-checks.
+        """
+        return self._collection.metadata or {}
 
 
 # ---------------------------------------------------------------------------
 # Backend
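As a rough sketch of how a caller might use the new `metadata` property for the legacy-metric check: the helper name `warn_if_legacy_metric`, the `FakeCollection` stand-in, and the warning wording are all hypothetical, and the actual searcher code is not part of this hunk. Only the `hnsw:space` key, the `{}` fallback, and the `mempalace repair` pointer come from the commit.

```python
import sys


class FakeCollection:
    """Hypothetical stand-in for a wrapper that exposes `metadata`."""

    def __init__(self, metadata=None):
        self._metadata = metadata

    @property
    def metadata(self):
        # Same fallback as the property in the diff: never return None.
        return self._metadata or {}


def warn_if_legacy_metric(collection, stream=None):
    """Print a one-line notice if the collection is not pinned to cosine.

    Returns True when a warning was emitted, False otherwise.
    """
    if stream is None:
        stream = sys.stderr
    space = collection.metadata.get("hnsw:space")
    if space == "cosine":
        return False
    found = space if space is not None else "unset"
    print(
        f"warning: hnsw:space is {found}, not cosine; "
        "similarity scores may be unreliable. Run `mempalace repair`.",
        file=stream,
    )
    return True
```

Because `metadata` returns `{}` rather than `None`, the check needs no special case for palaces whose collection has no metadata at all: a missing key and a non-cosine value take the same path.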