mempalace

jason/mempalace

Fork 0

Commit Graph

Author	SHA1	Message	Date
Igor Lins e Silva	8df7b9bf2c	benchmarks: add --llm-backend ollama for non-Anthropic rerank The rerank pipeline was hardcoded to Anthropic's /v1/messages. Add a backend flag so the same code path can be exercised with any OpenAI-compatible endpoint — local Ollama, Ollama Cloud, or any gateway that speaks /v1/chat/completions. Enables independent verification of the "100% with Haiku rerank" claim by running the full benchmark with a different LLM family (e.g. minimax-m2.7:cloud) and zero Anthropic dependency. Both longmemeval_bench.py and locomo_bench.py: - llm_rerank*() gain backend= / base_url= kwargs - CLI: --llm-backend {anthropic,ollama}, --llm-base-url - API key required only when backend=anthropic (diary/palace modes still require it) - Parse last integer in response (reasoning models emit multi-int output) - Fallback to message.reasoning when content is empty - Raise max_tokens to 1024 for reasoning models	2026-04-14 21:20:14 -03:00
travisBREAKS	89206107fa	fix(bench): remove hardcoded credential paths from benchmark runners (#177 ) The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py searched for API keys in a fixed path (`~/.config/lu/keys.json`) using personal key names (`anthropic_milla`, `anthropic_claude_code_main`). This leaks internal infrastructure details into the public codebase and trains contributors to store credentials in a non-standard location rather than using the standard ANTHROPIC_API_KEY env var. Simplified to: CLI flag > env var > empty string. Updated help text and HYBRID_MODE.md docs to match. Co-authored-by: Tadao <tadao@travisfixes.com>	2026-04-11 23:14:36 -07:00
bensig	0f8fa8c7d5	bench: add benchmark runners, results docs, and test suite Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing.	2026-04-04 18:33:42 -07:00

Author

SHA1

Message

Date

Igor Lins e Silva

8df7b9bf2c

benchmarks: add --llm-backend ollama for non-Anthropic rerank

The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.

Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.

Both longmemeval_bench.py and locomo_bench.py:
 - llm_rerank*() gain backend= / base_url= kwargs
 - CLI: --llm-backend {anthropic,ollama}, --llm-base-url
 - API key required only when backend=anthropic (diary/palace modes still require it)
 - Parse last integer in response (reasoning models emit multi-int output)
 - Fallback to message.reasoning when content is empty
 - Raise max_tokens to 1024 for reasoning models

2026-04-14 21:20:14 -03:00

travisBREAKS

89206107fa

fix(bench): remove hardcoded credential paths from benchmark runners (#177 )

The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py
searched for API keys in a fixed path (`~/.config/lu/keys.json`) using
personal key names (`anthropic_milla`, `anthropic_claude_code_main`).

This leaks internal infrastructure details into the public codebase and
trains contributors to store credentials in a non-standard location
rather than using the standard ANTHROPIC_API_KEY env var.

Simplified to: CLI flag > env var > empty string. Updated help text
and HYBRID_MODE.md docs to match.

Co-authored-by: Tadao <tadao@travisfixes.com>

2026-04-11 23:14:36 -07:00

bensig

0f8fa8c7d5

bench: add benchmark runners, results docs, and test suite

Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with
methodology docs and hybrid retrieval analysis.

Tests: config, miner, convo_miner, normalize — 9 tests, all passing.

2026-04-04 18:33:42 -07:00

3 Commits