mempalace

Author	SHA1	Message	Date
Igor Lins e Silva	ca0682abe3	benchmarks: apply ruff-format to llm_rerank (trivial line wrap)	2026-04-14 21:20:54 -03:00
Igor Lins e Silva	8df7b9bf2c	benchmarks: add --llm-backend ollama for non-Anthropic rerank The rerank pipeline was hardcoded to Anthropic's /v1/messages. Add a backend flag so the same code path can be exercised with any OpenAI-compatible endpoint — local Ollama, Ollama Cloud, or any gateway that speaks /v1/chat/completions. Enables independent verification of the "100% with Haiku rerank" claim by running the full benchmark with a different LLM family (e.g. minimax-m2.7:cloud) and zero Anthropic dependency. Both longmemeval_bench.py and locomo_bench.py: - llm_rerank*() gain backend= / base_url= kwargs - CLI: --llm-backend {anthropic,ollama}, --llm-base-url - API key required only when backend=anthropic (diary/palace modes still require it) - Parse last integer in response (reasoning models emit multi-int output) - Fallback to message.reasoning when content is empty - Raise max_tokens to 1024 for reasoning models	2026-04-14 21:20:14 -03:00
travisBREAKS	89206107fa	fix(bench): remove hardcoded credential paths from benchmark runners (#177) The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py searched for API keys in a fixed path (`~/.config/lu/keys.json`) using personal key names (`anthropic_milla`, `anthropic_claude_code_main`). This leaks internal infrastructure details into the public codebase and trains contributors to store credentials in a non-standard location rather than using the standard ANTHROPIC_API_KEY env var. Simplified to: CLI flag > env var > empty string. Updated help text and HYBRID_MODE.md docs to match. Co-authored-by: Tadao <tadao@travisfixes.com>	2026-04-11 23:14:36 -07:00
travisBREAKS	d8b2db696f	fix(bench): remove global SSL verification bypass in convomem_bench (#176) The module-level `ssl._create_default_https_context = ssl._create_unverified_context` disables certificate verification for ALL urllib requests in the process, not just the benchmark's HuggingFace downloads. This silently exposes the benchmark runner to MITM attacks. If a specific environment needs to skip verification (e.g. corporate proxy), users can set `PYTHONHTTPSVERIFY=0` or pass a custom ssl context per-request rather than globally patching the ssl module. Co-authored-by: Tadao <tadao@travisfixes.com>	2026-04-11 23:14:12 -07:00
bensig	6d8c462219	fix: resolve ruff lint and format errors across codebase Fix E402 import ordering, F841 unused variable, F541 unnecessary f-strings, F401 unused import, and auto-format 6 files.	2026-04-04 18:37:17 -07:00
bensig	0f8fa8c7d5	bench: add benchmark runners, results docs, and test suite Benchmarks: LongMemEval, LoCoMo, ConvoMem, MemBench runners with methodology docs and hybrid retrieval analysis. Tests: config, miner, convo_miner, normalize — 9 tests, all passing.	2026-04-04 18:33:42 -07:00
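
Commit 8df7b9bf2c describes parsing the last integer in the rerank response and falling back to `message.reasoning` when `content` is empty. A minimal sketch of that idea follows. The function name `extract_choice` and the plain-dict message shape are assumptions made for illustration, not code taken from the repository.

```python
import re
from typing import Optional


def extract_choice(message: dict) -> Optional[int]:
    """Return the last integer found in a chat-completion message, or None.

    Hypothetical helper: assumes `message` is a dict with optional
    "content" and "reasoning" string fields.
    """
    # Prefer the regular content; use the reasoning text only when content is empty.
    text = message.get("content") or message.get("reasoning") or ""
    # Reasoning models may mention several candidate numbers; keep the final one.
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None
```

Taking the last match rather than the first reflects the commit's note that reasoning models emit multi-integer output, where earlier numbers are usually intermediate candidates.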

6 Commits
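
Commit 89206107fa reduces API key lookup to the order "CLI flag > env var > empty string". The sketch below shows one way that precedence could look. The function name `resolve_api_key` and the injectable `env` parameter are hypothetical and chosen so the example is easy to test; the actual `_load_api_key()` in the benchmark runners may differ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Resolve an API key: CLI flag first, then ANTHROPIC_API_KEY, else "".

    Hypothetical helper; `env` defaults to the process environment but can be
    replaced with a plain dict in tests.
    """
    # A non-empty CLI value always wins.
    if cli_value:
        return cli_value
    # Otherwise use the standard environment variable, or an empty string.
    return env.get("ANTHROPIC_API_KEY", "")
```

Returning an empty string instead of raising leaves it to the caller to decide whether a key is required, which matches the later change in 8df7b9bf2c where the key is needed only for the Anthropic backend.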