The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.
Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.
Both longmemeval_bench.py and locomo_bench.py:
- llm_rerank*() gain backend= / base_url= kwargs
- CLI: --llm-backend {anthropic,ollama}, --llm-base-url
- API key required only when backend=anthropic (diary/palace modes still require it)
- Parse last integer in response (reasoning models emit multi-int output)
- Fallback to message.reasoning when content is empty
- Raise max_tokens to 1024 for reasoning models
The `_load_api_key()` function in longmemeval_bench.py and locomo_bench.py
searched for API keys in a fixed path (`~/.config/lu/keys.json`) using
personal key names (`anthropic_milla`, `anthropic_claude_code_main`).
This leaks internal infrastructure details into the public codebase and
trains contributors to store credentials in a non-standard location
rather than using the standard ANTHROPIC_API_KEY env var.
Simplified to: CLI flag > env var > empty string. Updated help text
and HYBRID_MODE.md docs to match.
Co-authored-by: Tadao <tadao@travisfixes.com>
The module-level `ssl._create_default_https_context = ssl._create_unverified_context`
disables certificate verification for ALL urllib requests in the process,
not just the benchmark's HuggingFace downloads. This silently exposes
the benchmark runner to MITM attacks.
If a specific environment needs to skip verification (e.g. corporate proxy),
users can set `PYTHONHTTPSVERIFY=0` or pass a custom ssl context per-request
rather than globally patching the ssl module.
Co-authored-by: Tadao <tadao@travisfixes.com>