Raise convo_miner MAX_FILE_SIZE cap 10 MB → 500 MB

Mirrors the miner.py fix in this same branch. convo_miner.py had the
exact same 10 MB cap at line 58 that silently dropped long transcripts
via continue. Long Claude Code sessions, multi-year ChatGPT exports,
and lifetime Slack dumps all exceed 10 MB. Same silent-drop pattern,
different file.
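
For readers unfamiliar with the silent-drop pattern described above, here is a minimal sketch of it. The helper name `files_under_cap` and its signature are hypothetical and do not come from convo_miner.py; only the constant name and the 500 MB value come from this commit.

```python
import os

MAX_FILE_SIZE = 500 * 1024 * 1024  # 500 MB, the value set by this commit


def files_under_cap(paths, max_size=MAX_FILE_SIZE):
    """Hypothetical helper: keep only the paths whose size is within max_size."""
    kept = []
    for path in paths:
        if os.path.getsize(path) > max_size:
            # Oversized file is skipped with no log line or error,
            # which is the silent drop this commit message refers to.
            continue
        kept.append(path)
    return kept
```

With the cap at 10 MB, any transcript larger than that would fall into the `continue` branch and never reach the rest of the loop.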

Raised to 500 MB to match miner.py for consistency; downstream chunking
means source file size does not affect storage or embedding cost.
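
As a rough illustration of the chunking argument above, the sketch below assumes a plain fixed-width splitter. The function `chunk_text` is hypothetical and the real chunker in convo_miner.py may split differently; only `CHUNK_SIZE` and `MIN_CHUNK_SIZE` are taken from the diff.

```python
MIN_CHUNK_SIZE = 30
CHUNK_SIZE = 800  # chars per drawer, as in the diff


def chunk_text(text, size=CHUNK_SIZE, min_size=MIN_CHUNK_SIZE):
    """Hypothetical fixed-width splitter, not the project's actual chunker."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Drop fragments too short to be useful, mirroring a minimum-size rule.
    return [c for c in chunks if len(c.strip()) >= min_size]
```

Under this assumption no chunk is ever longer than `CHUNK_SIZE`, whether the source is a few kilobytes or hundreds of megabytes, so the size of each stored and embedded unit does not depend on the source file size.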

Tests: tests/test_convo_miner_size_cap.py (1 test)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MSL
2026-04-18 08:10:43 -07:00
committed by Igor Lins e Silva
parent d137d12313
commit 6f33d52681
2 changed files with 36 additions and 1 deletions
+5 -1
@@ -55,7 +55,11 @@ CONVO_EXTENSIONS = {
MIN_CHUNK_SIZE = 30
CHUNK_SIZE = 800 # chars per drawer — align with miner.py
-MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB — skip files larger than this
+MAX_FILE_SIZE = 500 * 1024 * 1024 # 500 MB — skip files larger than this.
+# Matches miner.py at 500 MB. Long Claude Code sessions, multi-year
+# ChatGPT exports, and lifetime Slack dumps routinely exceed 10 MB; the
+# cap at that level silently dropped them with `continue`. Source size
+# does not affect storage or embedding cost — chunking happens downstream.
def _register_file(collection, source_file: str, wing: str, agent: str):