fix(cli, fact-checker): per-stream stdio errors policy on Windows

Previously all three streams reconfigured to UTF-8 with errors='strict'.
That kills 'mempalace search' the moment a drawer carrying a surrogate
half (round-tripped from a filename via surrogateescape) hits print(),
losing the rest of the result block. Same hazard for warning lines on
stderr.

Split the policy:
  stdin  -> surrogateescape (malformed bytes from a redirected file
            survive as lone surrogates instead of crashing the read)
  stdout -> replace (drawer text with a stray surrogate becomes U+FFFD
            instead of UnicodeEncodeError mid-print)
  stderr -> replace (same protection for logger / warning paths)

Applied identically in the cli.py and fact_checker.py helpers; the DRY
extraction into a shared module is a separate cleanup ask, kept out of
this fix to keep the diff narrow.

Tests updated for the new per-stream assertion.
This commit is contained in:
mvalentsev
2026-05-03 21:37:12 +05:00
parent 32f4dfa26d
commit 03643eb507
4 changed files with 48 additions and 12 deletions
+7 -4
View File
@@ -1076,10 +1076,13 @@ def test_reconfigures_stdio_to_utf8_on_windows():
):
_reconfigure_stdio_utf8_on_windows()
expected = {"encoding": "utf-8", "errors": "strict"}
assert stdin.reconfigure_calls == [expected]
assert stdout.reconfigure_calls == [expected]
assert stderr.reconfigure_calls == [expected]
# Per-stream errors policy: stdin survives bad bytes via
# surrogateescape so a redirected non-UTF-8 file does not crash
# the read; stdout/stderr use replace so a drawer carrying a
# round-tripped surrogate half does not crash mid-print.
assert stdin.reconfigure_calls == [{"encoding": "utf-8", "errors": "surrogateescape"}]
assert stdout.reconfigure_calls == [{"encoding": "utf-8", "errors": "replace"}]
assert stderr.reconfigure_calls == [{"encoding": "utf-8", "errors": "replace"}]
def test_reconfigure_stdio_is_noop_off_windows():