refactor(llm): simplify utils/llm, bounded refine loop, trim tests #296
Merged
Cleanup pass over utils/llm plus an LLM-output-quality change and a test trim.

Simplify / reuse:
- Consolidate the copy-pasted "re-raise LLMError, wrap everything else" ladder into a shared `wrap_llm_errors` contextmanager in base.py; both providers use it (messages/exception types preserved).
- Hoist `import re` and the eth_utils selector import to module scope and precompile the trailing-risk-tag regex / cache the marker pattern in ai_explainer.py (no more per-call imports or recompiles).
- Fix a stale `_collect_state_reads` docstring.

Quality:
- Generalize the single self-critique pass into a bounded loop (`_refine_summary`, capped by MAX_REFINE_ROUNDS=3) that stops on PASS.
- Default `refine=True` on explain_transaction / explain_batch_transaction so every protocol gets the critique pass. Loop runs on the authoritative summary; detail is still expanded once from the frozen summary.

Tests:
- Drop the brittle LLM/simulation orchestration tests that mock the API away (call-count/plumbing); keep the deterministic prompt-building and reply-parsing tests.

Net -376 lines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
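For readers who have not seen the pattern, the "re-raise LLMError, wrap everything else" ladder can be folded into a context manager roughly as follows. This is a minimal sketch, not the code from base.py: the `LLMError` class here is a stand-in, and the `message` parameter, the wrapped-message format, and the `parse_reply` helper are assumptions made for illustration.

```python
import json
from contextlib import contextmanager
from typing import Iterator


class LLMError(Exception):
    """Stand-in for the project's LLMError (the real class is not shown in this PR text)."""


@contextmanager
def wrap_llm_errors(message: str) -> Iterator[None]:
    """Let LLMError pass through unchanged; wrap any other exception in LLMError.

    The `message` parameter and the "message: cause" format are assumptions.
    """
    try:
        yield
    except LLMError:
        raise  # already the right type, so keep its message as-is
    except Exception as exc:
        raise LLMError(f"{message}: {exc}") from exc


def parse_reply(raw: str) -> dict:
    """Hypothetical caller: a JSON parse failure surfaces as LLMError."""
    with wrap_llm_errors("LLM reply is not valid JSON"):
        return json.loads(raw)
```

Each call site then shrinks from a two-branch `try`/`except` to a single `with` line, and the original exception stays reachable through `__cause__`.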
Cleanup + quality pass over `utils/llm`, plus a test trim.

Simplify / reuse
- The `except LLMError: raise` / `except Exception: raise LLMError(...)` ladder (4×) is consolidated into a `wrap_llm_errors` contextmanager in `base.py`. Both providers use it; exception types and messages are preserved (incl. the distinct "not valid JSON" message).
- `import re` and the `eth_utils` selector import moved to module scope; the trailing-risk-tag regex is precompiled and the marker pattern is `lru_cache`d (no per-call imports/recompiles).
- Fixed a stale `_collect_state_reads` docstring.

Quality
- `_refine_summary` now loops up to `MAX_REFINE_ROUNDS` (=3), stopping early on `PASS`. The cap is about quality (critique converges in 1–2 rounds; over-editing degrades), not cost.
- `refine` is default-on for `explain_transaction` / `explain_batch_transaction`, so every protocol gets the critique pass. The loop runs on the authoritative summary; the detail is still expanded once from the frozen summary, preserving the summary↔detail consistency guarantee.

Tests
- Dropped the brittle LLM/simulation orchestration tests that mock the API away (call-count/plumbing); kept the deterministic prompt-building and reply-parsing tests. Net -376 lines.

Verification
- `ruff check` + `ruff format --check` clean.
- `pytest tests/test_ai_explainer.py tests/test_llm.py` → 85 passed.

Not included (follow-up)
🤖 Generated with Claude Code
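The bounded critique loop described under Quality can be pictured with a small sketch. Only the `MAX_REFINE_ROUNDS` name and its value of 3 come from the PR. The function signature, the `critique` and `revise` callables, and the stub critic below are hypothetical stand-ins for the real LLM calls, and the PASS check is an assumed convention.

```python
from typing import Callable

MAX_REFINE_ROUNDS = 3  # cap named in the PR; the rest of this sketch is illustrative


def refine_summary(
    summary: str,
    critique: Callable[[str], str],
    revise: Callable[[str, str], str],
    max_rounds: int = MAX_REFINE_ROUNDS,
) -> str:
    """Critique and revise `summary` until the critic replies PASS or the cap is hit."""
    for _ in range(max_rounds):
        verdict = critique(summary)
        if verdict.strip().upper().startswith("PASS"):
            break  # critic is satisfied, stop early
        summary = revise(summary, verdict)
    return summary


# Stub critic and reviser standing in for LLM calls (assumptions, not project code).
def fake_critique(text: str) -> str:
    return "PASS" if "USDC" in text else "Name the token being transferred."


def fake_revise(text: str, feedback: str) -> str:
    return text + " (100 USDC)"


refined = refine_summary("Transfers tokens to Bob", fake_critique, fake_revise)
```

With these stubs the first round revises the summary and the second round returns PASS, so the loop exits after two critique calls. A critic that never passes is cut off after `max_rounds` revisions.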