refactor(llm): simplify utils/llm, bounded refine loop, trim tests by spalen0 · Pull Request #296 · yearn/monitoring

spalen0 · 2026-06-27T09:58:42Z

Cleanup + quality pass over utils/llm, plus a test trim.

Simplify / reuse

Shared error wrapping — the copy-pasted except LLMError: raise / except Exception: raise LLMError(...) ladder (4×) is consolidated into a wrap_llm_errors contextmanager in base.py. Both providers use it; exception types and messages are preserved (incl. the distinct "not valid JSON" message).
Regex/import hoisting — import re and the eth_utils selector import moved to module scope; the trailing-risk-tag regex is precompiled and the marker pattern is lru_cached (no per-call imports/recompiles).
Fixed a stale _collect_state_reads docstring.

Quality

Bounded self-critique loop — _refine_summary now loops up to MAX_REFINE_ROUNDS (=3), stopping early on PASS. Cap is about quality (critique converges in 1–2 rounds; over-editing degrades), not cost.
refine default-on for explain_transaction / explain_batch_transaction, so every protocol gets the critique pass. The loop runs on the authoritative summary; the detail is still expanded once from the frozen summary, preserving the summary↔detail consistency guarantee.

Tests

Dropped the brittle LLM/simulation orchestration tests (call-count/plumbing that mock the external API away) and kept the deterministic prompt-building + reply-parsing tests. Net −376 lines.
Trade-off: the refine-loop and two-stage-consistency behaviors are no longer covered (they're provider-mocked) — deliberate, given we lean on the external API.

Verification

ruff check + ruff format --check clean.
pytest tests/test_ai_explainer.py tests/test_llm.py → 85 passed.

Not included (follow-up)

Best-of-N + judge for a further output-quality bump — discussed, not built. Can add on request.

🤖 Generated with Claude Code

Cleanup pass over utils/llm plus an LLM-output-quality change and a test trim. Simplify / reuse: - Consolidate the copy-pasted "re-raise LLMError, wrap everything else" ladder into a shared `wrap_llm_errors` contextmanager in base.py; both providers use it (messages/exception types preserved). - Hoist `import re` and the eth_utils selector import to module scope and precompile the trailing-risk-tag regex / cache the marker pattern in ai_explainer.py (no more per-call imports or recompiles). - Fix a stale `_collect_state_reads` docstring. Quality: - Generalize the single self-critique pass into a bounded loop (`_refine_summary`, capped by MAX_REFINE_ROUNDS=3) that stops on PASS. - Default `refine=True` on explain_transaction / explain_batch_transaction so every protocol gets the critique pass. Loop runs on the authoritative summary; detail is still expanded once from the frozen summary. Tests: - Drop the brittle LLM/simulation orchestration tests that mock the API away (call-count/plumbing); keep the deterministic prompt-building and reply-parsing tests. Net -376 lines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

spalen0 marked this pull request as ready for review June 30, 2026 10:42

spalen0 merged commit 91fd6fb into main Jun 30, 2026
3 checks passed

spalen0 deleted the llm-simplify branch June 30, 2026 10:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(llm): simplify utils/llm, bounded refine loop, trim tests#296

refactor(llm): simplify utils/llm, bounded refine loop, trim tests#296
spalen0 merged 1 commit into
mainfrom
llm-simplify

spalen0 commented Jun 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

spalen0 commented Jun 27, 2026

Simplify / reuse

Quality

Tests

Verification

Not included (follow-up)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant