Skip to content

refactor(llm): simplify utils/llm, bounded refine loop, trim tests#296

Merged
spalen0 merged 1 commit into
mainfrom
llm-simplify
Jun 30, 2026
Merged

refactor(llm): simplify utils/llm, bounded refine loop, trim tests#296
spalen0 merged 1 commit into
mainfrom
llm-simplify

Conversation

@spalen0

@spalen0 spalen0 commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

Cleanup + quality pass over utils/llm, plus a test trim.

Simplify / reuse

  • Shared error wrapping — the copy-pasted except LLMError: raise / except Exception: raise LLMError(...) ladder (4×) is consolidated into a wrap_llm_errors contextmanager in base.py. Both providers use it; exception types and messages are preserved (incl. the distinct "not valid JSON" message).
  • Regex/import hoistingimport re and the eth_utils selector import moved to module scope; the trailing-risk-tag regex is precompiled and the marker pattern is lru_cached (no per-call imports/recompiles).
  • Fixed a stale _collect_state_reads docstring.

Quality

  • Bounded self-critique loop_refine_summary now loops up to MAX_REFINE_ROUNDS (=3), stopping early on PASS. Cap is about quality (critique converges in 1–2 rounds; over-editing degrades), not cost.
  • refine default-on for explain_transaction / explain_batch_transaction, so every protocol gets the critique pass. The loop runs on the authoritative summary; the detail is still expanded once from the frozen summary, preserving the summary↔detail consistency guarantee.

Tests

  • Dropped the brittle LLM/simulation orchestration tests (call-count/plumbing that mock the external API away) and kept the deterministic prompt-building + reply-parsing tests. Net −376 lines.
  • Trade-off: the refine-loop and two-stage-consistency behaviors are no longer covered (they're provider-mocked) — deliberate, given we lean on the external API.

Verification

  • ruff check + ruff format --check clean.
  • pytest tests/test_ai_explainer.py tests/test_llm.py → 85 passed.

Not included (follow-up)

  • Best-of-N + judge for a further output-quality bump — discussed, not built. Can add on request.

🤖 Generated with Claude Code

Cleanup pass over utils/llm plus an LLM-output-quality change and a
test trim.

Simplify / reuse:
- Consolidate the copy-pasted "re-raise LLMError, wrap everything else"
  ladder into a shared `wrap_llm_errors` contextmanager in base.py; both
  providers use it (messages/exception types preserved).
- Hoist `import re` and the eth_utils selector import to module scope and
  precompile the trailing-risk-tag regex / cache the marker pattern in
  ai_explainer.py (no more per-call imports or recompiles).
- Fix a stale `_collect_state_reads` docstring.

Quality:
- Generalize the single self-critique pass into a bounded loop
  (`_refine_summary`, capped by MAX_REFINE_ROUNDS=3) that stops on PASS.
- Default `refine=True` on explain_transaction / explain_batch_transaction
  so every protocol gets the critique pass. Loop runs on the authoritative
  summary; detail is still expanded once from the frozen summary.

Tests:
- Drop the brittle LLM/simulation orchestration tests that mock the API
  away (call-count/plumbing); keep the deterministic prompt-building and
  reply-parsing tests. Net -376 lines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@spalen0 spalen0 marked this pull request as ready for review June 30, 2026 10:42
@spalen0 spalen0 merged commit 91fd6fb into main Jun 30, 2026
3 checks passed
@spalen0 spalen0 deleted the llm-simplify branch June 30, 2026 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant