fix(analytics): define top-session active duration as clamped idle gaps (alt to #867)#869
Conversation
Top Sessions "By Duration" introduced an active-duration metric that summed only the gaps following a tool-use message. That captured tool-execution latency but discarded all model generation and thinking time, including the stretch before the first tool call, so it systematically undercounted how long a session was actually worked on. Redefine active duration as the sum of consecutive inter-message gaps with each gap capped at five minutes, which is exactly the definition the velocity "active minutes" metric already uses. A gap below the cap counts in full (generation, tool execution, or a quick human turnaround); only stretches longer than the cap are bounded as idle. Timestamps alone cannot distinguish a long tool run from a human stepping away, so capping fails gracefully in both directions instead of guessing. Hoist the cap to a single shared constant (ActiveGapCapSec / ActiveGapCapMs) so the two "active" definitions cannot drift, guarded by a test that the seconds and milliseconds forms agree. The active-duration SQL drops the has_tool_use filter and the trailing gap to ended_at to match the velocity computation across SQLite, PostgreSQL, and DuckDB, and the SQLite Go fallback runs the same clamp. Alternative implementation to #83, built on its commits.
roborev: Combined Review (
|
The timezone-aware Go fallback for Top Sessions "By Duration" computed each row's active duration with a per-row follow-up query while the outer rows iterator was still open. That iterator pins one reader connection, and every row then borrowed a second one, an unbounded N+1 over the result set. The reader pool is capped at four connections, so a handful of concurrent duration-fallback requests could each hold their outer cursor while contending for inner connections and deadlock until the busy timeout or context cancellation freed them. Compute active duration in the outer query instead, exactly as the in-SQL and PostgreSQL paths already do, so the whole result set comes back in one statement with no second connection per row. The correlated-subquery SQL is hoisted into a shared sqliteActiveDurationExpr helper used by both SQLite paths, keeping the clamp definition from drifting, and is only emitted for the duration metric so other metrics pay nothing for it.
roborev: Combined Review (
|
Top Sessions "By Duration" ranks by message-gap active duration, which stays positive even when a session's wall-clock timestamps are reversed (ended_at < started_at) or empty. The SQLite SQL and DuckDB paths already exclude such rows, but the PostgreSQL path and the timezone-aware SQLite Go fallback only checked for NULL timestamps. As a result the same malformed session could rank into the top list on those two backends while being filtered out on the others, breaking backend parity. Add the missing guards: ended_at >= started_at for PostgreSQL, and the NULLIF(..., '') emptiness checks plus julianday(ended_at) >= julianday(started_at) for the SQLite fallback, matching the canonical SQLite SQL and DuckDB eligibility. Parity tests cover reversed rows on all affected paths and empty-string rows on SQLite; the fallback empty case pins created_at into the filter window so it reaches the guard instead of being dropped by the date range.
roborev: Combined Review (
|
|
LGTM here |
Alternative to #867. This branch is built on @rodboev's commits (kept in
the history, so the original work and authorship are preserved); it changes
only how "active duration" is computed.
What changed
#867 computes a session's active duration by summing only the gaps that
follow a tool-use message. That captures tool-execution latency but
discards all model generation and thinking time — including the time spent
before the first tool call — so a session that reasons for minutes and then
fires one quick tool is scored as nearly idle.
This branch instead defines active duration the same way the existing
velocity "active minutes" metric already does: the sum of consecutive
inter-message gaps, with each gap capped at 5 minutes. Every gap counts —
model generation, tool execution, and quick human turnarounds — and only
stretches longer than the cap are bounded as idle.
Why
Message timestamps alone can't tell a 20-minute subagent run from a
20-minute coffee break; both are one long gap. Capping each gap fails
gracefully in both directions (a long idle gap and a long active gap each
contribute at most 5 minutes) instead of guessing, and it keeps the metric
robust to messy data: resumed sessions, machine sleep, and API stalls
produce the largest gaps, and those would otherwise dominate the sum.
It also unifies the definition. The 5-minute cap is hoisted to a single
shared constant (
db.ActiveGapCapSec/ActiveGapCapMs) used by both thevelocity metric and Top Sessions, so the two "active" numbers on the
dashboard can't drift; a test asserts the seconds and milliseconds forms
agree.
Tradeoff
A genuinely long active operation — a 15-minute test run, a 20-minute
subagent — is also capped at 5 minutes, so it is undercounted. That is the
deliberate cost of not guessing whether a long gap was work or idle; the
error is bounded and symmetric.
Where to look
internal/db/analytics.go— shared constant, SQLite active-duration SQL,and the velocity metric now sourcing the same cap.
internal/db/timing.go— SQLite Go fallback (timezone-aware ranking path)running the same clamp.
internal/postgres/analytics.go,internal/duckdb/analytics_usage.go—PostgreSQL and DuckDB twins.
All three SQL backends drop the
has_tool_usefilter and the trailing gapto
ended_atso the value matches the velocity computation.generated by a clanker