perf(ingest): cut DB CPU from thing upserts#433
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
murderteeth
left a comment
There was a problem hiding this comment.
-
pr title and body mention the Q5 probe.fetchEventCounts optimization. first commit changes it, second commit reverts it. i agree we dont need to change probe.fetchEventCounts, but i'm confused because the pr info is out of sync with the code and the revert commit has no commit message. confusion is expensive. please clean this up and ensure future prs are more consistent. maybe an automated review stage would help, let me know what you think
-
this pr makes fundamental changes to upsertThings, a critical internal function. code changes look good, but please also run a full index on a test fork and report the results including performance. if its easy to write a good test for upserThings, lets do that too
-
fix this commit message. make it obvious to reviewers why changes are made
2779b8b to
8510d2d
Compare
|
Addressed all three points:
|
Replace SELECT ... FOR UPDATE + read-modify-write in upsertThing with a
single INSERT ... ON CONFLICT DO UPDATE that merges defaults in-DB via
jsonb || (COALESCE(thing.defaults,'{}') || EXCLUDED.defaults). Same
shallow right-wins merge as the old {...current,...new}, but atomic
under the ON CONFLICT row lock instead of an explicit transaction +
row lock — removes the lock-wait contention on hot thing rows that
drove a 138ms mean over 3.87M calls, and closes a latent new-row
clobber race in the old path.
Moved the upsert into db.ts as upsertThingDefaults so it's testable
without going through ThingSchema.parse; db.spec.ts pins the merge
semantics so this can't silently regress.
No change to probe.fetchEventCounts (TTL bump explored, reverted:
not needed, the query is already optimally planned).
8510d2d to
7171763
Compare
Why
Neon compute was ~3,490 compute-hours/period (
cpu_used_sec/3600). Investigationshowed two cost sources: idle computes that never scale to zero (the dominant lever,
~50%+, handled separately in the Neon console), and a handful of expensive ingest
queries. This PR fixes the one query offender still live in current code. Three
others (Q1 sparkline, Q2 strategy-perf, Q3 est-apr) were already fixed by the
chunk-pruning work in #428. Their large
pg_stat_statementstotals are frozenhistory from the old query shapes; no change needed here.
A TTL bump on
probe.fetchEventCounts(1h → 24h) was also explored, but reverted:the query is already optimally planned (parallel index-only scan), so it isn't
included in this PR.
What
upsertThinglock contention (packages/ingest/load/index.ts,packages/ingest/db.ts)Replaced
BEGIN → SELECT defaults ... FOR UPDATE → JS merge → upsert → COMMITwith asingle atomic statement, now in
db.tsasupsertThingDefaults:This removes the row-lock wait, the extra round-trip, and the explicit transaction.
||performs the same shallow right-wins merge as the former{...currentDefaults, ...thing.defaults}, runs atomically under the ON CONFLICT rowlock, and closes a latent clobber race the old path had on concurrent first-inserts.
upsertThingDefaultsnow lives indb.ts, separate fromThingSchema.parse, so it'sdirectly testable.
db.spec.tsis a new test that pins the merge semantics against areal Postgres testcontainer.
Metrics (before / after)
Source:
pg_stat_statementson kong primary, 16.7-day window (2026-06-09 → 06-25).Already fixed by #428 (context, not changed in this PR)
upsertThing (changed here)
Contention is concurrency-dependent and doesn't reproduce on a single connection, so
the real after-number has to come from
pg_stat_statements~24h post-deploy:Expected: mean drops from 138 ms toward low single-digit ms.
Verification
db.spec.ts(new, real Postgres via testcontainers, not mocked)covers a fresh insert (defaults set as-is) and a merge on conflict (new keys added,
overlapping keys right-wins), matching the prior
{...current,...new}semantics.Also hand-verified in a session-local temp table on Neon:
'{"a":1,"yearn":true,"keep":"x"}' || '{"b":2,"yearn":false}'={"a":1,"b":2,"keep":"x","yearn":false}, byte-identical to{...current,...new}.Null-existing case (column is nullable, 0 such rows today):
COALESCE(NULL,'{}') || '{"b":2}'={"b":2}. Without COALESCE that would beNULL, which is why it's there.NULL-defaults divergence, fixed with COALESCE. Param serialization, dropped-
transaction safety, caller impact, and triggers/generated columns on
thingallchecked clean.
upsertThingunder real concurrency, was requested in review. Not done yet; willfollow up with results.
Notes
The bigger compute saving, enabling autosuspend on the 5 always-on kong computes, is
a Neon-console change and isn't part of this diff.