feat(macrobenchmarks): collect and report system metrics #952
zhixiangli wants to merge 2 commits into
Fetch per-pod system metrics (CPU, memory, network) from Cloud Monitoring during macrobenchmarks. Reduce these metrics to the bottleneck pod (maximum peak and mean across pods) and include them in the final summary.

- Add `metrics.monitoring` to fetch metrics using Cloud Monitoring API.
- Update `metrics.calculate` to perform the reduction.
- Update schema and raw store to handle the new metrics.
- Update `scrape_metrics.sh` to run the collection.
- Add tests for the new functionality.

TAG=agy
CONV=c564d507-372d-476b-9cb2-a9aab7884941
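To illustrate the reduction described above, here is a minimal sketch and not the code in this PR. It assumes each pod's samples for one metric arrive as a plain list of floats, and it reads "maximum peak and mean across pods" as: compute a peak and a mean per pod, then take the largest of each across pods. The function name `reduce_to_bottleneck` and the input shape are hypothetical.

```python
from statistics import fmean


def reduce_to_bottleneck(per_pod: dict[str, list[float]]) -> dict[str, float]:
    """Reduce per-pod samples of one metric to bottleneck values.

    Hypothetical helper: the name and the input shape are assumptions,
    not taken from metrics.calculate.
    """
    # Skip pods that reported no samples so max() and fmean() never see
    # an empty sequence.
    pods = {pod: samples for pod, samples in per_pod.items() if samples}
    if not pods:
        return {}
    return {
        # Highest single sample observed on any pod.
        "peak": max(max(samples) for samples in pods.values()),
        # Highest per-pod average; may come from a different pod than the peak.
        "mean": max(fmean(samples) for samples in pods.values()),
    }


cpu = {
    "pod-a": [0.25, 1.0, 0.25],   # spiky: highest peak
    "pod-b": [0.75, 0.75, 0.75],  # steady: highest mean
    "pod-c": [],                  # no data, ignored
}
summary = reduce_to_bottleneck(cpu)
```

Under this reading the peak and the mean can come from different pods, so the summary describes a worst case per statistic and not one specific pod.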
Code Review
This pull request introduces system metrics collection (CPU, memory, and network) from Cloud Monitoring for macrobenchmarks. It adds a new monitoring module to fetch these metrics, updates the database schema and calculations to aggregate per-pod metrics to the bottleneck pod, and updates the scrape script to collect them on a best-effort basis. The review feedback suggests improving the robustness of RFC3339 datetime parsing in the monitoring module by handling lowercase timezone designators.
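As a sketch of the kind of parsing the review feedback points at, the helper below accepts lowercase `t` and `z`, which RFC 3339 permits alongside the uppercase forms. It is not the code from `metrics.monitoring`: the name `parse_rfc3339` is hypothetical, and truncating fractional seconds to microseconds is an assumption made here so that nanosecond-precision timestamps still parse.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, 'T' or 't', time, optional fraction, then 'Z'/'z' or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime.

    Hypothetical helper written for illustration only.
    """
    match = _RFC3339.match(value.strip())
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second, fraction, offset = match.groups()
    # datetime stores microseconds, so pad or truncate the fraction to 6 digits.
    micros = int((fraction or "").ljust(6, "0")[:6])
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        tz = timezone(sign * delta)
    return datetime(
        int(year), int(month), int(day),
        int(hour), int(minute), int(second), micros, tzinfo=tz,
    )


lower = parse_rfc3339("2024-01-02t03:04:05z")
upper = parse_rfc3339("2024-01-02T03:04:05Z")
```

A regex is used here so the behavior does not depend on which forms `datetime.fromisoformat` accepts in a given Python version.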
Codecov Report

✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff           @@
    ##             main     #952   +/-   ##
    =======================================
      Coverage   89.68%   89.68%
    =======================================
      Files          16       16
      Lines        3579     3579
    =======================================
      Hits         3210     3210
      Misses        369      369
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>