
feat(macrobenchmarks): collect and report system metrics #952

Open
zhixiangli wants to merge 2 commits into fsspec:main from zhixiangli:feat-macrobench-system-metrics


@zhixiangli (Collaborator)

Fetch per-pod system metrics (CPU, memory, network) from Cloud Monitoring during macrobenchmarks. Reduce these metrics to the bottleneck pod (maximum peak and mean across pods) and include them in the final summary.

  • Add `metrics.monitoring` to fetch metrics using the Cloud Monitoring API.
  • Update `metrics.calculate` to perform the reduction.
  • Update the schema and raw store to handle the new metrics.
  • Update `scrape_metrics.sh` to run the collection.
  • Add tests for the new functionality.
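The reduction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `metrics.calculate`: the helper names `summarize_pod` and `reduce_to_bottleneck` are hypothetical, the input is assumed to be a mapping from pod name to the sampled values of one metric, and "maximum peak and mean across pods" is read as "the largest per-pod peak and the largest per-pod mean". Under that reading the two bottleneck values can come from different pods.

```python
from statistics import fmean
from typing import Mapping, Sequence


def summarize_pod(samples: Sequence[float]) -> dict[str, float]:
    """Collapse one pod's time series for a single metric into peak and mean.

    Hypothetical helper for illustration only.
    """
    if not samples:
        raise ValueError("pod has no samples")
    return {"peak": max(samples), "mean": fmean(samples)}


def reduce_to_bottleneck(per_pod: Mapping[str, Sequence[float]]) -> dict[str, float]:
    """Reduce per-pod series to bottleneck values (hypothetical helper).

    Assumed interpretation: the bottleneck peak is the largest per-pod peak
    and the bottleneck mean is the largest per-pod mean. Pods with no
    samples are skipped, a choice made for this sketch.
    """
    summaries = [summarize_pod(samples) for samples in per_pod.values() if samples]
    if not summaries:
        return {}
    return {
        "peak": max(s["peak"] for s in summaries),
        "mean": max(s["mean"] for s in summaries),
    }


if __name__ == "__main__":
    # Illustrative CPU samples (cores) for two pods; not real benchmark data.
    cpu = {"pod-a": [0.5, 1.5, 1.0], "pod-b": [1.25, 1.25, 1.25]}
    print(reduce_to_bottleneck(cpu))
```

Here the peak comes from `pod-a` (1.5) and the mean from `pod-b` (1.25, versus 1.0 for `pod-a`). Taking the maximum per statistic reports the worst case for each independently, which suits spotting a bottleneck but means the summary does not describe a single pod.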

TAG=agy
CONV=c564d507-372d-476b-9cb2-a9aab7884941

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request introduces system metrics collection (CPU, memory, and network) from Cloud Monitoring for macrobenchmarks. It adds a new monitoring module to fetch these metrics, updates the database schema and calculations to aggregate per-pod metrics to the bottleneck pod, and updates the scrape script to collect them on a best-effort basis. The review feedback suggests improving the robustness of RFC3339 datetime parsing in the monitoring module by handling lowercase timezone designators.

Review comment thread: cloudbuild/macrobenchmarks/metrics/monitoring.py
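The RFC 3339 suggestion matters because RFC 3339 (section 5.6) permits lowercase `t` and `z` in place of `T` and `Z`, and before Python 3.11 `datetime.fromisoformat` rejects even an uppercase `Z`. The sketch below shows one tolerant approach. It is not the code in `monitoring.py`: the helper name `parse_rfc3339` and the regex-plus-`strptime` technique are assumptions for illustration. Fractional seconds finer than microseconds (timestamps may carry up to nanosecond precision) are truncated, since `datetime` stores microseconds only.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts upper- or lowercase "T"/"Z", as RFC 3339 section 5.6 allows.
_RFC3339_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime.

    Hypothetical helper for illustration only.
    """
    match = _RFC3339_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    date_part, time_part, fraction, offset = match.groups()

    # Keep at most 6 fractional digits (nanoseconds are truncated) and
    # right-pad shorter fractions, so ".5" becomes 500000 microseconds.
    microsecond = int(fraction[:6].ljust(6, "0")) if fraction else 0

    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        hours, minutes = int(offset[1:3]), int(offset[4:6])
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))

    naive = datetime.strptime(f"{date_part}T{time_part}", "%Y-%m-%dT%H:%M:%S")
    return naive.replace(microsecond=microsecond, tzinfo=tz)


if __name__ == "__main__":
    print(parse_rfc3339("2026-07-03t12:18:00.123456789z"))
```

Normalizing the designators inside the regex, rather than upper-casing the whole input string, keeps the rest of the value untouched and makes the accepted grammar explicit in one place.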
@codecov (bot) commented Jul 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.68%. Comparing base (4ce07fe) to head (4c3298d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #952   +/-   ##
=======================================
  Coverage   89.68%   89.68%           
=======================================
  Files          16       16           
  Lines        3579     3579           
=======================================
  Hits         3210     3210           
  Misses        369      369           


Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@zhixiangli marked this pull request as ready for review on July 3, 2026, 12:18.

Labels: None yet
Projects: None yet
1 participant