feat(experiments): Filter experiments list by a rollup metric by shanaiabuggy · Pull Request #453 · NVIDIA-NeMo/nemo-platform

shanaiabuggy · 2026-06-25T00:17:15Z

What

Adds metric filtering to the experiments list — filter by rollup metrics (cost, latency, evaluator scores, run count), not just entity fields.

How

Reuses the platform's standard filter[field][op] bracket syntax, so it combines naturally with existing entity filters:

?filter[cost_usd.mean][$lte]=0.5&filter[run_count][$gte]=1&filter[experiment_group_id]=
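For illustration, a minimal sketch of how a client might assemble a query string in this bracket syntax. The `build_filter_query` helper is hypothetical (it is not part of the platform or its SDK), and the `experiment_group_id` filter is left out because its value is not given above.

```python
from urllib.parse import urlencode


def build_filter_query(filters):
    """Build a filter[field][op]=value query string.

    `filters` is a list of (field, op, value) tuples; op is None for a plain
    equality filter on an entity field. Hypothetical helper, for illustration only.
    """
    params = []
    for field, op, value in filters:
        key = f"filter[{field}]" if op is None else f"filter[{field}][{op}]"
        params.append((key, str(value)))
    # Keep brackets and "$" unescaped so the output reads like the example above;
    # a server should accept the percent-encoded form as well.
    return urlencode(params, safe="[]$")


query = build_filter_query([
    ("cost_usd.mean", "$lte", 0.5),
    ("run_count", "$gte", 1),
])
print("?" + query)
```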

Supported paths mirror the sort grammar: run_count, cost_usd.<stat>, latency_ms.<stat>, evaluators.<name>.<stat> (stat ∈ mean/median/p90/p95/p99/sum/count). Operators: $gte/$lte/$gt/$lt/$eq.
Metrics live in ClickHouse, not Postgres, so list_experiments splits the filter tree: entity predicates go to the entity store, and metric predicates are applied in-app after rollup hydration (compute-on-read, same plumbing as metric sort). The metric paths are declared via self-mapping namespaces on ExperimentFilter, so they pass validation untranslated.
Added a NumberFilter range type ($gte/$lte/$gt/$lt/$eq) alongside DatetimeFilter/StringFilter.
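The path grammar described above can be sketched roughly as follows. This is an illustration, not the code in `endpoints.py`: `parse_metric_path` and the constant names are hypothetical, and it assumes evaluator names contain no dots.

```python
# Stats listed in the PR description.
STATS = {"mean", "median", "p90", "p95", "p99", "sum", "count"}
# Metric roots that take a single <stat> suffix.
STAT_METRICS = {"cost_usd", "latency_ms"}


def parse_metric_path(path):
    """Classify a filter path (hypothetical helper).

    Returns (metric, evaluator_name, stat) for a rollup-metric path and None
    for anything else (treated as an entity field). Raises ValueError when the
    path is in a metric namespace but has an unsupported shape or stat, which
    the endpoint would presumably surface as a 400.
    """
    head, *rest = path.split(".")
    if head == "run_count":
        if rest:
            raise ValueError(f"run_count takes no stat: {path!r}")
        return ("run_count", None, None)
    if head in STAT_METRICS:
        if len(rest) != 1 or rest[0] not in STATS:
            raise ValueError(f"unsupported stat in {path!r}")
        return (head, None, rest[0])
    if head == "evaluators":
        # Assumption: evaluators.<name>.<stat>, with no dots inside <name>.
        if len(rest) != 2 or not rest[0] or rest[1] not in STATS:
            raise ValueError(f"unsupported evaluator path {path!r}")
        return ("evaluators", rest[0], rest[1])
    return None


print(parse_metric_path("evaluators.accuracy.p95"))
print(parse_metric_path("experiment_group_id"))
```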

Behavior

Metric filters must be AND-combined with entity filters (nested ANDs are flattened); a metric predicate under OR/NOT returns 400, because a boolean tree can't be split across two stores.
Error responses: 400 for an unsupported metric, stat, or operator; 413 when the result set exceeds the in-memory bound; 503 when ClickHouse is unavailable for a metric filter. A missing metric never matches.
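A rough sketch of the in-app matching step, including the "missing metric never matches" rule. The names here are hypothetical, and it assumes hydrated rollups are available as a flat dict keyed by metric path, which may not be how the endpoint represents them.

```python
import operator

# The five operators listed in the PR description.
OPS = {
    "$gte": operator.ge,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$eq": operator.eq,
}


def metric_matches(rollup, path, condition):
    """Return True if rollup[path] satisfies every operator in `condition`.

    Assumption: `rollup` is a flat dict such as {"cost_usd.mean": 0.42}.
    A metric that is absent (or None) never matches, whatever the operator.
    """
    value = rollup.get(path)
    if value is None:
        return False
    for op, bound in condition.items():
        if op not in OPS:
            # Validation would normally reject this earlier with a 400.
            raise ValueError(f"unsupported operator {op!r}")
        if not OPS[op](value, bound):
            return False
    return True


rollup = {"cost_usd.mean": 0.42, "run_count": 3}
print(metric_matches(rollup, "cost_usd.mean", {"$lte": 0.5}))
print(metric_matches(rollup, "latency_ms.p95", {"$lt": 1000}))
```

Multiple operators on one path combine as a range, so `{"$gte": 1, "$lt": 5}` matches values in [1, 5).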

Tests

Unit tests for the split/validate/match helpers + endpoint wiring (validation, 400/503), and an integration test combining entity + metric filters end to end against ClickHouse. OpenAPI specs regenerated.

Summary by CodeRabbit

New Features
- Enhanced experiments listing to support numeric comparisons on metric rollups, including run_count, cost_usd, latency_ms, and evaluator metrics.
- Added rollup-stat filtering for supported stats (e.g., mean, median, p90, p95, p99, sum, count) with operators like $eq, $gt, $gte, $lt, $lte.
Bug Fixes
- Unsupported sort or filter fields now return clearer 400 errors.
- Metric-based sorting/filtering now returns 503 when telemetry data for rollups is unavailable.
Documentation
- Updated OpenAPI specs and examples for metric rollup filtering.
Tests
- Added integration and unit test coverage for valid and invalid metric filters.

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

coderabbitai · 2026-06-25T00:22:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 25a7b1d9-aef8-4ed7-849a-b67b7481894d

📥 Commits

Reviewing files that changed from the base of the PR and between 139fc9e and 0b8d5ff.

⛔ Files ignored due to path filters (1)

sdk/stainless.yaml is excluded by !sdk/**

📒 Files selected for processing (5)

openapi/ga/individual/platform.openapi.yaml
openapi/ga/openapi.yaml
openapi/openapi.yaml
services/intake/src/nmp/intake/api/v2/experiments/schemas.py
services/intake/tests/test_experiment_metric_filter.py

💤 Files with no reviewable changes (2)

services/intake/tests/test_experiment_metric_filter.py
services/intake/src/nmp/intake/api/v2/experiments/schemas.py

🚧 Files skipped from review as they are similar to previous changes (1)

openapi/ga/openapi.yaml

📝 Walkthrough

Walkthrough

Adds numeric rollup filtering to the experiments list API. OpenAPI, shared filter types, endpoint handling, and tests now cover run_count, cost_usd, latency_ms, and evaluators comparisons.

Changes

Metric Rollup Filters

| Layer / File(s) | Summary |
| --- | --- |
| Endpoint docs and error text: `openapi/openapi.yaml`, `openapi/ga/openapi.yaml`, `openapi/ga/individual/platform.openapi.yaml`, `services/intake/src/nmp/intake/api/v2/experiments/endpoints.py` | `filter` docs add rollup-metric examples; `400` and `503` text now mention unsupported sort/filter fields and metric-based sort/filter. |
| Filter contract and schema shapes: `openapi/openapi.yaml`, `openapi/ga/openapi.yaml`, `openapi/ga/individual/platform.openapi.yaml`, `packages/nmp_common/src/nmp/common/entities/values.py`, `services/intake/src/nmp/intake/api/v2/experiments/schemas.py` | `ExperimentFilter` gains rollup-metric fields, and `NumberFilter` adds `$gte`, `$lte`, `$gt`, `$lt`, and `$eq` operators. |
| Metric filter handling in list_experiments: `services/intake/src/nmp/intake/api/v2/experiments/endpoints.py` | `list_experiments` splits entity filters from metric predicates, validates metric paths and numeric operators, hydrates rollups, and applies metric predicates before sorting and pagination. |
| Metric filter tests: `services/intake/tests/test_experiment_metric_filter.py`, `services/intake/tests/integration/spans/test_experiment_metric_sort.py` | Unit tests cover filter extraction and validation, and integration tests cover combined metric filtering in the experiments list response. |

Possibly related PRs

NVIDIA-NeMo/nemo-platform#124 — Extends the same experiments filter schema with rollup metric comparisons.
NVIDIA-NeMo/nemo-platform#448 — Updates the same experiments list endpoint’s rollup hydration and metric-path handling.
NVIDIA-NeMo/nemo-platform#154 — Touches the same intake rollup fields used by this endpoint.

Suggested reviewers

BrianNewsom

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding rollup-metric filtering to the experiments list. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

packages/nmp_common/src/nmp/common/entities/values.py (1)
274-310: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

NumberFilter overlaps FloatFilter.

FloatFilter already provides $gte/$lte; NumberFilter adds $gt/$lt/$eq. Consider folding the extra operators into FloatFilter (or deriving one from the other) to avoid two near-identical numeric filters drifting apart.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nmp_common/src/nmp/common/entities/values.py` around lines 274 -
310, `NumberFilter` duplicates most of `FloatFilter` and risks the two numeric
filter models drifting apart. Refactor the filter types so there is a single
source of truth for numeric comparisons, either by moving `$gt`/`$lt`/`$eq` into
`FloatFilter` or by making `NumberFilter` inherit/compose from `FloatFilter`;
update the `NumberFilter` and `FloatFilter` definitions in `values.py` so their
shared behavior lives in one place and their aliases/config stay consistent.
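One way the "derive one from the other" option could look, sketched with plain dataclasses. This is only an illustration of the shape: the real `FloatFilter`/`NumberFilter` definitions in `values.py` are not shown in this thread, so the field names, the `matches` method, and the use of dataclasses are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FloatFilter:
    """Inclusive bounds only ($gte / $lte). Hypothetical stand-in for the real model."""
    gte: Optional[float] = None
    lte: Optional[float] = None

    def matches(self, value: float) -> bool:
        return (self.gte is None or value >= self.gte) and (
            self.lte is None or value <= self.lte
        )


@dataclass(frozen=True)
class NumberFilter(FloatFilter):
    """Inherits $gte / $lte and adds $gt / $lt / $eq, so the shared part lives in one place."""
    gt: Optional[float] = None
    lt: Optional[float] = None
    eq: Optional[float] = None

    def matches(self, value: float) -> bool:
        return (
            super().matches(value)
            and (self.gt is None or value > self.gt)
            and (self.lt is None or value < self.lt)
            and (self.eq is None or value == self.eq)
        )


print(NumberFilter(gte=1, lt=5).matches(1))
print(NumberFilter(gte=1, lt=5).matches(5))
```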

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi/ga/individual/platform.openapi.yaml`:
- Around line 3738-3740: The filter examples in the OpenAPI docs use unprefixed
operators that do not match the schema. Update the example text in the
`NumberFilter`/rollup metric descriptions so the keys match the defined query
shape (`$gte`, `$lte`, `$gt`, `$lt`, `$eq`) everywhere this example appears,
including the related `NumberFilter` documentation block. Keep the surrounding
example values the same, but ensure the operator names in the docs are
consistent with the schema.
- Around line 14255-14279: The NumberFilter schema currently allows empty
objects because it lacks a minimum property constraint. Update the NumberFilter
definition in the openapi schema to require at least one predicate by adding
minProperties: 1 alongside the existing properties and additionalProperties:
false, so the schema still accepts $gte, $lte, $gt, $lt, or $eq but rejects {}.

In `@openapi/ga/openapi.yaml`:
- Around line 3734-3740: Update the filter documentation text in the OpenAPI
spec so the numeric range examples use the same $-prefixed operator keys defined
by the schema. In the affected description near the experiments filter section,
change the examples for run_count, cost_usd.mean, latency_ms.p95, and
evaluators.<name>.mean to use $gte/$lte/$gt/$lt/$eq consistently. Apply the same
wording cleanup anywhere the duplicated filter description appears so the
examples match the actual supported operators and do not point clients to
invalid keys.
- Around line 14255-14279: NumberFilter currently allows an empty object, so
update the NumberFilter schema in openapi/ga/openapi.yaml to require at least
one predicate operator. Add minProperties: 1 alongside the existing properties
definition so validation rejects {} while still allowing $gte, $lte, $gt, $lt,
or $eq. Use the NumberFilter schema block to locate the change.

In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 992-1016: The metric filter validation in the LogicalOperation
handling is rejecting nested AND groups because
`_operation_references_metric(child)` treats a child AND containing metrics as
invalid, even though the parent combinator is already AND. Update the logic
around `LogicalOperation`, `_operation_references_metric`, and
`_validated_metric_predicate` to either flatten nested ANDs before validation or
explicitly recurse through AND children so metric comparisons inside sub-ANDs
are accepted; if nested ANDs remain unsupported, adjust the HTTPException detail
to clearly state that only flat metric comparisons are allowed.
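The "flatten nested ANDs before validation" option from the last comment might be sketched like this. The tree representation (dicts with `$and`/`$or`/`$not` keys and `field` leaves) and every name below are hypothetical; the endpoint's `LogicalOperation` type is not shown here, so treat this as a shape for the idea rather than a patch.

```python
METRIC_ROOTS = ("run_count", "cost_usd", "latency_ms", "evaluators")


class MetricFilterError(ValueError):
    """Stand-in for the HTTP 400 the endpoint would return."""


def is_metric(field):
    return field.split(".")[0] in METRIC_ROOTS


def references_metric(node):
    """True if any leaf under `node` targets a rollup metric."""
    if "field" in node:
        return is_metric(node["field"])
    if "$not" in node:
        return references_metric(node["$not"])
    key = "$and" if "$and" in node else "$or"
    return any(references_metric(child) for child in node[key])


def split_tree(root):
    """Split a filter tree into (entity_nodes, metric_leaves).

    Nested ANDs are flattened into the top-level conjunction; a metric
    anywhere under OR/NOT is rejected because that subtree cannot be
    evaluated by a single store.
    """
    entity, metric = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if "$and" in node:
            # Push children in reverse so they are visited left to right.
            stack.extend(reversed(node["$and"]))
        elif "field" in node and is_metric(node["field"]):
            metric.append(node)
        elif references_metric(node):
            raise MetricFilterError("metric filters cannot appear under OR/NOT")
        else:
            entity.append(node)
    return entity, metric


group = {"field": "experiment_group_id", "op": "$eq", "value": "g1"}
cost = {"field": "cost_usd.mean", "op": "$lte", "value": 0.5}
runs = {"field": "run_count", "op": "$gte", "value": 1}
entity, metric = split_tree({"$and": [group, {"$and": [cost, runs]}]})
print(entity, metric)
```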



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6359daae-f76d-4371-af1a-b831e0bf4a36

📥 Commits

Reviewing files that changed from the base of the PR and between b4473e8 and a965d84.

⛔ Files ignored due to path filters (1)

web/packages/sdk/generated/agents/schema/DeploymentLogsResponse.ts is excluded by !**/generated/**

📒 Files selected for processing (8)

openapi/ga/individual/platform.openapi.yaml
openapi/ga/openapi.yaml
openapi/openapi.yaml
packages/nmp_common/src/nmp/common/entities/values.py
services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
services/intake/src/nmp/intake/api/v2/experiments/schemas.py
services/intake/tests/integration/spans/test_experiment_metric_sort.py
services/intake/tests/test_experiment_metric_filter.py

github-actions · 2026-06-25T00:27:33Z

| Suite | Lines Covered | Line Rate | Branch Rate |
| --- | --- | --- | --- |
| Unit Tests | 21936/28740 | 76.3% | 61.0% |
| Integration Tests | 12599/27420 | 46.0% | 19.3% |

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

feat(experiments): Filter experiments list by a rollup metric

a965d84

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

shanaiabuggy requested review from a team as code owners June 25, 2026 00:17

github-actions Bot added the feat label Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026


Comment thread openapi/ga/individual/platform.openapi.yaml Outdated

Comment thread openapi/ga/individual/platform.openapi.yaml

Comment thread openapi/ga/openapi.yaml Outdated

Comment thread openapi/ga/openapi.yaml

Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py

shanaiabuggy added 3 commits June 24, 2026 20:42

bunny

139fc9e

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

lint

65755bb

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

Merge branch 'main' into sbuggy/ase-321

8594819

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

BrianNewsom approved these changes Jun 29, 2026


shanaiabuggy added 3 commits June 29, 2026 14:18

review

0b8d5ff

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

unlint

2508394

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

lint

a77d572

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

shanaiabuggy enabled auto-merge June 29, 2026 21:13

shanaiabuggy added 3 commits June 29, 2026 15:24

pls lint

9dfc8f2

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

Merge branch 'main' into sbuggy/ase-321

705632b

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

dear god pls

1e7714d

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>

shanaiabuggy added this pull request to the merge queue Jun 29, 2026

Merged via the queue into main with commit 54c738f Jun 29, 2026
52 checks passed

shanaiabuggy deleted the sbuggy/ase-321 branch June 29, 2026 22:11

coderabbitai Bot mentioned this pull request Jun 29, 2026

feat(experiments): Experiment group 'ranked fields' (default sort + basis for rank) #512

Open

shanaiabuggy merged 10 commits into main from sbuggy/ase-321
