feat(experiments): Filter experiments list by a rollup metric #453

Merged: shanaiabuggy merged 10 commits into main from sbuggy/ase-321 on Jun 29, 2026

shanaiabuggy commented Jun 25, 2026

What

Adds metric filtering to the experiments list: experiments can now be filtered by rollup metrics (cost, latency, evaluator scores, run count), not just by entity fields.

How

Reuses the platform's standard filter[field][op] bracket syntax, so it combines naturally with existing entity filters:

?filter[cost_usd.mean][$lte]=0.5&filter[run_count][$gte]=1&filter[experiment_group_id]=
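
As a rough illustration of the bracket syntax above, the following sketch parses a query string into (field, operator, value) triples using only the Python standard library. The function name `parse_filter_params` and the choice to treat a bare `filter[field]=value` as `$eq` are assumptions made for this example; the platform's real parser is not shown in this PR.

```python
import re
from urllib.parse import parse_qsl

# Matches filter[<field>] with an optional [<$op>] suffix.
_FILTER_KEY = re.compile(r"^filter\[(?P<field>[^\]]+)\](?:\[(?P<op>\$[a-z]+)\])?$")


def parse_filter_params(query: str) -> list:
    """Return (field, operator, raw_value) triples for each filter[...] parameter.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    triples = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        match = _FILTER_KEY.match(key)
        if match is None:
            continue  # not a filter parameter (for example, a sort parameter)
        # Assumption: a bare filter[field]=value means equality.
        triples.append((match["field"], match["op"] or "$eq", value))
    return triples
```

Values come back as raw strings; converting them to numbers and validating the field and operator would be separate steps.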

  • Supported paths mirror the sort grammar: run_count, cost_usd.<stat>, latency_ms.<stat>, evaluators.<name>.<stat> (stat ∈ mean/median/p90/p95/p99/sum/count). Operators: $gte/$lte/$gt/$lt/$eq.
  • Metrics live in ClickHouse, not Postgres, so list_experiments splits the filter tree: entity predicates go to the entity store, while metric predicates are applied in-app after rollup hydration (compute-on-read, same plumbing as metric sort). Metric paths are declared via self-mapping namespaces on ExperimentFilter so that they pass validation untranslated.
  • Added a NumberFilter range type ($gte/$lte/$gt/$lt/$eq) alongside DatetimeFilter/StringFilter.
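
The path grammar and operator set listed above can be checked with a few lines of Python. This is a minimal sketch, not the PR's validation code: `validate_metric_predicate` is a hypothetical name, `ValueError` stands in for the HTTP 400 response, and the sketch assumes evaluator names contain no dots.

```python
STATS = frozenset({"mean", "median", "p90", "p95", "p99", "sum", "count"})
OPERATORS = frozenset({"$gte", "$lte", "$gt", "$lt", "$eq"})
STAT_METRICS = frozenset({"cost_usd", "latency_ms"})


def validate_metric_predicate(path: str, op: str) -> None:
    """Raise ValueError (a stand-in for HTTP 400) if the path or operator is unsupported."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    parts = path.split(".")
    if parts == ["run_count"]:
        return  # run_count takes no stat suffix
    if len(parts) == 2 and parts[0] in STAT_METRICS and parts[1] in STATS:
        return  # cost_usd.<stat> or latency_ms.<stat>
    # Assumption: evaluator names are non-empty and contain no dots.
    if len(parts) == 3 and parts[0] == "evaluators" and parts[1] and parts[2] in STATS:
        return  # evaluators.<name>.<stat>
    raise ValueError(f"unsupported metric path: {path}")
```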

Behavior

  • Metric filters must be AND-combined with entity filters (nested ANDs flatten); a metric under OR/NOT → 400 (can't split a boolean tree across two stores).
  • Status codes: 400 for an unsupported metric, stat, or operator; 413 when the result set exceeds the in-memory bound; 503 when ClickHouse is unavailable for a metric filter. A missing metric never matches.
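
The split and match rules above can be sketched as follows. Everything here is illustrative: `Comparison`, `Logical`, `split_filter`, `matches_all`, and the flat `rollup` dictionary keyed by metric path are hypothetical stand-ins, not the types or helpers used in `endpoints.py`, and `FilterSplitError` stands in for the HTTP 400 response.

```python
import operator
from dataclasses import dataclass

METRIC_ROOTS = ("run_count", "cost_usd", "latency_ms", "evaluators")
_COMPARE = {"$gte": operator.ge, "$lte": operator.le,
            "$gt": operator.gt, "$lt": operator.lt, "$eq": operator.eq}


@dataclass
class Comparison:
    path: str
    op: str
    value: object


@dataclass
class Logical:
    combinator: str  # "and", "or", or "not"
    children: list


class FilterSplitError(ValueError):
    """Stand-in for the 400 returned when a metric sits under OR/NOT."""


def _is_metric(node: Comparison) -> bool:
    return node.path.split(".")[0] in METRIC_ROOTS


def _references_metric(node) -> bool:
    if isinstance(node, Comparison):
        return _is_metric(node)
    return any(_references_metric(child) for child in node.children)


def split_filter(node):
    """Return (entity_nodes, metric_predicates); nested ANDs are flattened."""
    entity, metric = [], []

    def visit(current):
        if isinstance(current, Comparison):
            (metric if _is_metric(current) else entity).append(current)
        elif current.combinator == "and":
            for child in current.children:
                visit(child)
        elif _references_metric(current):
            raise FilterSplitError("metric filters must be AND-combined")
        else:
            entity.append(current)  # pure entity OR/NOT goes to the entity store whole

    visit(node)
    return entity, metric


def matches_all(rollup: dict, predicates: list) -> bool:
    """Apply metric predicates to one experiment's rollup; a missing metric never matches."""
    for predicate in predicates:
        actual = rollup.get(predicate.path)
        if actual is None or not _COMPARE[predicate.op](actual, predicate.value):
            return False
    return True
```

In this sketch the entity nodes would be sent to the entity store as usual, and `matches_all` would run over each hydrated experiment before sorting and pagination.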

Tests

Unit tests for the split/validate/match helpers + endpoint wiring (validation, 400/503), and an integration test combining entity + metric filters end to end against ClickHouse. OpenAPI specs regenerated.

Summary by CodeRabbit

  • New Features

    • Enhanced experiments listing to support numeric comparisons on metric rollups, including run_count, cost_usd, latency_ms, and evaluator metrics.
    • Added rollup-stat filtering for supported stats (e.g., mean, median, p90, p95, p99, sum, count) with operators like $eq, $gt, $gte, $lt, $lte.
  • Bug Fixes

    • Unsupported sort or filter fields now return clearer 400 errors.
    • Metric-based sorting/filtering now returns 503 when telemetry data for rollups is unavailable.
  • Documentation

    • Updated OpenAPI specs and examples for metric rollup filtering.
  • Tests

    • Added integration and unit test coverage for valid and invalid metric filters.

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy requested review from a team as code owners June 25, 2026 00:17
@github-actions github-actions Bot added the feat label Jun 25, 2026
coderabbitai Bot commented Jun 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 25a7b1d9-aef8-4ed7-849a-b67b7481894d

📥 Commits

Reviewing files that changed from the base of the PR and between 139fc9e and 0b8d5ff.

⛔ Files ignored due to path filters (1)
  • sdk/stainless.yaml is excluded by !sdk/**
📒 Files selected for processing (5)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • services/intake/tests/test_experiment_metric_filter.py
💤 Files with no reviewable changes (2)
  • services/intake/tests/test_experiment_metric_filter.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • openapi/ga/openapi.yaml

📝 Walkthrough

Adds numeric rollup filtering to the experiments list API. OpenAPI, shared filter types, endpoint handling, and tests now cover run_count, cost_usd, latency_ms, and evaluators comparisons.

Changes

Metric Rollup Filters

| Layer | File(s) | Summary |
|---|---|---|
| Endpoint docs and error text | openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml, services/intake/src/nmp/intake/api/v2/experiments/endpoints.py | filter docs add rollup-metric examples; 400 and 503 text now mention unsupported sort/filter fields and metric-based sort/filter. |
| Filter contract and schema shapes | openapi/openapi.yaml, openapi/ga/openapi.yaml, openapi/ga/individual/platform.openapi.yaml, packages/nmp_common/src/nmp/common/entities/values.py, services/intake/src/nmp/intake/api/v2/experiments/schemas.py | ExperimentFilter gains rollup-metric fields, and NumberFilter adds $gte, $lte, $gt, $lt, and $eq operators. |
| Metric filter handling in list_experiments | services/intake/src/nmp/intake/api/v2/experiments/endpoints.py | list_experiments splits entity filters from metric predicates, validates metric paths and numeric operators, hydrates rollups, and applies metric predicates before sorting and pagination. |
| Metric filter tests | services/intake/tests/test_experiment_metric_filter.py, services/intake/tests/integration/spans/test_experiment_metric_sort.py | Unit tests cover filter extraction and validation, and integration tests cover combined metric filtering in the experiments list response. |

Possibly related PRs

Suggested reviewers

  • BrianNewsom
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding rollup-metric filtering to the experiments list. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
packages/nmp_common/src/nmp/common/entities/values.py (1)

274-310: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

NumberFilter overlaps FloatFilter.

FloatFilter already provides $gte/$lte; NumberFilter adds $gt/$lt/$eq. Consider folding the extra operators into FloatFilter (or deriving one from the other) to avoid two near-identical numeric filters drifting apart.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nmp_common/src/nmp/common/entities/values.py` around lines 274 -
310, `NumberFilter` duplicates most of `FloatFilter` and risks the two numeric
filter models drifting apart. Refactor the filter types so there is a single
source of truth for numeric comparisons, either by moving `$gt`/`$lt`/`$eq` into
`FloatFilter` or by making `NumberFilter` inherit/compose from `FloatFilter`;
update the `NumberFilter` and `FloatFilter` definitions in `values.py` so their
shared behavior lives in one place and their aliases/config stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openapi/ga/individual/platform.openapi.yaml`:
- Around line 3738-3740: The filter examples in the OpenAPI docs use unprefixed
operators that do not match the schema. Update the example text in the
`NumberFilter`/rollup metric descriptions so the keys match the defined query
shape (`$gte`, `$lte`, `$gt`, `$lt`, `$eq`) everywhere this example appears,
including the related `NumberFilter` documentation block. Keep the surrounding
example values the same, but ensure the operator names in the docs are
consistent with the schema.
- Around line 14255-14279: The NumberFilter schema currently allows empty
objects because it lacks a minimum property constraint. Update the NumberFilter
definition in the openapi schema to require at least one predicate by adding
minProperties: 1 alongside the existing properties and additionalProperties:
false, so the schema still accepts $gte, $lte, $gt, $lt, or $eq but rejects {}.

In `@openapi/ga/openapi.yaml`:
- Around line 3734-3740: Update the filter documentation text in the OpenAPI
spec so the numeric range examples use the same $-prefixed operator keys defined
by the schema. In the affected description near the experiments filter section,
change the examples for run_count, cost_usd.mean, latency_ms.p95, and
evaluators.<name>.mean to use $gte/$lte/$gt/$lt/$eq consistently. Apply the same
wording cleanup anywhere the duplicated filter description appears so the
examples match the actual supported operators and do not point clients to
invalid keys.
- Around line 14255-14279: NumberFilter currently allows an empty object, so
update the NumberFilter schema in openapi/ga/openapi.yaml to require at least
one predicate operator. Add minProperties: 1 alongside the existing properties
definition so validation rejects {} while still allowing $gte, $lte, $gt, $lt,
or $eq. Use the NumberFilter schema block to locate the change.

In `@services/intake/src/nmp/intake/api/v2/experiments/endpoints.py`:
- Around line 992-1016: The metric filter validation in the LogicalOperation
handling is rejecting nested AND groups because
`_operation_references_metric(child)` treats a child AND containing metrics as
invalid, even though the parent combinator is already AND. Update the logic
around `LogicalOperation`, `_operation_references_metric`, and
`_validated_metric_predicate` to either flatten nested ANDs before validation or
explicitly recurse through AND children so metric comparisons inside sub-ANDs
are accepted; if nested ANDs remain unsupported, adjust the HTTPException detail
to clearly state that only flat metric comparisons are allowed.

---

Nitpick comments:
In `@packages/nmp_common/src/nmp/common/entities/values.py`:
- Around line 274-310: `NumberFilter` duplicates most of `FloatFilter` and risks
the two numeric filter models drifting apart. Refactor the filter types so there
is a single source of truth for numeric comparisons, either by moving
`$gt`/`$lt`/`$eq` into `FloatFilter` or by making `NumberFilter` inherit/compose
from `FloatFilter`; update the `NumberFilter` and `FloatFilter` definitions in
`values.py` so their shared behavior lives in one place and their aliases/config
stay consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6359daae-f76d-4371-af1a-b831e0bf4a36

📥 Commits

Reviewing files that changed from the base of the PR and between b4473e8 and a965d84.

⛔ Files ignored due to path filters (1)
  • web/packages/sdk/generated/agents/schema/DeploymentLogsResponse.ts is excluded by !**/generated/**
📒 Files selected for processing (8)
  • openapi/ga/individual/platform.openapi.yaml
  • openapi/ga/openapi.yaml
  • openapi/openapi.yaml
  • packages/nmp_common/src/nmp/common/entities/values.py
  • services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
  • services/intake/src/nmp/intake/api/v2/experiments/schemas.py
  • services/intake/tests/integration/spans/test_experiment_metric_sort.py
  • services/intake/tests/test_experiment_metric_filter.py

Comment thread openapi/ga/individual/platform.openapi.yaml Outdated
Comment thread openapi/ga/individual/platform.openapi.yaml
Comment thread openapi/ga/openapi.yaml Outdated
Comment thread openapi/ga/openapi.yaml
Comment thread services/intake/src/nmp/intake/api/v2/experiments/endpoints.py
github-actions Bot commented Jun 25, 2026

| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 21936/28740 | 76.3% | 61.0% |
| Integration Tests | 12599/27420 | 46.0% | 19.3% |

Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy enabled auto-merge June 29, 2026 21:13
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
Signed-off-by: shanaiabuggy <59746633+shanaiabuggy@users.noreply.github.com>
@shanaiabuggy shanaiabuggy added this pull request to the merge queue Jun 29, 2026
Merged via the queue into main with commit 54c738f Jun 29, 2026
52 checks passed
@shanaiabuggy shanaiabuggy deleted the sbuggy/ase-321 branch June 29, 2026 22:11