6 changes: 6 additions & 0 deletions docs/development/debugging.md
@@ -1,5 +1,11 @@
# How To Debug

## Controlling Log Level

For a full reference on log-level controls — including which pods are affected, YAML snippets for every component, and advanced `RUST_LOG` filtering for data-plane pods — see [Log Levels](../user-guide/reference/configuration/log-levels.md).

## Debug Logs

To enable debug logs in a Vertex Pod, set environment variable `NUMAFLOW_DEBUG` to `true` for the Vertex. For example:

```yaml
# (remainder of the file is collapsed in this diff view)
```
@@ -1,6 +1,14 @@
# Environment Variables

For the `numa` container of vertex pods, environment variable `NUMAFLOW_DEBUG` can be set to `true` for [debugging](../../../development/debugging.md).

## Log level control

Numaflow exposes three environment variables for controlling log verbosity across its pods:

- `NUMAFLOW_LOG_LEVEL` — sets the log level (`debug`, `info`, `warn`, `error`) for Numaflow-owned components. Overrides the level implied by `NUMAFLOW_DEBUG`.
- `RUST_LOG` — advanced override for data-plane pods (vertex `numa` container, MonoVertex `numa` container, serving pods). Accepts standard [`tracing-subscriber` EnvFilter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) (e.g. `warn`, `numaflow_core=debug,info`) and takes precedence over `NUMAFLOW_LOG_LEVEL`.
- `NUMAFLOW_DEBUG` — development shortcut; sets level to `debug` and may switch log output from JSON to human-readable text. **Note:** the format change may break log shippers expecting JSON — prefer `NUMAFLOW_LOG_LEVEL` when only the level needs changing.

See [Log Levels](log-levels.md) for a full pod inventory, per-component YAML examples, and common recipes.

In [`udf`](../../user-defined-functions/map/map.md), [`udsink`](../../sinks/user-defined-sinks.md) and [`transformer`](../../sources/transformer/overview.md) containers, there are some preset environment variables that can be used directly.

185 changes: 185 additions & 0 deletions docs/user-guide/reference/configuration/log-levels.md
@@ -0,0 +1,185 @@
# Log Levels

Numaflow-owned pods use `NUMAFLOW_LOG_LEVEL` as the standard log-level control. Data-plane pods also support `RUST_LOG` for advanced filtering.

## Quick reference

| Pod / container | Standard log-level env var | Default level |
|---|---|---|
| Pipeline daemon | `NUMAFLOW_LOG_LEVEL` | `info` |
| MonoVertex daemon | `NUMAFLOW_LOG_LEVEL` | `info` |
| ISB svc create / delete job | `NUMAFLOW_LOG_LEVEL` | `info` |
| ISB svc validate (init container) | `NUMAFLOW_LOG_LEVEL` | `info` |
| Controller (`numaflow-controller`) | `NUMAFLOW_LOG_LEVEL` | `info` |
| Webhook (`numaflow-webhook`) | `NUMAFLOW_LOG_LEVEL` | `info` |
| UX server (`numaflow-server`) | `NUMAFLOW_LOG_LEVEL` | `info` |
| Pipeline vertex `numa` container | `NUMAFLOW_LOG_LEVEL` | `info` |
| MonoVertex `numa` container | `NUMAFLOW_LOG_LEVEL` | `info` |
| Serving pod | `NUMAFLOW_LOG_LEVEL` | `info` |
| InterStepBufferService (JetStream / Redis) | n/a | n/a |

`NUMAFLOW_DEBUG=true` is also supported as a development shortcut (see [below](#numaflow_debug-interaction)).

---

## Standard log levels — `NUMAFLOW_LOG_LEVEL`

Numaflow-owned components read `NUMAFLOW_LOG_LEVEL` at startup.

**Accepted values:** `debug`, `info`, `warn`, `error`.

**Default:** `info`

**Precedence:** `NUMAFLOW_LOG_LEVEL` overrides the level implied by `NUMAFLOW_DEBUG`. Invalid values fall back to the level selected by `NUMAFLOW_DEBUG` or the default. For data-plane pods, `RUST_LOG` takes precedence over `NUMAFLOW_LOG_LEVEL` when set.

### Pipeline daemon pod

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
spec:
  templates:
    daemon:
      containerTemplate:
        env:
          - name: NUMAFLOW_LOG_LEVEL
            value: warn
```

### MonoVertex daemon pod

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: MonoVertex
spec:
  daemonTemplate:
    containerTemplate:
      env:
        - name: NUMAFLOW_LOG_LEVEL
          value: warn
```

Comment on lines +58 to +59

Contributor: I think we should just use `DAEMON_LOG_LEVEL` and not `NUMAFLOW_LOG_LEVEL`.

Contributor: Should we explicitly mention `CONTROLLER_LOG_LEVEL` (daemon, controller, etc.) and `DATAPLANE_LOG_LEVEL` (should work like `RUST_LOG`)?

Contributor: I think a single variable would be better, like `NUMAFLOW_LOG_LEVEL`. We won't need to use `DAEMON_LOG` and `NUMAFLOW_LOG` at the same time for a single container anyway, right? Even if there is a situation like this, what do you think about treating them like components/modules and using Rust's log/tracing syntax: `NUMAFLOW_LOG_LEVEL=debug,builtin_udf=warn`

Contributor: Makes sense, let's go with one single log-level env var.

### Controller, webhook, and UX server

These are cluster-level components deployed via the install manifests. Set `NUMAFLOW_LOG_LEVEL` directly on the relevant Deployment:

```yaml
# numaflow-controller Deployment
env:
  - name: NUMAFLOW_LOG_LEVEL
    value: warn
```

### ISB service jobs

Init and finalizer jobs for pipeline ISB creation/deletion/validation:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
spec:
  templates:
    job:
      containerTemplate:
        env:
          - name: NUMAFLOW_LOG_LEVEL
            value: warn
```

---

## Pipeline, MonoVertex, and Serving pods

Pipeline vertex pods, MonoVertex pods, and Serving pods use `NUMAFLOW_LOG_LEVEL` for common log-level cases:

**Accepted values:** `debug`, `info`, `warn`, `error`.

**Default:** `info`

### Pipeline vertex pod

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
spec:
  vertices:
    - name: my-vertex
      containerTemplate:
        env:
          - name: NUMAFLOW_LOG_LEVEL
            value: warn
```

### MonoVertex pod

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: MonoVertex
spec:
  containerTemplate:
    env:
      - name: NUMAFLOW_LOG_LEVEL
        value: warn
```

### Serving pod

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: ServingPipeline
spec:
  serving:
    containerTemplate:
      env:
        - name: NUMAFLOW_LOG_LEVEL
          value: warn
```

### Advanced data-plane filtering — `RUST_LOG`

Data-plane pods also support standard [`tracing-subscriber` EnvFilter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) via `RUST_LOG`. Use this only when you need fine-grained filtering. When `RUST_LOG` is set, it takes precedence over `NUMAFLOW_LOG_LEVEL`.

For example, to enable debug logs for a specific target only:

```yaml
# Pipeline.spec.vertices[].containerTemplate.env
- name: RUST_LOG
  value: "numaflow_core=debug,info"
```
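
To make the directive string above easier to read, here is a rough Go sketch of how a comma-separated filter such as `numaflow_core=debug,info` can be interpreted. It is a simplified model, not the actual `tracing-subscriber` implementation: the `Filter` type and the `ParseFilter` and `LevelFor` names are hypothetical, span and field filters are not modelled, and treating targets without a matching directive as `off` when no bare level is given is an assumption of this sketch.

```go
package main

import "strings"

// Filter is a hypothetical, simplified model of an EnvFilter-style directive list.
type Filter struct {
	Default string            // level for targets with no matching directive
	Targets map[string]string // target prefix -> level
}

// ParseFilter splits a comma-separated directive string.
// A bare level ("info") sets the default; "target=level" sets a per-target level.
func ParseFilter(spec string) Filter {
	// Assumption of this sketch: without a bare level, unmatched targets are off.
	f := Filter{Default: "off", Targets: map[string]string{}}
	for _, d := range strings.Split(spec, ",") {
		d = strings.TrimSpace(d)
		if d == "" {
			continue
		}
		if target, level, ok := strings.Cut(d, "="); ok {
			f.Targets[strings.TrimSpace(target)] = strings.ToLower(strings.TrimSpace(level))
		} else {
			f.Default = strings.ToLower(d)
		}
	}
	return f
}

// LevelFor returns the level that applies to a log target,
// preferring the longest matching target prefix over the default.
func (f Filter) LevelFor(target string) string {
	best, level := -1, f.Default
	for prefix, l := range f.Targets {
		if strings.HasPrefix(target, prefix) && len(prefix) > best {
			best, level = len(prefix), l
		}
	}
	return level
}
```

With `ParseFilter("numaflow_core=debug,info")`, a target such as `numaflow_core::source` resolves to `debug`, while an unrelated target resolves to the bare default `info`.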

---

## `NUMAFLOW_DEBUG` interaction

`NUMAFLOW_DEBUG=true` is a development shortcut. It lowers the default log level to `debug` and may switch log output from structured JSON to human-readable text.

**Important:** switching from JSON to text format may break log shippers or aggregators that expect structured JSON. Prefer `NUMAFLOW_LOG_LEVEL=debug` to lower the level without changing the output format.

`NUMAFLOW_LOG_LEVEL` overrides the level implied by `NUMAFLOW_DEBUG` without changing the format selected by `NUMAFLOW_DEBUG`. For data-plane pods, `RUST_LOG` takes precedence over `NUMAFLOW_LOG_LEVEL` when set.
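
The interaction described above can be summarised as a small decision function. The Go sketch below is illustrative only: the `Settings` type and `Resolve` function are hypothetical names, and it assumes that `NUMAFLOW_DEBUG=true` always switches the output to text, which the text above only describes as something that may happen.

```go
package main

import "strings"

// Settings is a hypothetical holder for the three variables discussed above.
type Settings struct {
	Debug     string // value of NUMAFLOW_DEBUG
	LogLevel  string // value of NUMAFLOW_LOG_LEVEL
	RustLog   string // value of RUST_LOG
	DataPlane bool   // true for vertex, MonoVertex, and serving pods
}

// Resolve returns the effective level (or filter string) and the output format.
func Resolve(s Settings) (level, format string) {
	// Start from the preset selected by NUMAFLOW_DEBUG.
	level, format = "info", "json"
	if s.Debug == "true" {
		level, format = "debug", "text" // assumption: debug mode switches to text
	}
	// A valid NUMAFLOW_LOG_LEVEL overrides the level but never the format.
	// Invalid values leave the preset level untouched.
	switch l := strings.ToLower(strings.TrimSpace(s.LogLevel)); l {
	case "debug", "info", "warn", "error":
		level = l
	}
	// On data-plane pods, a non-empty RUST_LOG wins over NUMAFLOW_LOG_LEVEL.
	if s.DataPlane && strings.TrimSpace(s.RustLog) != "" {
		level = s.RustLog
	}
	return level, format
}
```

For example, `Resolve(Settings{Debug: "true", LogLevel: "warn"})` yields level `warn` with the text format, which is the case the warning above is about: the level is overridden, the format is not.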

---

## Common recipes

**Suppress idle-rater info noise on a MonoVertex daemon:**
```yaml
# MonoVertex.spec.daemonTemplate.containerTemplate.env
- name: NUMAFLOW_LOG_LEVEL
  value: warn
```

**Enable debug for a single data-plane target without flooding all logs:**
```yaml
# Pipeline.spec.vertices[].containerTemplate.env
- name: RUST_LOG
  value: "numaflow_core=debug,info"
```

**Enable full debug on a vertex pod:**
```yaml
# Pipeline.spec.vertices[].containerTemplate.env
- name: NUMAFLOW_DEBUG
  value: "true"
# Note: this also switches log output from JSON to text.
# To keep JSON format while lowering the level, use NUMAFLOW_LOG_LEVEL=debug instead.
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -125,6 +125,7 @@ nav:
- user-guide/reference/configuration/container-resources.md
- user-guide/reference/configuration/volumes.md
- user-guide/reference/configuration/environment-variables.md
- user-guide/reference/configuration/log-levels.md
- user-guide/reference/configuration/labels-and-annotations.md
- user-guide/reference/configuration/init-containers.md
- user-guide/reference/configuration/sidecar-containers.md
1 change: 1 addition & 0 deletions pkg/apis/numaflow/v1alpha1/const.go
@@ -129,6 +129,7 @@ const (
EnvLeaderElectionLeaseRenewPeriod = "NUMAFLOW_LEADER_ELECTION_LEASE_RENEW_PERIOD"
EnvUDContainerType = "NUMAFLOW_UD_CONTAINER_TYPE"
EnvDebug = "NUMAFLOW_DEBUG"
EnvLogLevel = "NUMAFLOW_LOG_LEVEL"
EnvPPROF = "NUMAFLOW_PPROF"
EnvHealthCheckDisabled = "NUMAFLOW_HEALTH_CHECK_DISABLED"
EnvGRPCMaxMessageSize = "NUMAFLOW_GRPC_MAX_MESSAGE_SIZE"
10 changes: 8 additions & 2 deletions pkg/mvtxdaemon/server/service/rater/rater.go
@@ -270,7 +270,11 @@ func (r *Rater) getPodMetrics(podName string) map[string]*dto.MetricFamily {
 func (r *Rater) getPodReadCounts(podName string, result map[string]*dto.MetricFamily) *PodMetricsCount {
 	value, ok := result[monoVtxReadMetricName]
 	if !ok || value == nil || len(value.GetMetric()) == 0 {
-		r.log.Infof("[Pod name %s]: Metric %q is unavailable, the pod might haven't started processing data", podName, monoVtxReadMetricName)
+		// Logged at debug because this fires on every rater tick when the queue is idle (no messages
+		// ever read since pod start), since the Rust prometheus client only registers a counter after
+		// it is first incremented.
+		// To suppress, set NUMAFLOW_LOG_LEVEL=warn on the daemon container.
+		r.log.Debugf("[Pod name %s]: Metric %q is unavailable, the pod might haven't started processing data", podName, monoVtxReadMetricName)
 		return nil
 	}

@@ -293,7 +297,9 @@ func (r *Rater) getPodPendingCounts(podName string, result map[string]*dto.Metri
 		podPendingCount := &PodMetricsCount{podName, metricsList[0].Gauge.GetValue()}
 		return podPendingCount
 	} else {
-		r.log.Infof("[Pod name %s]: Metric %q is unavailable, the pod might haven't started processing data", podName, monoVtxPendingRawMetric)
+		// Same rationale as getPodReadCounts: gauge may not be emitted yet on an idle pod.
+		// To suppress, set NUMAFLOW_LOG_LEVEL=warn on the daemon container.
+		r.log.Debugf("[Pod name %s]: Metric %q is unavailable, the pod might haven't started processing data", podName, monoVtxPendingRawMetric)
 	}
 	return nil
 }
40 changes: 37 additions & 3 deletions pkg/shared/logging/log.go
@@ -18,22 +18,41 @@ package logging

 import (
 	"context"
+	"fmt"
 	"os"
+	"strings"

 	zap "go.uber.org/zap"
 	"go.uber.org/zap/zapcore"
 )

-// NewLogger returns a new zap.SugaredLogger
+const (
+	envDebug    = "NUMAFLOW_DEBUG"
+	envLogLevel = "NUMAFLOW_LOG_LEVEL"
+)
+
+// NewLogger returns a new zap.SugaredLogger.
+// Log level can be overridden at runtime via the NUMAFLOW_LOG_LEVEL env var
+// (accepted values: debug, info, warn, error).
+// NUMAFLOW_DEBUG=true selects the development preset (console encoder, debug level).
+// NUMAFLOW_LOG_LEVEL overrides the level chosen by NUMAFLOW_DEBUG.
 func NewLogger() *zap.SugaredLogger {
 	var config zap.Config
-	debugMode, ok := os.LookupEnv("NUMAFLOW_DEBUG")
+	debugMode, ok := os.LookupEnv(envDebug)
 	if ok && debugMode == "true" {
 		config = zap.NewDevelopmentConfig()
 	} else {
 		config = zap.NewProductionConfig()
 	}
-	// Config customization goes here if any
+	// NUMAFLOW_LOG_LEVEL overrides the level set by the preset above.
+	// Invalid values fall back to the preset level so a typo does not crash the pod.
+	if lvlStr, ok := os.LookupEnv(envLogLevel); ok && strings.TrimSpace(lvlStr) != "" {
+		if lvl, ok := parseLogLevel(lvlStr); ok {
+			config.Level = zap.NewAtomicLevelAt(lvl)
+		} else {
+			_, _ = fmt.Fprintf(os.Stderr, "invalid %s=%q, using default log level\n", envLogLevel, lvlStr)
+		}
+	}
 	config.EncoderConfig.EncodeTime = zapcore.RFC3339NanoTimeEncoder
 	config.OutputPaths = []string{"stdout"}
 	logger, err := config.Build()
@@ -43,6 +62,21 @@ func NewLogger() *zap.SugaredLogger {
 	return logger.Named("numaflow").Sugar()
 }

+func parseLogLevel(level string) (zapcore.Level, bool) {
+	switch strings.ToLower(strings.TrimSpace(level)) {
+	case "debug":
+		return zapcore.DebugLevel, true
+	case "info":
+		return zapcore.InfoLevel, true
+	case "warn":
+		return zapcore.WarnLevel, true
+	case "error":
+		return zapcore.ErrorLevel, true
+	default:
+		return zapcore.InfoLevel, false
+	}
+}
+
 type loggerKey struct{}

 // WithLogger returns a copy of parent context in which the