vllm-project · mergify · Jun 15, 2026 · Jun 15, 2026 · Jun 15, 2026 · Jun 16, 2026
diff --git a/docs/getting-started/benchmark.md b/docs/getting-started/benchmark.md
@@ -195,7 +195,7 @@ guidellm run --profile kind=sweep,sweep_size=10,rampup_duration=10,strategy_type
 
 #### Replay Profile
 
-Replays trace events using timestamps from a `trace_synthetic` dataset. See [Trace Replay Benchmarking](#trace-replay-benchmarking-beta) below for data setup.
+Replays trace events using timestamps from a trace file dataset. See [Trace Replay Benchmarking](#trace-replay-benchmarking) below for data setup.
 
 ```bash
 guidellm run --profile kind=replay,time_scale=1.0
@@ -225,9 +225,9 @@ guidellm run \
 
 You can customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values. See the [Datasets Synthetic data documentation](../guides/datasets.md#synthetic-data) for more details.
 
-### Trace Replay Benchmarking (beta)
+### Trace Replay Benchmarking
 
-For realistic load testing, replay trace events using each row's timestamp and token lengths. Trace files must be JSONL and are loaded with the `trace_synthetic` data type. By default, each row uses `timestamp`, `input_length`, and `output_length` fields. Timestamps may be absolute or monotonic values; GuideLLM sorts them and converts them to offsets from the first event before scheduling:
+For realistic load testing, replay trace events using each row's timestamp and token lengths. Trace files must be JSONL, JSON, CSV, or Parquet and are loaded by setting the data `kind` to a supported [trace file format](../guides/trace_replay.md#supported-formats). Timestamps may be absolute or monotonic values; GuideLLM sorts them and converts them to offsets from the first event before scheduling:
 
 ```json
 {"timestamp": 1234500.0, "input_length": 256, "output_length": 128}
@@ -249,7 +249,7 @@ The replay profile parameter `time_scale` acts as a scaling factor for the inter
 
 GuideLLM orders trace rows by timestamp before scheduling and payload generation, so each scheduled event uses the token lengths from the same sorted row. Use `--data-loader kind=pytorch,samples=1000` to limit how many trace rows are loaded and replayed. `--constraint kind=max_requests,count=1000` remains a runtime completion constraint; it does not truncate the trace dataset.
 
-If your trace uses different column names, include `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` in the data config:
+By default, every trace format looks for the columns `timestamp`, `input_length`, and `output_length`. If your trace uses different column names, include `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` in the data config:
 
 ```bash
 guidellm run \
@@ -258,7 +258,7 @@ guidellm run \
   --profile kind=replay,time_scale=1.0
 ```
 
-For very small prompts (roughly under 15 tokens, depending on the tokenizer), GuideLLM may not have enough room to include the full per-row unique prefix. Different rows can then produce similar or identical prompts, which reduces cache resistance in replay benchmarks.
+The same column overrides apply to columns required by specific formats. These additional columns and other format-specific arguments are described in the [Trace File Formats documentation](../guides/trace_replay.md).
 
 ### Working with Real Data
 

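The timestamp handling described in the benchmark.md changes above (sort rows, then convert timestamps to offsets from the first event) can be sketched in a few lines of Python. This is a minimal illustration, not GuideLLM's implementation: `replay_offsets` is a hypothetical helper, the second trace row is made up for the example, and treating `time_scale` as a plain multiplier on the offsets is an assumption, since the docs only say it scales the intervals between events.

```python
import json


def replay_offsets(jsonl_text, timestamp_column="timestamp", time_scale=1.0):
    """Return (offset_seconds, row) pairs ordered by timestamp.

    Hypothetical helper for illustration only. `time_scale` is assumed
    to multiply each offset from the first event.
    """
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Sort first so every offset stays paired with the token lengths of its own row.
    rows.sort(key=lambda row: row[timestamp_column])
    if not rows:
        return []
    first = rows[0][timestamp_column]
    return [((row[timestamp_column] - first) * time_scale, row) for row in rows]


# Rows are deliberately out of order; the second row is illustrative data.
trace = "\n".join(
    [
        '{"timestamp": 1234500.5, "input_length": 512, "output_length": 64}',
        '{"timestamp": 1234500.0, "input_length": 256, "output_length": 128}',
    ]
)
schedule = replay_offsets(trace)
for offset, row in schedule:
    print(offset, row["input_length"], row["output_length"])
```

Because the rows are sorted before offsets are computed, the earliest event always lands at offset 0 regardless of whether the original timestamps were absolute or monotonic.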
diff --git a/docs/guides/datasets.md b/docs/guides/datasets.md
@@ -20,11 +20,11 @@ The following arguments configure datasets and their processing:
   - `synthetic_text` — generates synthetic prompts on the fly. Required field: `prompt_tokens`. Optional: `output_tokens`, `turns`, `prefix_tokens`, `prefix_count`, `prefix_buckets`, and distribution controls (`prompt_tokens_stdev`, `output_tokens_stdev`, etc.).
   - `huggingface` (alias `hf`) — loads from HuggingFace Hub or a local directory/file. Required field: `source` (dataset ID or path). Pass dataset loading arguments (for example `split`, `name`) via `load_kwargs`.
   - `json_file`, `csv_file`, `text_file`, `parquet_file`, `arrow_file`, `hdf5_file`, `db_file`, `tar_file` — loads from a local file. Required field: `path`.
-  - `trace_synthetic` — loads a JSONL trace file for replay benchmarking. Required field: `path`. Optional: `timestamp_column` (default: `timestamp`), `prompt_tokens_column` (default: `input_length`), `output_tokens_column` (default: `output_length`).
+  - `trace_synthetic`, `mooncake` — loads a JSONL, JSON, CSV, or Parquet trace file for replay benchmarking. Required field: `path`. Optional: `timestamp_column` (default: `timestamp`), `prompt_tokens_column` (default: `input_length`), `output_tokens_column` (default: `output_length`).
 
-In addition, you can specify additional arguments to the dataset loading with the data argument `loader_kwargs`:
+In addition, you can pass extra arguments to the dataset loader with the data argument `load_kwargs`:
 
-- loader_kwargs: Additional arguments to the dataset loading. For example, dataset splits can be specified with `--data '{"kind":"huggingface","source":"my/dataset","loader_kwargs":{"split":"test"}}'`.
+- `load_kwargs`: Additional arguments passed to the dataset loader. For example, dataset splits can be specified with `--data '{"kind":"huggingface","source":"my/dataset","load_kwargs":{"split":"test"}}'`.
 
 ### Data Loader
 
@@ -188,7 +188,7 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
   {"prompt": "What is your name?", "output_tokens_count": 3, "additional_column": "baz", "additional_column2": "qux"}
   ```
 
-- **Trace files (`.jsonl` with `trace_synthetic` type)**: Specialized JSONL files for replay benchmarking with `timestamp`, `input_length`, and `output_length` fields. Used with `--profile kind=replay` to replay trace events using each row's timestamp and token lengths. Timestamps must be numbers expressed in seconds on a shared timeline with any consistent zero point; GuideLLM sorts them and converts them to offsets from the first event before scheduling. Date strings are not parsed yet, so provide timestamps as numbers. See [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking-beta).
+- **Trace files (`.jsonl`, `.json`, `.csv`, or `.parquet` with a supported trace file format)**: Specialized files for replay benchmarking. Used with `--profile kind=replay` to replay trace events using each row's timestamp and token lengths. Timestamps must be numbers expressed in seconds on a shared timeline with any consistent zero point; GuideLLM sorts them and converts them to offsets from the first event before scheduling. Date strings are not parsed yet, so provide timestamps as numbers. See [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking).
 
   ```json
   {"timestamp": 1234500.0, "input_length": 256, "output_length": 128}
@@ -197,7 +197,7 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
 
   In this example, the second request is scheduled 0.5 seconds after the first request. Trace rows are ordered by timestamp before GuideLLM schedules requests and generates synthetic payloads. This keeps each scheduled event aligned with the prompt and output token lengths from the same row.
 
-  Use `trace_synthetic` to enable trace loading:
+  Use a supported [trace file format](./trace_replay.md#supported-formats) to enable trace loading:
 
   ```bash
   guidellm run \
@@ -206,7 +206,7 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
     --data kind=trace_synthetic,path=path/to/trace.jsonl
   ```
 
-  If your trace uses different column names, include `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` in the data config:
+  By default, all trace formats look for the columns `timestamp`, `input_length`, and `output_length`. If your trace uses different column names, include `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` in the data config:
 
   ```bash
   guidellm run \
@@ -217,8 +217,6 @@ GuideLLM supports various file formats for datasets, including text, CSV, JSON,
 
   For replay, `time_scale` on the profile is a time scale for the intervals between trace events rather than requests per second. Use `--data-loader kind=pytorch,samples=1000` to limit how many trace rows are loaded and replayed. Use `--constraint kind=max_requests,count=<n>` only as a runtime completion constraint; it does not limit the trace rows loaded from the file.
 
-  Very small `input_length` values (roughly under 15 tokens, depending on the tokenizer) may not leave enough room for the full per-row unique prefix in the synthetic prompt. This can make prompts more similar across rows and weaken cache resistance. See [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking) for details.
-
 - **JSON files (`.json`)**: Where the entire dataset is represented as a JSON array of objects nested under a specific key. To surface the correct key to use, a `--data-column-mapper` argument must be passed in of `"field": "NAME"` for where the array exists. The objects should include `prompt` or other common names for the prompt which will be used as the prompt column. Additional fields can be included based on the previously mentioned aliases for the `--data-column-mapper` argument.
 
   ```json

diff --git a/docs/guides/trace_replay.md b/docs/guides/trace_replay.md
@@ -0,0 +1,44 @@
+# Trace File Formats
+
+Many trace files are formatted in ways that need to be specially handled to create an accurate replay. This guide covers all trace file formats currently supported by GuideLLM, along with the format-agnostic and format-specific data arguments.
+
+Detailed use of the replay profile and file-based datasets as a whole is explained in [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking).
+
+## Supported Formats
+
+Select a format by passing `kind=<format>` in the `--data` argument:
+
+- `trace_synthetic`: A trace format that does the bare minimum needed to run a fully functioning trace replay benchmark with synthetic prompt generation.
+- `mooncake`: The trace format used by the serving platform Mooncake, as defined in [https://doi.org/10.48550/arXiv.2407.00079](https://doi.org/10.48550/arXiv.2407.00079)
+
+## Format-Agnostic Data Arguments
+
+All trace formats can accept the following optional data arguments:
+
+| Argument               | Default         | Description                                           |
+| ---------------------- | --------------- | ----------------------------------------------------- |
+| `timestamp_column`     | "timestamp"     | Column name for timestamps in the trace file          |
+| `prompt_tokens_column` | "input_length"  | Column name for prompt token counts in the trace file |
+| `output_tokens_column` | "output_length" | Column name for output token counts in the trace file |
+
+Pass these through the `--data` argument, for example:
+
+```bash
+guidellm run \
+    --target http://localhost:8000 \
+    --profile kind=replay \
+    --data "kind=trace_synthetic,path=replay.jsonl,timestamp_column=ts,prompt_tokens_column=input_tokens,output_tokens_column=generated_tokens"
+```
+
+`trace_synthetic` can be thought of as the format-agnostic option: it looks only for the timestamp, prompt token count, and output token count columns and ignores all other features in a dataset. While primarily used for testing, `trace_synthetic` may be used as a fallback for trace formats not currently supported by GuideLLM.
+
+## Format-Specific Data Arguments
+
+### `mooncake`
+
+The Mooncake format expects an additional column for hash IDs. During prompt generation, hash IDs sharing the same previous ID are required to represent distinct blocks of token IDs.
+
+| Argument             | Default    | Description                                         |
+| -------------------- | ---------- | --------------------------------------------------- |
+| `hash_ids_column`    | "hash_ids" | Column name for lists of hash IDs in the trace file |
+| `hash_id_block_size` | 512        | Number of tokens represented by one hash ID         |
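To make the role of `hash_ids_column` and `hash_id_block_size` more concrete, here is one plausible way hash IDs could be turned into token blocks. This is a sketch under stated assumptions, not GuideLLM's actual prompt generation: `tokens_for_hash_ids` and `vocab_size` are hypothetical, and the sketch assumes each hash ID maps deterministically to one block of `block_size` token IDs, with the final block truncated to fit `input_length`.

```python
import random


def tokens_for_hash_ids(hash_ids, input_length, block_size=512, vocab_size=32000):
    """Build a token ID list in which each hash ID contributes one block.

    Hypothetical illustration: seeding the generator with the hash ID means
    the same hash ID always yields the same block, so rows that share leading
    hash IDs also share a token prefix.
    """
    token_ids = []
    for hash_id in hash_ids:
        rng = random.Random(hash_id)  # deterministic per hash ID
        token_ids.extend(rng.randrange(vocab_size) for _ in range(block_size))
    # The last block may be partial, so trim to the row's prompt length.
    return token_ids[:input_length]


row_a = tokens_for_hash_ids([1, 2], input_length=600)
row_b = tokens_for_hash_ids([1, 3], input_length=600)
print(len(row_a), row_a[:512] == row_b[:512])
```

In this sketch, two rows that start with the same hash ID share their first 512 tokens, which is the property that makes prefix caching behavior reproducible during replay.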
diff --git a/src/guidellm/data/deserializers/__init__.py b/src/guidellm/data/deserializers/__init__.py
@@ -28,8 +28,16 @@
     SyntheticTextDataset,
     SyntheticTextDatasetDeserializer,
 )
-from .trace_mooncake import TraceMooncakeDataArgs, TraceMooncakeDatasetDeserializer
-from .trace_synthetic import TraceSyntheticDataArgs, TraceSyntheticDatasetDeserializer
+from .trace_common import (
+    TraceDataArgs,
+    TraceDatasetDeserializer,
+    TraceFormatBase,
+    TraceFormatRegistry,
+    decode_prompt,
+    generate_token_ids,
+)
+from .trace_minimal import MinimalTraceFormatArgs
+from .trace_mooncake import MooncakeTraceFormatArgs
 
 __all__ = [
     "ArrowFileDatasetDeserializer",
@@ -49,14 +57,18 @@
     "InMemoryItemListDataArgs",
     "InMemoryItemListDatasetDeserializer",
     "JSONFileDatasetDeserializer",
+    "MinimalTraceFormatArgs",
+    "MooncakeTraceFormatArgs",
     "ParquetFileDatasetDeserializer",
     "SyntheticTextDataArgs",
     "SyntheticTextDataset",
     "SyntheticTextDatasetDeserializer",
     "TarFileDatasetDeserializer",
     "TextFileDatasetDeserializer",
-    "TraceMooncakeDataArgs",
-    "TraceMooncakeDatasetDeserializer",
-    "TraceSyntheticDataArgs",
-    "TraceSyntheticDatasetDeserializer",
+    "TraceDataArgs",
+    "TraceDatasetDeserializer",
+    "TraceFormatBase",
+    "TraceFormatRegistry",
+    "decode_prompt",
+    "generate_token_ids",
 ]
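For readers following the documented `timestamp_column`, `prompt_tokens_column`, and `output_tokens_column` overrides, the sketch below shows how such a remapping could work on a CSV trace. It is illustrative only: `normalize_rows` and `DEFAULT_COLUMNS` are hypothetical names and do not reflect the actual `TraceDataArgs` or `TraceDatasetDeserializer` interfaces exported above. Only the default column names come from the docs.

```python
import csv
import io

# Defaults as documented for all trace formats.
DEFAULT_COLUMNS = {
    "timestamp_column": "timestamp",
    "prompt_tokens_column": "input_length",
    "output_tokens_column": "output_length",
}


def normalize_rows(rows, **overrides):
    """Map user-named columns onto the default trace field names.

    Hypothetical helper: any override not supplied falls back to the default.
    """
    columns = {**DEFAULT_COLUMNS, **overrides}
    return [
        {
            "timestamp": float(row[columns["timestamp_column"]]),
            "input_length": int(row[columns["prompt_tokens_column"]]),
            "output_length": int(row[columns["output_tokens_column"]]),
        }
        for row in rows
    ]


# A CSV trace with the custom column names used in the docs example.
csv_text = "ts,input_tokens,generated_tokens\n0.0,256,128\n0.5,512,64\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
normalized = normalize_rows(
    rows,
    timestamp_column="ts",
    prompt_tokens_column="input_tokens",
    output_tokens_column="generated_tokens",
)
print(normalized)
```

Converting timestamps with `float` mirrors the documented requirement that timestamps be numbers rather than date strings; a date string would raise `ValueError` here.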