Trace File Refactor #829

Merged
mergify[bot] merged 37 commits into vllm-project:main from SkiHatDuckie:trace-merge
Jun 30, 2026

Conversation

@SkiHatDuckie

@SkiHatDuckie SkiHatDuckie commented Jun 22, 2026

Contributor

Summary

This PR refactors the trace formats to separate format-agnostic trace replay functionality from format-specific functionality. Notably, all formats now share the same dataset deserializer; each format instead implements two abstract classes, TraceDataArgs and TraceFormatBase.
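
To illustrate the split described above, here is a minimal sketch of a shared deserializer that delegates to registered formats. Only the class names TraceFormatBase, TraceFormatRegistry and MinimalTraceFormat come from this PR; the method names (`required_columns`, `convert_row`, `register`, `create`), the `"minimal"` key, the column names and the `deserialize` helper are assumptions made for illustration and are not taken from the GuideLLM source.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator


class TraceFormatBase(ABC):
    """Interface each trace format implements (method names are hypothetical)."""

    @abstractmethod
    def required_columns(self) -> set[str]:
        """Columns that every row of this format must contain."""

    @abstractmethod
    def convert_row(self, row: dict[str, Any]) -> dict[str, Any]:
        """Map one raw trace row to the common output schema."""


class TraceFormatRegistry:
    """Maps a format name to its implementation (hypothetical API)."""

    _formats: dict[str, type[TraceFormatBase]] = {}

    @classmethod
    def register(cls, name: str):
        def decorator(format_cls):
            cls._formats[name] = format_cls
            return format_cls

        return decorator

    @classmethod
    def create(cls, name: str) -> TraceFormatBase:
        try:
            return cls._formats[name]()
        except KeyError:
            raise ValueError(f"Unknown trace format: {name!r}") from None


@TraceFormatRegistry.register("minimal")  # key chosen for this sketch only
class MinimalTraceFormat(TraceFormatBase):
    def required_columns(self) -> set[str]:
        # Assumed column names, not confirmed by the PR.
        return {"timestamp", "input_length", "output_length"}

    def convert_row(self, row: dict[str, Any]) -> dict[str, Any]:
        return {
            "timestamp": float(row["timestamp"]),
            "prompt_tokens": int(row["input_length"]),
            "output_tokens": int(row["output_length"]),
        }


def deserialize(
    format_name: str, rows: Iterable[dict[str, Any]]
) -> Iterator[dict[str, Any]]:
    """Format-agnostic loop: check columns, then delegate to the format."""
    trace_format = TraceFormatRegistry.create(format_name)
    required = trace_format.required_columns()
    for index, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise ValueError(f"Row {index} is missing columns: {sorted(missing)}")
        yield trace_format.convert_row(row)
```

In a layout like this, supporting a new format means adding one registered subclass; the shared loop stays unchanged.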

Additional documentation has been added to better cover all supported trace formats and their different requirements.

The unique prefixes for cache resistance originally found in trace_synthetic.py (now trace_minimal.py) were removed because they are incompatible with the new model. They may be re-added as a feature in a future PR through other means.

Details

  • Added trace_common.py
    • All trace formats use the same TraceDatasetDeserializer
    • Moved commonly used functions such as generate_token_ids and decode_prompt to trace_common.py
    • Added TraceDataArgs: an abstract class inherited by all formats
    • Added TraceFormatBase and TraceFormatRegistry: defines an interface for format-specific requirements and functionality on top of TraceExamplesIterable
  • Replaced TraceSyntheticDatasetDeserializer and TraceSyntheticDataArgs with MinimalTraceFormat and MinimalTraceFormatArgs
  • Replaced TraceMooncakeDatasetDeserializer and TraceMooncakeDataArgs with MooncakeTraceFormat and MooncakeTraceFormatArgs
  • Renamed trace_synthetic.py -> trace_minimal.py
  • Renamed test_trace_synthetic.py -> test_trace_minimal.py
  • Added test_trace_common.py, and rearranged preexisting tests accordingly
  • Updated test_replay_profile.py, test_trace_replay.py and test_trace_replay_multiprocess.py
  • Fixed a bug with Mooncake format not working with multiprocessing
  • All trace formats now work with IterableDataset for streaming
  • Added documentation trace_file_formats.md (renamed to trace_replay.md in a later commit of this PR) to cover all trace formats supported by GuideLLM
  • Updated documentation in getting_started/benchmark.md and guides/datasets.md
  • Updated inline documentation
  • Updated import registry in data/deserializers/__init__.py
  • Moved common dataset validation checks to load_trace_rows
  • Removed unique prefixes for cache-resistance in trace_minimal.py
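
The Details list says all formats now stream through IterableDataset, and the commit log mentions a relative_timestamp column. The sketch below shows one way rows could be streamed lazily from a JSONL trace while that column is computed. It assumes that relative_timestamp is the offset from the first row's timestamp and that the rows carry a `timestamp` field; the `stream_trace_rows` helper and the sample data are hypothetical and do not come from the GuideLLM source.

```python
import io
import json
from typing import Any, Iterator, TextIO


def stream_trace_rows(handle: TextIO) -> Iterator[dict[str, Any]]:
    """Lazily yield JSONL rows, adding a relative_timestamp column.

    Assumption: relative_timestamp is measured from the first row.
    """
    first_timestamp = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        timestamp = float(row["timestamp"])
        if first_timestamp is None:
            first_timestamp = timestamp
        row["relative_timestamp"] = timestamp - first_timestamp
        yield row


# Made-up sample trace, used only to exercise the generator.
sample = io.StringIO(
    '{"timestamp": 1000, "input_length": 12, "output_length": 3}\n'
    '{"timestamp": 1500, "input_length": 7, "output_length": 9}\n'
)
rows = list(stream_trace_rows(sample))
```

Because the generator holds only one row at a time, memory use stays flat for large traces. With Hugging Face `datasets`, a generator function like this could be wrapped with `IterableDataset.from_generator`, though whether this PR does so is not stated here.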

Test Plan

  • tox -e test-unit
  • tox -e test-integration
  • tox -e lint-check && tox -e type-check

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes code generated or substantially modified by an AI agent
  • Includes tests generated or substantially modified by an AI agent

NOTE: the Generated-by or Assisted-by trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's DEVELOPING.md file.


git log

commit 027f439
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 12:16:03 2026 -0400

Move dataset validation to load_trace_rows

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 6ec92f5
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 13:00:51 2026 -0400

Rename `TraceColumn` in test file to `TraceColumnGenerator`

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 76f43df
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 15:11:25 2026 -0400

Add relative_timestamp column to deserialized dataset

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit d79be07
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 09:38:05 2026 -0400

Switch to streaming datasets for synthetic trace

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 00ccea7
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 09:58:18 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 7de32d6
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 16:58:54 2026 -0400

Add trace_common.py + classes

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 6f6e464
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 17 10:57:16 2026 -0400

Repair broken test files

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 9cd8729
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 17 16:34:10 2026 -0400

Instantiate/Validate/Dispatch formats through TraceFormatArgs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 26b740f
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 15:09:33 2026 -0400

Rework format handling; flatten data args for CLI

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 7911a87
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:16:15 2026 -0400

Repair tests

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 9aea389
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:18:38 2026 -0400

Remove TraceDataset from __all__

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 27687b2
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:25:29 2026 -0400

Move common funcs to trace_common

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 73b2eda
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:04:41 2026 -0400

Add test_trace_common.py and rearrange tests

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2cdb3b8
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:15:03 2026 -0400

Refactor test_trace_synthetic

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 04626fe
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:28:35 2026 -0400

Rename trace_synthetic to trace_minimal

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 8f1ab50
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:49:30 2026 -0400

Improve text coverage

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit dff58aa
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 12:20:30 2026 -0400

Update inline docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 342bd0c
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 16:12:02 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 37f3b10
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 16:17:20 2026 -0400

Cleanup linting & docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 7bc2f96
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 10:07:04 2026 -0400

Spread `kind`ness

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit e0fe688
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 10:47:16 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 955a8f5
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 11:31:31 2026 -0400

Fix: Register formats with deserializer

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 8908132
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 11:35:31 2026 -0400

Satisfy linting

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit e344e80
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 25 13:00:26 2026 -0400

Move `timestamps` outside the loop

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 1e842a4
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 29 17:16:23 2026 -0400

Register formats w/ deserializer outside trace_common

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit f68217f
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 29 17:20:50 2026 -0400

Update TraceDataArgs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 73617d1
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 29 17:50:56 2026 -0400

Move trace_io contents to trace_common

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit d6f0e67
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 08:59:31 2026 -0400

Fix typo

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 9dd48e4
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 10:04:58 2026 -0400

Remove TraceColumn

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit f32a01e
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 10:17:55 2026 -0400

Specify bad path reason

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 5c51338
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 10:29:43 2026 -0400

Add comment to create_prompt

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 9edb490
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 10:35:10 2026 -0400

Re-register trace_minimal as trace_synthetic

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 9a2c7c1
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 12:50:13 2026 -0400

Support more filetypes + update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 8346b53
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 12:57:04 2026 -0400

Rename trace_file_formats.md to trace_replay.md

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2741e11
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 13:04:58 2026 -0400

Make margin_of_safety an optional parameter

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 5b955d1
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 13:07:47 2026 -0400

Update exception msgs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 00efe7b
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 30 13:10:04 2026 -0400

Update exception msgs x2

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

@mergify

mergify Bot commented Jun 23, 2026

Contributor

Hi @SkiHatDuckie, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this.

@SkiHatDuckie

Contributor Author

Sorry, messed up the rebase. Give me a minute while I clean up the history.

@dbutenhof dbutenhof left a comment

Collaborator

Just logging a few doc comments I caught in a quick scan. I'll get to the code tomorrow morning...

Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/guides/datasets.md Outdated
Comment thread docs/guides/datasets.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
@mergify

mergify Bot commented Jun 24, 2026

Contributor

Hi @SkiHatDuckie, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this.

@sjmonson

Collaborator

Congrats on breaking the update-description job 😁

mergify Bot pushed a commit that referenced this pull request Jun 25, 2026
…#855)

## Summary
This is a separate PR for the bug fix contained in #829, if we instead wish to just get the bug fix in for v0.7.0. This will be closed if the trace file refactor is merged, or after the release of v0.7.0.

## Details
- Fixed a bug with Mooncake format not working with multiprocessing

## Related Issues
- This is also fixed with #829 

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes code generated or substantially modified by an AI agent
- [ ] Includes tests generated or substantially modified by an AI agent

> NOTE: the `Generated-by` or `Assisted-by` trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's [`DEVELOPING.md`](https://github.com/vllm-project/guidellm/blob/main/DEVELOPING.md) file.





---

# git log

commit 3b89ec2
Author: SkiHatDuckie <SkiHatDuckie@gmail.com>
Date:   Thu Jun 25 11:54:24 2026 -0400

    Hotfix: Add relative_timestamp column to output in Mooncake deserializer
    
    Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 8d2cba0
Author: SkiHatDuckie <SkiHatDuckie@gmail.com>
Date:   Thu Jun 25 12:59:10 2026 -0400

    Move `timestamps` outside the loop
    
    Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

---------

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

@sjmonson sjmonson left a comment

Collaborator

My bad, meant to post this morning. This PR also needs another rebase (hopefully the last one) and also some tests are failing.

Comment thread src/guidellm/data/deserializers/trace_common.py Outdated

@jaredoconnell jaredoconnell left a comment

Collaborator

It looks good, and I tested it and it worked. I have a few comments.

Comment thread docs/guides/datasets.md
Comment thread src/guidellm/utils/trace_io.py Outdated
Comment thread src/guidellm/data/deserializers/trace_mooncake.py
Comment thread src/guidellm/data/deserializers/trace_common.py Outdated
Comment thread src/guidellm/data/deserializers/trace_minimal.py Outdated
sjmonson previously approved these changes Jun 30, 2026

@sjmonson sjmonson left a comment

Collaborator

Looks good

@SkiHatDuckie SkiHatDuckie marked this pull request as draft June 30, 2026 15:59

@dbutenhof dbutenhof left a comment

Collaborator

There's a broken .md link after you renamed a header: that should be fixed. I don't think any of the other comments are pressing...

Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/getting-started/benchmark.md Outdated
Comment thread src/guidellm/data/deserializers/trace_common.py Outdated
Comment thread src/guidellm/data/deserializers/trace_common.py
Comment thread src/guidellm/data/deserializers/trace_common.py Outdated
Comment thread src/guidellm/data/deserializers/trace_common.py Outdated

@jaredoconnell jaredoconnell left a comment

Collaborator

Looks ready for merging once Dave's comments are addressed.

@SkiHatDuckie SkiHatDuckie marked this pull request as ready for review June 30, 2026 16:50
@mergify

mergify Bot commented Jun 30, 2026

Contributor

Queued — the merge queue status continues in this comment ↓.

@mergify

mergify Bot commented Jun 30, 2026

Contributor

Merge Queue Status

This pull request spent 5 minutes 25 seconds in the queue, including 4 minutes 58 seconds running CI.

Required conditions to merge
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/quality (3.10) / type-checks
    • check-neutral = @github-actions/quality (3.10) / type-checks
    • check-skipped = @github-actions/quality (3.10) / type-checks
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/quality (3.10) / precommit-checks
    • check-neutral = @github-actions/quality (3.10) / precommit-checks
    • check-skipped = @github-actions/quality (3.10) / precommit-checks
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/quality (3.10) / quality-checks
    • check-neutral = @github-actions/quality (3.10) / quality-checks
    • check-skipped = @github-actions/quality (3.10) / quality-checks
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/tests (3.10) / e2e-tests
    • check-neutral = @github-actions/tests (3.10) / e2e-tests
    • check-skipped = @github-actions/tests (3.10) / e2e-tests
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/tests (3.10) / integration-tests
    • check-neutral = @github-actions/tests (3.10) / integration-tests
    • check-skipped = @github-actions/tests (3.10) / integration-tests
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/tests (3.10) / unit-tests
    • check-neutral = @github-actions/tests (3.10) / unit-tests
    • check-skipped = @github-actions/tests (3.10) / unit-tests
  • any of [🛡 GitHub repository ruleset rule Merge Requirements]:
    • check-success = @github-actions/update-description
    • check-neutral = @github-actions/update-description
    • check-skipped = @github-actions/update-description

mergify Bot added a commit that referenced this pull request Jun 30, 2026
@mergify mergify Bot added the queued label Jun 30, 2026
@mergify mergify Bot merged commit d7c5123 into vllm-project:main Jun 30, 2026
12 checks passed
@mergify mergify Bot removed the queued label Jun 30, 2026

Development

Successfully merging this pull request may close these issues.

Feature Request: Add Mooncake Trace Data Support

4 participants