Trace File Refactor #829
Merged
+1,067
−1,139
Changes from 30 commits
Commits
37 commits
027f439
Move dataset validation to load_trace_rows
SkiHatDuckie 6ec92f5
Rename `TraceColumn` in test file to `TraceColumnGenerator`
SkiHatDuckie 76f43df
Add relative_timestamp column to deserialized dataset
SkiHatDuckie d79be07
Switch to streaming datasets for synthetic trace
SkiHatDuckie 00ccea7
Update docs
SkiHatDuckie 7de32d6
Add trace_common.py + classes
SkiHatDuckie 6f6e464
Repair broken test files
SkiHatDuckie 9cd8729
Instantiate/Validate/Dispatch formats through TraceFormatArgs
SkiHatDuckie 26b740f
Rework format handling; flatten data args for CLI
SkiHatDuckie 7911a87
Repair tests
SkiHatDuckie 9aea389
Remove TraceDataset from __all__
SkiHatDuckie 27687b2
Move common funcs to trace_common
SkiHatDuckie 73b2eda
Add test_trace_common.py and rearrange tests
SkiHatDuckie 2cdb3b8
Refactor test_trace_synthetic
SkiHatDuckie 04626fe
Rename trace_synthetic to trace_minimal
SkiHatDuckie 8f1ab50
Improve text coverage
SkiHatDuckie dff58aa
Update inline docs
SkiHatDuckie 342bd0c
Update docs
SkiHatDuckie 37f3b10
Cleanup linting & docs
SkiHatDuckie 7bc2f96
Spread `kind`ness
SkiHatDuckie e0fe688
Update docs
SkiHatDuckie 955a8f5
Fix: Register formats with deserializer
SkiHatDuckie 8908132
Satisfy linting
SkiHatDuckie e344e80
Move `timestamps` outside the loop
SkiHatDuckie 1e842a4
Register formats w/ deserializer outside trace_common
SkiHatDuckie f68217f
Update TraceDataArgs
SkiHatDuckie 73617d1
Move trace_io contents to trace_common
SkiHatDuckie d6f0e67
Fix typo
SkiHatDuckie 9dd48e4
Remove TraceColumn
SkiHatDuckie f32a01e
Specify bad path reason
SkiHatDuckie 5c51338
Add comment to create_prompt
SkiHatDuckie 9edb490
Re-register trace_minimal as trace_synthetic
SkiHatDuckie 9a2c7c1
Support more filetypes + update docs
SkiHatDuckie 8346b53
Rename trace_file_formats.md to trace_replay.md
SkiHatDuckie 2741e11
Make margin_of_safety an optional parameter
SkiHatDuckie 5b955d1
Update exception msgs
SkiHatDuckie 00efe7b
Update exception msgs x2
SkiHatDuckie
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| # Trace File Formats | ||
|
|
||
| Many trace files are formatted in ways that need to be specially handled to create an accurate replay. This guide covers all trace file formats currently supported by GuideLLM, along with the format-agnostic and format-specific data arguments. | ||
|
|
||
| Detailed use of the replay profile and file-based datasets as a whole is explained in [Trace Replay Benchmarking](../getting-started/benchmark.md#trace-replay-benchmarking). | ||
|
|
||
| ## Supported Formats | ||
|
|
||
| These are passed to the `--data` argument as `kind=format`: | ||
|
|
||
| - `trace_minimal`: A trace format that does the bare minimum needed to complete a fully functioning trace replay benchmark with synthetic prompt generation | ||
| - `mooncake`: The trace format used by the serving platform Mooncake, as defined in [https://doi.org/10.48550/arXiv.2407.00079](https://doi.org/10.48550/arXiv.2407.00079) | ||
|
|
||
| ## Format-Agnostic Data Arguments | ||
|
|
||
| All trace formats can accept the following optional data arguments: | ||
|
|
||
| | Argument | Default | Description | | ||
| | ---------------------- | --------------- | ----------------------------------------------------- | | ||
| | `timestamp_column` | "timestamp" | Column name for timestamps in the trace file | | ||
| | `prompt_tokens_column` | "input_length" | Column name for prompt token counts in the trace file | | ||
| | `output_tokens_column` | "output_length" | Column name for output token counts in the trace file | | ||
|
|
||
| These are passed through the `--data` argument as comma-separated `key=value` pairs, as shown below: | ||
|
|
||
| ```bash | ||
| guidellm benchmark \ | ||
| --target http://localhost:8000 \ | ||
| --profile kind=replay \ | ||
| --data "kind=trace_minimal,path=replay.jsonl,timestamp_column=ts,prompt_tokens_column=input_tokens,output_tokens_column=generated_tokens" | ||
| ``` | ||
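The `--data` value in the command above is a flat, comma-separated list of `key=value` pairs. As a minimal sketch of how such a string could be assembled from a script, the hypothetical helper `build_data_arg` below joins a dictionary of options. It is not part of GuideLLM, and it assumes that keys and values contain no commas or equals signs.

```python
def build_data_arg(options: dict[str, str]) -> str:
    """Join options into a comma-separated key=value string.

    Hypothetical helper, not a GuideLLM API. Assumes no key or value
    contains ',' or '=', since this sketch does no escaping.
    """
    for key, value in options.items():
        if any(ch in key or ch in str(value) for ch in ",="):
            raise ValueError(f"unsupported character in {key!r}={value!r}")
    return ",".join(f"{key}={value}" for key, value in options.items())


data_arg = build_data_arg(
    {
        "kind": "trace_minimal",
        "path": "replay.jsonl",
        "timestamp_column": "ts",
        "prompt_tokens_column": "input_tokens",
        "output_tokens_column": "generated_tokens",
    }
)
print(data_arg)
```

The resulting string can be passed as the single quoted value of `--data`, matching the example command.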
|
|
||
| `trace_minimal` can be thought of as the format-agnostic option, only looking for the timestamp, prompt token count and output token count columns and ignoring all other features contained in a dataset. While primarily used for testing, `trace_minimal` may be used as a fallback for trace formats not currently supported by GuideLLM. | ||
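To illustrate the "three columns only" behaviour described above, here is a rough sketch that reads JSONL records and keeps just the timestamp, prompt token count, and output token count. The function `read_minimal_trace` and the output key names are assumptions made for illustration; this is not how GuideLLM loads traces internally.

```python
import json


def read_minimal_trace(
    lines,
    timestamp_column="timestamp",
    prompt_tokens_column="input_length",
    output_tokens_column="output_length",
):
    """Keep only the three columns a minimal trace needs.

    Hypothetical sketch, not GuideLLM's loader. The defaults mirror the
    documented default column names; every other field is ignored.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        record = json.loads(line)
        rows.append(
            {
                "timestamp": record[timestamp_column],
                "prompt_tokens": record[prompt_tokens_column],
                "output_tokens": record[output_tokens_column],
            }
        )
    return rows


# Sample records using the custom column names from the command above.
sample = [
    '{"ts": 0.0, "input_tokens": 128, "generated_tokens": 32, "model": "x"}',
    '{"ts": 1.5, "input_tokens": 256, "generated_tokens": 64, "model": "x"}',
]
rows = read_minimal_trace(
    sample,
    timestamp_column="ts",
    prompt_tokens_column="input_tokens",
    output_tokens_column="generated_tokens",
)
print(rows)
```

The extra `model` field in each sample record is dropped, which is the sense in which `trace_minimal` can serve as a fallback for unsupported formats.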
|
|
||
| ## Format-Specific Data Arguments | ||
|
|
||
| ### `mooncake` | ||
|
|
||
| The Mooncake format expects an additional column for hash IDs. During prompt generation, hash IDs sharing the same previous ID are required to represent distinct blocks of token IDs. | ||
|
|
||
| | Argument | Default | Description | | ||
| | -------------------- | ---------- | --------------------------------------------------- | | ||
| | `hash_ids_column` | "hash_ids" | Column name for lists of hash IDs in the trace file | | ||
| | `hash_id_block_size` | 512 | Number of tokens represented by one hash ID | |
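As a rough illustration of how `hash_id_block_size` relates hash IDs to token counts, the sketch below splits a prompt length across its hash IDs. It assumes that each hash ID covers one full block of `hash_id_block_size` tokens except possibly the last, which covers the remainder. The function `block_token_counts` is hypothetical and is not taken from the GuideLLM code in this pull request.

```python
def block_token_counts(input_length, hash_ids, block_size=512):
    """Return how many tokens each hash ID stands for.

    Hypothetical sketch. Assumes every block is full except possibly the
    last one; hash IDs beyond the prompt length are assigned zero tokens.
    """
    counts = []
    for index, _hash_id in enumerate(hash_ids):
        remaining = input_length - index * block_size
        counts.append(max(0, min(block_size, remaining)))
    return counts


# A 1300-token prompt described by three hash IDs.
counts = block_token_counts(1300, [0, 1, 2])
print(counts)
```

Under this assumption the first two hash IDs each cover a full 512-token block and the third covers the remaining 276 tokens.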