feat(hotswap): build, package, and sanity-check the hsa-hotswap tool#6094

Draft
lamb-j wants to merge 2 commits into main from users/lambj/hotswap-debug

@lamb-j lamb-j commented Jun 24, 2026

Contributor

ISSUE ID: #6096

Summary

TheRock-side integration of the relocated hotswap HSA tool. The hotswap HSA_TOOLS_LIB tool has moved out of comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) into rocm-systems projects/hotswap (libhsa-hotswap.so); comgr now provides only the amd_comgr_hotswap_rewrite API. This PR wires TheRock to build, package, and sanity-check the relocated tool.

What this PR does

  • compiler/CMakeLists.txt: drop the now-removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr keeps the rewrite API).
  • core/CMakeLists.txt + core/artifact-core-runtime.toml: declare the hsa-hotswap subproject (rocm-systems projects/hotswap, deps amd-comgr + ROCR-Runtime) and package libhsa-hotswap.so into the core-runtime artifact.
  • tests/test_rocm_sanity.py: add test_hotswap_tool_loads — a minimal check that, when hotswap is enabled (libamd_comgr.so exports amd_comgr_hotswap_rewrite), libhsa-hotswap.so is packaged and loads cleanly under ROCr. rocminfo triggers hsa_init, which is when ROCr dlopens HSA_TOOLS_LIB tools; the allowlist is gfx1250 -> gfx1250 only, so the tool stays inert on other targets and rocminfo must still succeed. Skips when hotswap is disabled.
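The check described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual `test_hotswap_tool_loads` implementation: the `lib/` and `bin/` layout under the install root and the helper names (`exports_symbol`, `tool_env`, `check_hotswap_tool_loads`) are assumptions made for the example.

```python
import ctypes
import os
import subprocess
from pathlib import Path

REWRITE_SYMBOL = "amd_comgr_hotswap_rewrite"


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path loads and exports symbol."""
    try:
        lib = ctypes.CDLL(str(lib_path))
    except OSError:
        return False  # missing or unloadable library counts as "not exported"
    return hasattr(lib, symbol)


def tool_env(tool_path, base_env=None):
    """Copy the environment and point HSA_TOOLS_LIB at the tool library."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_TOOLS_LIB"] = str(tool_path)
    return env


def check_hotswap_tool_loads(rocm_root):
    """Return "skipped" when hotswap is disabled, "ok" when rocminfo succeeds."""
    root = Path(rocm_root)
    comgr = root / "lib" / "libamd_comgr.so"    # assumed install location
    tool = root / "lib" / "libhsa-hotswap.so"   # assumed install location
    rocminfo = root / "bin" / "rocminfo"        # assumed install location

    # Hotswap is treated as enabled only if comgr exports the rewrite API.
    if not exports_symbol(comgr, REWRITE_SYMBOL):
        return "skipped"
    if not tool.is_file():
        raise AssertionError(f"{tool} is not packaged")
    # rocminfo calls hsa_init, which is when ROCr loads HSA_TOOLS_LIB tools.
    subprocess.run([str(rocminfo)], env=tool_env(tool), check=True,
                   capture_output=True)
    return "ok"
```

On a machine without a ROCm install the sketch returns `"skipped"`, which mirrors the skip-when-disabled behavior described above.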

Validation

  • ✅ Tool builds, packages, and loads cleanly under ROCr; inert on gfx942 with no regressions (originally verified via a full hip-tests HSA_TOOLS_LIB hookup — now reduced to the lightweight rocminfo sanity check).
  • ❌ Actual gfx1250 B0→A0 transpilation is out of scope — there is no gfx1250 runner in CI.

Dependencies

amd-llvm: ✅ satisfied by main. The required ROCm/llvm-project#3007 (remove COMGR hotswap HSA tool) is already in TheRock's compiler/amd-llvm pin (aa451e1f, landed via #6155). No amd-llvm change in this PR.

rocm-systems → temporary pin 75469b9f (to be removed before merge)

One temporary pin remains. It carries the merged hotswap work that is not yet in TheRock's rocm-systems pin:

| PR | Status | Description |
| --- | --- | --- |
| ROCm/rocm-systems#7629 | merged to develop | Build/install libhsa-hotswap.so, link hsa-runtime64, re-key HSA_TOOLS_LIB tool name, ISA derivation + tests, OnUnload use-after-free fix, opt-in HSA_HOTSWAP_VERBOSE logging |
| ROCm/rocm-systems#7715 | merged to develop | Restrict HotSwap forwarding to gfx1250 -> gfx1250 |

Drop this temp pin once the rocm-systems SMP bump brings #7629 + #7715 into TheRock's pin.
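
The forwarding restriction from #7715 amounts to an allowlist lookup. The sketch below only illustrates that idea; it assumes the allowlist is keyed on (source ISA, destination ISA) pairs, and the names `ALLOWED_PAIRS` and `should_forward` are hypothetical rather than taken from the rocm-systems code.

```python
# Assumption: the allowlist holds (source ISA, destination ISA) pairs.
ALLOWED_PAIRS = frozenset({("gfx1250", "gfx1250")})


def should_forward(source_isa, dest_isa, allowed=ALLOWED_PAIRS):
    """Return True only for pairs on the allowlist; everything else stays inert."""
    return (source_isa, dest_isa) in allowed


# A gfx942 device is not on the allowlist, so nothing would be forwarded there.
print(should_forward("gfx942", "gfx942"))
print(should_forward("gfx1250", "gfx1250"))
```

Under this model the tool is a no-op on every target except gfx1250, which is why rocminfo is expected to keep succeeding on other hardware.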

Notes

  • The hsa-hotswap subproject resolves the amd_comgr + hsa-runtime64 CONFIG packages from its deps; therock_test_validate_shared_lib hard-fails the build if libhsa-hotswap.so isn't produced, surfacing packaging misses.
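
The fail-fast behavior in this note can be illustrated with a small Python analogue. It is not the CMake helper `therock_test_validate_shared_lib`, whose implementation is not shown in this PR; the function names and the directory argument are assumptions made for the example.

```python
from pathlib import Path


def missing_libs(lib_dir, expected):
    """Return the expected library names that are absent from lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in expected if not (lib_dir / name).is_file()]


def require_libs(lib_dir, expected):
    """Raise immediately if any expected library was not produced."""
    missing = missing_libs(lib_dir, expected)
    if missing:
        raise FileNotFoundError(f"missing from {lib_dir}: {', '.join(missing)}")
```

Calling `require_libs(stage_dir, ["libhsa-hotswap.so"])` right after the build step would surface a packaging miss as a hard error instead of a silently incomplete artifact.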

therock-pr-bot Bot commented Jun 24, 2026


✅ All Checks Passed — Ready for Review

| Check | Status | Details |
| --- | --- | --- |
| 🌿 Branch Name | ✅ Pass | |
| 📝 PR Title/Description | ✅ Pass | |
| Forbidden Files | ✅ Pass | |
| 🧪 Unit Test | ✅ Pass | PR does not contain code files; Unit Test auto-passed |
| 🔎 pre-commit | ✅ Pass | |
| 🚫 Draft PR | 🔜 To Be Enabled | |
| 🚩 Feature Flag | 🔜 To Be Enabled | |
| 📊 Code Coverage | 🔜 To Be Enabled | |
| 🤖 therock-pr-bot | ✅ Pass | |

🎉 All checks passed! This PR is ready for review.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

@therock-pr-bot therock-pr-bot Bot added the "Not ready to Review" label (PR has unresolved policy failures; reviews blocked) Jun 24, 2026

therock-pr-bot Bot commented Jun 24, 2026


🎉 All checks passed! This PR is ready for review.

@lamb-j lamb-j changed the title from "[Testing Only] Hotswap runtime hookup: relocated hsa-hotswap tool + HSA_TOOLS_LIB hip-tests" to "ci(hotswap): test runtime hookup of relocated hsa-hotswap tool" Jun 24, 2026
@therock-pr-bot therock-pr-bot Bot removed the "Not ready to Review" label (PR has unresolved policy failures; reviews blocked) Jun 24, 2026
@lamb-j lamb-j force-pushed the users/lambj/hotswap-debug branch from 840ddff to 5cdd40f June 29, 2026 17:45
@lamb-j lamb-j changed the title from "ci(hotswap): test runtime hookup of relocated hsa-hotswap tool" to "feat(hotswap): build, package, and sanity-check the hsa-hotswap tool" Jun 29, 2026
lamb-j added 2 commits June 29, 2026 10:54
Integrate the relocated hotswap HSA tool. The HSA_TOOLS_LIB tool moved from
comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) to
rocm-systems projects/hotswap (libhsa-hotswap.so); comgr keeps only the
amd_comgr_hotswap_rewrite API.

- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep
  COMGR_ENABLE_HOTSWAP_TRANSPILE.
- core/CMakeLists.txt + core/artifact-core-runtime.toml: declare the
  hsa-hotswap subproject (rocm-systems projects/hotswap; deps amd-comgr +
  ROCR-Runtime) and package libhsa-hotswap.so into the core-runtime artifact.
- tests/test_rocm_sanity.py: add test_hotswap_tool_loads. When hotswap is
  enabled (libamd_comgr.so exports amd_comgr_hotswap_rewrite), libhsa-hotswap.so
  must be packaged and load cleanly under ROCr (rocminfo triggers hsa_init ->
  ROCr dlopens HSA_TOOLS_LIB tools). The allowlist is gfx1250->gfx1250 only, so
  the tool stays inert on other targets and rocminfo still succeeds. Skips when
  hotswap is disabled.

Temp-pin rocm-systems to a hotswap-only integration tip carrying
ROCm/rocm-systems#7629 + #7715 (both merged to develop, not yet in TheRock's
rocm-systems pin). Drop this commit once the rocm-systems SMP bump brings them
into the pin.
@lamb-j lamb-j force-pushed the users/lambj/hotswap-debug branch from 5cdd40f to d53a428 June 29, 2026 17:55
Comment thread core/CMakeLists.txt
)
endif(THEROCK_BUILD_TESTING AND THEROCK_ENABLE_CORE_RUNTIME_TESTS)

if(THEROCK_BUILD_TESTING AND THEROCK_ENABLE_CORE_KFDTESTS)

Contributor

What are the kfd tests and is the option wired?

@davidd-amd davidd-amd left a comment

Contributor

LGTM. Just to be clear: we are moving existing functionality, not adding new functionality. It wasn't clear to me why we need to do this (i.e., what problem it solves), but I am not as familiar with these tools.

@lamb-j lamb-j marked this pull request as draft June 29, 2026 20:42

lamb-j commented Jun 29, 2026

Contributor Author

Paused — superseded by an upstream design change.

ROCm/rocm-systems#7921 ("feat(rocr): integrate HotSwap into ROCR loader") pivots the hotswap design: HotSwap becomes native in the ROCR loader (no HSA_TOOLS_LIB), the projects/hotswap plugin and libhsa-hotswap.so are removed, and ROCR lazy-dlopens libamd_comgr.so directly to call amd_comgr_hotswap_rewrite.
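
The lazy-dlopen pattern mentioned here can be sketched in Python with `ctypes`. This illustrates the general pattern only: ROCR is C++ and its loader code is not shown in this thread, the `LazyLibrary` class is hypothetical, and the sketch looks up the `amd_comgr_hotswap_rewrite` symbol without calling it because its signature is not given here.

```python
import ctypes
import threading


class LazyLibrary:
    """Open a shared library on first use and cache the handle afterwards."""

    def __init__(self, path, loader=ctypes.CDLL):
        self._path = path
        self._loader = loader          # injectable so the pattern can be tested
        self._lock = threading.Lock()
        self._handle = None

    def get(self):
        # Double-checked locking: only the first caller pays for the load.
        if self._handle is None:
            with self._lock:
                if self._handle is None:
                    self._handle = self._loader(self._path)
        return self._handle

    def symbol(self, name):
        """Resolve a symbol, loading the library first if needed."""
        return getattr(self.get(), name)


# Nothing is opened at construction time.
comgr = LazyLibrary("libamd_comgr.so")
# The first lookup would trigger the load, for example:
# rewrite = comgr.symbol("amd_comgr_hotswap_rewrite")
```

Deferring the load means processes that never need the rewrite path never pay for opening libamd_comgr.so, and a missing library only becomes an error at the point of use.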

That obsoletes this PR's approach:

  • The hsa-hotswap subproject + libhsa-hotswap.so packaging + artifact-descriptor capture — no longer applicable (projects/hotswap is deleted upstream).
  • The HSA_TOOLS_LIB / rocminfo sanity test — no longer the mechanism.

What still holds (and is already on main): libamd_comgr.so built with the rewrite API (COMGR_ENABLE_HOTSWAP_TRANSPILE, via the #3007 amd-llvm bump #6155) — which is exactly what ROCR dlopens under #7921.

Holding this PR as draft pending #7921's direction. If #7921 lands, this PR will likely be closed (TheRock needs nothing further for the tool side); if #7921 stalls, the plugin approach here remains the fallback.
