feat(hotswap): build, package, and sanity-check the hsa-hotswap tool #6094
lamb-j wants to merge 2 commits
✅ All Checks Passed — Ready for Review
840ddff to 5cdd40f
Integrate the relocated hotswap HSA tool. The HSA_TOOLS_LIB tool moved from comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) to rocm-systems projects/hotswap (libhsa-hotswap.so); comgr keeps only the amd_comgr_hotswap_rewrite API.

- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE.
- core/CMakeLists.txt + core/artifact-core-runtime.toml: declare the hsa-hotswap subproject (rocm-systems projects/hotswap; deps amd-comgr + ROCR-Runtime) and package libhsa-hotswap.so into the core-runtime artifact.
- tests/test_rocm_sanity.py: add test_hotswap_tool_loads. When hotswap is enabled (libamd_comgr.so exports amd_comgr_hotswap_rewrite), libhsa-hotswap.so must be packaged and load cleanly under ROCr (rocminfo triggers hsa_init -> ROCr dlopens HSA_TOOLS_LIB tools). The allowlist is gfx1250->gfx1250 only, so the tool stays inert on other targets and rocminfo still succeeds. Skips when hotswap is disabled.
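The flow that test_hotswap_tool_loads is described as following can be sketched roughly as below. This is a minimal illustration, not the code in tests/test_rocm_sanity.py: the helper names (exports_symbol, tools_env, check_hotswap_tool_loads) and the flat lib_dir layout are assumptions made for the example. Only the library names, the amd_comgr_hotswap_rewrite symbol, the HSA_TOOLS_LIB variable, and the use of rocminfo come from the commit message.

```python
import ctypes
import os
import subprocess
from pathlib import Path


def exports_symbol(lib_path, symbol):
    """Return True if the shared library loads and exports the symbol."""
    try:
        lib = ctypes.CDLL(str(lib_path))
    except OSError:
        return False  # a missing or unloadable library counts as "not exported"
    return hasattr(lib, symbol)


def tools_env(tool_path, base_env=None):
    """Copy an environment and point HSA_TOOLS_LIB at the tool library."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_TOOLS_LIB"] = str(tool_path)
    return env


def check_hotswap_tool_loads(lib_dir, rocminfo="rocminfo"):
    """Return 'skipped' or 'passed'; raise AssertionError on failure.

    Assumes (hypothetically) that both libraries sit directly in lib_dir.
    """
    lib_dir = Path(lib_dir)
    comgr = lib_dir / "libamd_comgr.so"
    if not exports_symbol(comgr, "amd_comgr_hotswap_rewrite"):
        return "skipped"  # hotswap is disabled in this build
    tool = lib_dir / "libhsa-hotswap.so"
    assert tool.is_file(), f"{tool} is not packaged"
    # rocminfo calls hsa_init, which is when ROCr loads HSA_TOOLS_LIB tools.
    result = subprocess.run(
        [rocminfo], env=tools_env(tool), capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr
    return "passed"
```

Gating on the exported symbol rather than on a build flag keeps the check tied to what was actually packaged, which matches the "skips when hotswap is disabled" behavior described above.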
Temp-pin rocm-systems to a hotswap-only integration tip carrying ROCm/rocm-systems#7629 + #7715 (both merged to develop, not yet in TheRock's rocm-systems pin). Drop this commit once the rocm-systems SMP bump brings them into the pin.
5cdd40f to d53a428
  )
endif(THEROCK_BUILD_TESTING AND THEROCK_ENABLE_CORE_RUNTIME_TESTS)

if(THEROCK_BUILD_TESTING AND THEROCK_ENABLE_CORE_KFDTESTS)
What are the kfd tests and is the option wired?
davidd-amd left a comment
LGTM. Just to be clear, we are moving existing functionality, not adding new functionality. It wasn't clear to me why we need to do this (i.e., what problem it solves), but I am not as familiar with these tools.
Paused: superseded by an upstream design change.

ROCm/rocm-systems#7921 ("feat(rocr): integrate HotSwap into ROCR loader") pivots the hotswap design: HotSwap becomes native in the ROCR loader (no

That obsoletes this PR's approach:

What still holds (and is already on

Holding this PR as draft pending #7921's direction. If #7921 lands, this PR will likely be closed (TheRock needs nothing further for the tool side); if #7921 stalls, the plugin approach here remains the fallback.
ISSUE ID: #6096
Summary
TheRock-side integration of the relocated hotswap HSA tool. The hotswap HSA_TOOLS_LIB tool has moved out of comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) into rocm-systems projects/hotswap (libhsa-hotswap.so); comgr now provides only the amd_comgr_hotswap_rewrite API. This PR wires TheRock to build, package, and sanity-check the relocated tool.
What this PR does
- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr keeps the rewrite API).
- core/CMakeLists.txt + core/artifact-core-runtime.toml: declare the hsa-hotswap subproject (rocm-systems projects/hotswap, deps amd-comgr + ROCR-Runtime) and package libhsa-hotswap.so into the core-runtime artifact.
- tests/test_rocm_sanity.py: add test_hotswap_tool_loads, a minimal check that, when hotswap is enabled (libamd_comgr.so exports amd_comgr_hotswap_rewrite), libhsa-hotswap.so is packaged and loads cleanly under ROCr. rocminfo triggers hsa_init, which is when ROCr dlopens HSA_TOOLS_LIB tools; the allowlist is gfx1250 -> gfx1250 only, so the tool stays inert on other targets and rocminfo must still succeed. Skips when hotswap is disabled.
Validation
HSA_TOOLS_LIB hookup, now reduced to the lightweight rocminfo sanity check).
Dependencies
amd-llvm: ✅ satisfied by main. The required ROCm/llvm-project#3007 (remove COMGR hotswap HSA tool) is already in TheRock's compiler/amd-llvm pin (aa451e1f, landed via #6155). No amd-llvm change in this PR.
rocm-systems → temporary pin 75469b9f (removed before merge)
One remaining temp pin. Carries the merged hotswap work not yet in TheRock's rocm-systems pin:
- libhsa-hotswap.so, link hsa-runtime64, re-key HSA_TOOLS_LIB tool name, ISA derivation + tests, OnUnload use-after-free fix, opt-in HSA_HOTSWAP_VERBOSE logging
- gfx1250 -> gfx1250
Drop this temp pin once the rocm-systems SMP bump brings #7629 + #7715 into TheRock's pin.
Notes
The hsa-hotswap subproject resolves the amd_comgr + hsa-runtime64 CONFIG packages from its deps; therock_test_validate_shared_lib hard-fails the build if libhsa-hotswap.so isn't produced, surfacing packaging misses.
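For readers who want the same kind of guard outside the build, the "fail hard when the library is missing" idea can be approximated in Python. This sketch does not reproduce therock_test_validate_shared_lib, whose implementation is not shown in this PR; require_shared_lib is a hypothetical helper, and searching an unpacked artifact directory by file name is an assumption about what such a check would look at.

```python
from pathlib import Path


def require_shared_lib(artifact_root, lib_name):
    """Return all files under artifact_root matching lib_name (including
    versioned names such as lib_name + '.1'); raise if none are found."""
    root = Path(artifact_root)
    matches = sorted(
        p for p in root.rglob(lib_name + "*") if p.is_file() or p.is_symlink()
    )
    if not matches:
        # Raising here mirrors the intent of a hard build failure.
        raise FileNotFoundError(f"{lib_name} not found under {root}")
    return matches
```

A call such as require_shared_lib("path/to/core-runtime", "libhsa-hotswap.so") would then surface a packaging miss as an exception rather than a silent skip; the path shown is a placeholder.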