diff --git a/docs/source/getting_started/televiz.rst b/docs/source/getting_started/televiz.rst index 95080b014..99fbe4477 100644 --- a/docs/source/getting_started/televiz.rst +++ b/docs/source/getting_started/televiz.rst @@ -5,8 +5,9 @@ Televiz ======= Televiz (``isaacteleop.viz``) is a lightweight compositor for Isaac Teleop. It composites camera and -sensor feeds — with 3D rendered content coming soon — into an XR headset, a desktop window, or an -offscreen buffer, integrating directly with the device-tracking and retargeting pipeline. +sensor feeds, plus 3D rendered content (gsplat, nvblox, neural reconstruction), into an XR headset, a +desktop window, or an offscreen buffer, integrating directly with the device-tracking and retargeting +pipeline. It is a **compositor**, not a capture or streaming layer: it consumes GPU frames and assembles them into a final image. Camera capture, decode, and network transport live in the application (see @@ -29,16 +30,17 @@ which owns the Vulkan context, the display target, the OpenXR session (in XR mod of **layers**. Content producers submit GPU buffers to layers; the session composites every layer into one frame each time you call ``render()``. -The built-in layer type today is -:code-file:`QuadLayer ` — a CUDA-fed 2D texture -plane (mono or stereo), optionally placed in 3D space. Use it for camera feeds. +Two layer types are available: -.. note:: +* :code-file:`QuadLayer ` — a CUDA-fed 2D texture + plane (mono or stereo), optionally placed in 3D space. Use it for camera feeds. +* :code-file:`ProjectionLayer ` — a full-view + RGBD layer for external renderers (gsplat, nvblox, neural reconstruction) that produce per-view + ``(color, depth)`` buffers. Use it to present a rendered 3D scene from the current head pose. - **Coming soon:** ``ProjectionLayer``, a full-view stereo RGBD layer for external renderers - (gsplat, nvblox, neural reconstruction) that produce per-view ``(color, depth)`` buffers, - Z-composited with quads. It is not yet available in this release — see `ProjectionLayer - (coming soon)`_ below. +A session holds **either** one ``ProjectionLayer`` **or** any number of ``QuadLayer`` s, not both: +quads composite into a shared render target, while a projection layer is presented directly (see +`ProjectionLayer`_). All symbols are imported from the top-level module:: @@ -212,19 +214,63 @@ For a stereo layer both buffers are copied on the same stream and signaled toget never sees a half-matched pair. Lock-mode placement strategies (``world`` / ``head`` / ``lazy``) are **application policy** and ship in the sample, not in the module. -ProjectionLayer (coming soon) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +ProjectionLayer +^^^^^^^^^^^^^^^ -.. note:: +A full-view RGBD layer for **in-loop** renderers — gsplat, nvblox, or neural reconstruction engines +that produce per-view ``(color, depth)`` buffers. Configure it with ``ProjectionLayerConfig``: - ``ProjectionLayer`` is under active development and **not yet available in this release**. The - description below is a preview of the planned API and may change. +.. list-table:: + :header-rows: 1 + :widths: 25 75 + + * - Field + - Description + * - ``name`` + - Layer name. + * - ``view_resolution`` + - Per-view render resolution. **Must equal** ``session.get_recommended_resolution()`` — the + layer's images are copied 1:1 into the presentation swapchains (per-eye in XR). A mismatch + is rejected by ``add_projection_layer``. + * - ``color_format`` + - ``PixelFormat.kRGBA8``. + * - ``depth_format`` + - ``PixelFormat.kD32F`` (default) so the depth reaches the XR runtime for positional + reprojection, or ``None`` to present color only. + * - ``stereo`` + - ``True`` for per-eye buffers. A stereo (XR) display **requires** a stereo layer; a mono layer + is rejected at ``add_projection_layer``. + +Unlike ``QuadLayer``, a projection layer is **direct-present**: each view's ``(color, depth)`` is +copied straight into the presentation swapchains (no shared render target). Because of that a session +holds *either* one ``ProjectionLayer`` *or* any number of ``QuadLayer`` s, never both. + +The renderer runs **in-loop** with the frame loop: read the predicted view poses from the +``FrameInfo`` returned by ``begin_frame()``, render against them, then ``submit()`` before +``end_frame()``: + +.. code-block:: python + + cfg = televiz.ProjectionLayerConfig() + cfg.view_resolution = session.get_recommended_resolution() + cfg.stereo = session.is_xr_mode() + layer = session.add_projection_layer(cfg) + + while running: + info = session.begin_frame() + if info.should_render: + # Render against THIS frame's per-eye poses (info.views[i].pose + .fov). + color, depth = renderer.render(info.views) # RGBA8 + D32F CUDA buffers + if layer.stereo: + layer.submit(left_color, left_depth, right_color, right_depth, stream=cuda_stream) + else: + layer.submit(color, depth, stream=cuda_stream) + session.end_frame() -A planned full-view RGBD layer for in-loop renderers — gsplat, nvblox, or neural reconstruction -engines that produce per-view ``(color, depth)`` buffers. Unlike ``QuadLayer``, the renderer will -run **in-loop** with the XR frame loop: render against the predicted view poses from the current -frame, then submit between ``begin_frame()`` and ``end_frame()``. Output is composited with depth, -so it Z-combines with quad layers. +If the renderer is slower than display rate, the runtime / CloudXR paces the app via ``xrWaitFrame`` +and reprojects the last submitted frame at display rate. In XR, a visible layer that does **not** +submit for a frame presents nothing (the swapchains are cleared) rather than reproject stale RGBD +under a new pose. Frame loop ---------- diff --git a/docs/source/overview/architecture.rst b/docs/source/overview/architecture.rst index 8efcb5f54..ed78526a8 100644 --- a/docs/source/overview/architecture.rst +++ b/docs/source/overview/architecture.rst @@ -43,7 +43,7 @@ Visualization (Televiz) Televiz (``isaacteleop.viz``) is a lightweight compositor module for visualizing what the operator sees — camera and sensor feeds, plus 3D rendered content — in an XR headset or a desktop window. -- Composites multiple sources into one view: 2D camera/sensor planes (``QuadLayer``) today, with full-view stereo RGBD (``ProjectionLayer``) for 3D rendered content coming soon +- Composites multiple sources into one view: 2D camera/sensor planes (``QuadLayer``) and full-view stereo RGBD (``ProjectionLayer``) for 3D rendered content - Per-eye stereo rendering and 3D placement in XR; the same API drives windowed and offscreen output - Zero-copy submission of GPU frames straight from CuPy, PyTorch, or any CUDA memory object - Shares one OpenXR session with the teleop device trackers, so rendering and tracking can run over a single CloudXR connection diff --git a/src/viz/core/cpp/device_image.cpp b/src/viz/core/cpp/device_image.cpp index 0fbabc2c3..e64a09583 100644 --- a/src/viz/core/cpp/device_image.cpp +++ b/src/viz/core/cpp/device_image.cpp @@ -84,7 +84,14 @@ VkFormat to_vk_storage_format(PixelFormat format) case PixelFormat::kRGBA8: return VK_FORMAT_R8G8B8A8_UNORM; case PixelFormat::kD32F: - return VK_FORMAT_D32_SFLOAT; + // Single-channel float COLOR format, NOT VK_FORMAT_D32_SFLOAT. Depth + // formats use hardware depth compression in optimal tiling that CUDA + // external-memory array interop cannot interpret, so a CUDA-written + // D32_SFLOAT image reads back as garbage on the Vulkan side. R32_SFLOAT + // is bit-identical (IEEE float32) and interops exactly like the color + // images do; the bridge into the D32_SFLOAT XR depth swapchain happens + // via a staging buffer in the backend (float bits copy verbatim). + return VK_FORMAT_R32_SFLOAT; } throw std::runtime_error("DeviceImage: unsupported PixelFormat"); } @@ -96,7 +103,7 @@ VkFormat to_vk_view_format(PixelFormat format) case PixelFormat::kRGBA8: return VK_FORMAT_R8G8B8A8_SRGB; case PixelFormat::kD32F: - return VK_FORMAT_D32_SFLOAT; + return VK_FORMAT_R32_SFLOAT; // see to_vk_storage_format } throw std::runtime_error("DeviceImage: unsupported PixelFormat"); } @@ -128,13 +135,17 @@ std::unique_ptr DeviceImage::create(const VkContext& ctx, { throw std::invalid_argument("DeviceImage: resolution must be non-zero"); } - if (format != PixelFormat::kRGBA8) + if (format != PixelFormat::kRGBA8 && format != PixelFormat::kD32F) { - // kD32F is reserved for ProjectionLayer's depth path. The - // CUDA-Vulkan interop contract for a depth image (sample - // semantics, layout transitions, color-space view) is not - // worked out yet, so refuse to half-build it. - throw std::invalid_argument("DeviceImage: only PixelFormat::kRGBA8 is supported"); + throw std::invalid_argument("DeviceImage: unsupported PixelFormat"); + } + if (format == PixelFormat::kD32F && mip_levels > 1) + { + // Depth + mip chain is meaningless (filtering depth between mip + // levels produces incorrect occlusion) and we'd have to + // special-case the blit-down pipeline. Reject explicitly rather + // than silently allocating the chain. + throw std::invalid_argument("DeviceImage: kD32F does not support mip_levels > 1"); } // mip_levels == 0 -> auto-compute full chain to 1x1. if (mip_levels == 0) @@ -335,8 +346,9 @@ void DeviceImage::create_vk_image_view() info.image = image_; info.viewType = VK_IMAGE_VIEW_TYPE_2D; info.format = vk_format_; - info.subresourceRange.aspectMask = - (format_ == PixelFormat::kD32F) ? VK_IMAGE_ASPECT_DEPTH_BIT : VK_IMAGE_ASPECT_COLOR_BIT; + // Always COLOR: kD32F is stored as R32_SFLOAT (a color format), not a + // depth format — see to_vk_storage_format. + info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; info.subresourceRange.baseMipLevel = 0; info.subresourceRange.levelCount = mip_levels_; info.subresourceRange.baseArrayLayer = 0; @@ -517,8 +529,8 @@ void DeviceImage::run_one_shot_layout_transition(VkImageLayout old_layout, barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; barrier.image = image_; - barrier.subresourceRange.aspectMask = - (format_ == PixelFormat::kD32F) ? VK_IMAGE_ASPECT_DEPTH_BIT : VK_IMAGE_ASPECT_COLOR_BIT; + // kD32F is stored as R32_SFLOAT (color format), so always COLOR aspect. + barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; barrier.subresourceRange.baseMipLevel = 0; barrier.subresourceRange.levelCount = mip_levels_; barrier.subresourceRange.baseArrayLayer = 0; diff --git a/src/viz/layers/cpp/CMakeLists.txt b/src/viz/layers/cpp/CMakeLists.txt index 02c92d88f..e48096162 100644 --- a/src/viz/layers/cpp/CMakeLists.txt +++ b/src/viz/layers/cpp/CMakeLists.txt @@ -10,7 +10,9 @@ cmake_minimum_required(VERSION 3.20) # viz/layers_tests/. add_library(viz_layers STATIC quad_layer.cpp + projection_layer.cpp inc/viz/layers/quad_layer.hpp + inc/viz/layers/projection_layer.hpp ) target_include_directories(viz_layers diff --git a/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp b/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp new file mode 100644 index 000000000..35f217f90 --- /dev/null +++ b/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp @@ -0,0 +1,215 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +#pragma once + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +namespace viz +{ + +class VkContext; + +// ProjectionLayer: a full-view RGBD layer for renderers (gsplat, nvblox, +// neural reconstruction) that produce (color, depth) buffers per frame. +// +// DIRECT-PRESENT ONLY. Unlike QuadLayer, this layer is never composited +// into the shared render target. Its per-eye (color, depth) images are +// copied STRAIGHT into the presentation swapchains by the backend +// (vkCmdCopyImage, verbatim) — exactly like holohub xr_gsplat. In kXr +// that means the renderer's depth lands in the XR depth swapchain with no +// gl_FragDepth round-trip, so CloudXR positional reprojection gets exact +// depth. Consequently a VizSession holds EITHER one ProjectionLayer OR +// any number of QuadLayers, never both (enforced by VizSession). Because +// the copy is 1:1, ``view_resolution`` MUST equal the swapchain per-view +// size (use VizSession::get_recommended_resolution()). +// +// Frame loop contract — IMPORTANT: +// +// info = session.begin_frame() // xrLocateViews +// color, depth = renderer.render(info.views) // render against THIS frame's views +// layer.submit(color, depth) // publish for THIS frame +// session.end_frame() // copy to swapchain + xrEndFrame +// +// ``submit()`` MUST be called between ``begin_frame()`` and +// ``end_frame()``. The renderer MUST render against ``info.views[i].pose`` +// (the predicted-display-time pose for this frame). The runtime / CloudXR +// paces the app via xrWaitFrame and reprojects the last submitted frame +// at display rate if the renderer is slower. +// +// In ``kXr``, a visible ProjectionLayer that does NOT receive a +// ``submit()`` for the current frame presents nothing (the backend clears +// the swapchains) rather than hand the runtime yesterday's RGBD under +// today's pose. In ``kWindow`` / ``kOffscreen`` the freshness gate is off +// — the most recent publish stays on screen until replaced. +// +// Mailbox: kSlotCount per-eye (color, depth) DeviceImage pairs. submit() +// picks a slot that's neither ``latest_`` nor in any ``in_use_`` entry, +// memcpys + signals cuda_done_writing on the caller's stream, blocks on +// cudaStreamSynchronize so the caller can re-use source buffers +// immediately, then atomically promotes the slot to ``latest_``. +// acquire_direct_views() promotes ``latest_`` to ``in_use_[slot]`` and +// returns that slot's images for the backend copy. +// +// Stereo: when Config::stereo is true, the layer allocates paired +// (left, right) storage per slot. submit() must ship both eyes on a +// single CUDA stream; stream ordering keeps the pair atomic. In kXr +// view 0 (left eye) → left buffer, view 1 (right eye) → right. +// +// Memory (per layer): +// mono 1024² RGBA8+D32F: 7 slots × 1024² × 8 B ≈ 56 MB +// stereo 1024² RGBA8+D32F: ≈ 112 MB +// stereo 2048² RGBA8+D32F: ≈ 448 MB +class ProjectionLayer : public LayerBase +{ +public: + // Sized to cover backend image counts up to 5, leave one free slot. + static constexpr uint32_t kMaxFramesInFlight = 5; + static constexpr uint32_t kSlotCount = kMaxFramesInFlight + 2; + + struct Config + { + std::string name = "ProjectionLayer"; + Resolution view_resolution{}; + PixelFormat color_format = PixelFormat::kRGBA8; + + // nullopt → no depth buffer allocated; ProjectionLayer always + // writes gl_FragDepth = 1.0 (far). Without depth, this layer + // loses Z-compositing with QuadLayer. Useful for renderers that + // genuinely have no depth (sky / background fills). + std::optional depth_format = PixelFormat::kD32F; + + // true → per-eye paired storage. submit MUST ship both eyes. + // In kWindow / kOffscreen the LEFT buffer is sampled; in kXr + // view 0 → LEFT, view 1 → RIGHT. + bool stereo = false; + }; + + ProjectionLayer(const VkContext& ctx, Config config); + ~ProjectionLayer() override; + void destroy(); + + ProjectionLayer(const ProjectionLayer&) = delete; + ProjectionLayer& operator=(const ProjectionLayer&) = delete; + + // Publish a frame. Each buffer is a CUDA-linear VizBuffer (kDevice + // space) matching the layer's resolution and the relevant format + // (color → color_format, depth → kD32F). Validated against the + // config; mismatch throws std::invalid_argument. + // + // Mono no-depth: submit(color) + // Mono with depth: submit(color, &depth) + // Stereo no-depth: submit(left_color, nullptr, &right_color, nullptr) + // Stereo with depth: submit(left_color, &left_depth, &right_color, &right_depth) + // + // submit() does one cudaMemcpy2DToArrayAsync per provided buffer + // on ``stream``, signals cuda_done_writing on the same stream, then + // BLOCKS on cudaStreamSynchronize so the caller can re-use source + // buffers immediately. Cost: ~0.5 ms / 1024² color + depth on a + // discrete GPU. + // + // Marks the layer "fresh for this frame" so record() will draw it. + // VizSession::begin_frame clears the flag at the start of each + // frame; a renderer that doesn't submit will see its content + // skipped in kXr. + // + // GIL: pybind binding releases the GIL across this whole call. + void submit(const VizBuffer& left_color, + const VizBuffer* left_depth = nullptr, + const VizBuffer* right_color = nullptr, + const VizBuffer* right_depth = nullptr, + cudaStream_t stream = 0); + + // LayerBase contract. + void on_frame_begin() override; // clears submitted_this_frame_ flag + + // Direct-present-only: never drawn into the shared render pass, so + // record() is a no-op. The compositor always takes the direct path + // (acquire_direct_views) for this layer. + void record(VkCommandBuffer /*cmd*/, + const std::vector& /*views*/, + const RenderTarget& /*target*/, + uint32_t /*in_flight_slot*/) override + { + } + + // cuda_done_writing waits (TRANSFER stage — the backend copies these + // images) for color + depth of every active view in the in-use slot. + // kSlotNone → empty vector. + std::vector get_wait_semaphores() const override; + + bool is_projection_layer() const noexcept override + { + return true; + } + bool supports_direct_present() const noexcept override + { + return true; + } + const VkContext* vk_context() const noexcept override + { + return ctx_; + } + std::vector acquire_direct_views(uint32_t in_flight_slot) override; + + // Direct-present requires a 1:1 swapchain copy, so view_resolution must + // equal the backend's per-view size, a stereo display needs a stereo + // (>= view-count) layer, and the backend's in-flight image count must fit + // the mailbox. Throws std::invalid_argument on any mismatch. + void validate_backend_compatibility(Resolution recommended_view_resolution, + uint32_t backend_view_count, + uint32_t backend_image_count) const override; + + // Accessors. + Resolution view_resolution() const noexcept; + PixelFormat color_format() const noexcept; + std::optional depth_format() const noexcept; + bool is_stereo() const noexcept; + uint32_t view_count() const noexcept; + + // Diagnostic — nullptr outside valid ranges. + const DeviceImage* color_image(uint32_t slot, uint32_t view) const noexcept; + const DeviceImage* depth_image(uint32_t slot, uint32_t view) const noexcept; + +private: + static constexpr uint8_t kSlotNone = 0xFF; + + void init(); + uint8_t pick_free_slot() const noexcept; + void validate_submit_buffer(const VizBuffer& buf, PixelFormat expected_format, const char* label) const; + void enqueue_copy(const VizBuffer& src, DeviceImage& dst, cudaStream_t stream) const; + + const VkContext* ctx_ = nullptr; + Config config_; + uint32_t view_count_ = 1; + bool has_depth_ = true; + + // Per-eye (color, depth) mailbox storage. CUDA-mapped DeviceImages the + // backend copies straight to the swapchains. + std::array>, kSlotCount> slots_color_; + std::array>, kSlotCount> slots_depth_; + + // Mailbox. + std::atomic latest_{ kSlotNone }; + std::array, kMaxFramesInFlight> in_use_{}; + std::atomic last_in_use_slot_{ kSlotNone }; + + // Set by submit(), cleared by on_frame_begin(). acquire_direct_views() + // consults this in kXr to gate stale-RGBD-under-new-pose presents. + std::atomic submitted_this_frame_{ false }; +}; + +} // namespace viz diff --git a/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp b/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp index 41254162f..8d51668ef 100644 --- a/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp +++ b/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp @@ -192,6 +192,11 @@ class QuadLayer : public LayerBase // Drives aspect-fit letterbox in window mode; ignored in kXr. std::optional aspect_ratio() const noexcept override; + const VkContext* vk_context() const noexcept override + { + return ctx_; + } + Resolution resolution() const noexcept; PixelFormat format() const noexcept; diff --git a/src/viz/layers/cpp/projection_layer.cpp b/src/viz/layers/cpp/projection_layer.cpp new file mode 100644 index 000000000..a1cc52407 --- /dev/null +++ b/src/viz/layers/cpp/projection_layer.cpp @@ -0,0 +1,437 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +#include +#include +#include + +#include +#include +#include + +namespace viz +{ + +namespace +{ + +void check_cuda(cudaError_t result, const char* what) +{ + if (result != cudaSuccess) + { + throw std::runtime_error(std::string("ProjectionLayer: ") + what + " failed: " + cudaGetErrorString(result)); + } +} + +} // namespace + +ProjectionLayer::ProjectionLayer(const VkContext& ctx, Config config) + : LayerBase(config.name), ctx_(&ctx), config_(std::move(config)) +{ + // Validate config first (cheap, no resources), then the context. + // Config checks don't depend on ctx, and ordering them first lets them + // be unit-tested without a GPU-initialized VkContext. + if (config_.view_resolution.width == 0 || config_.view_resolution.height == 0) + { + throw std::invalid_argument("ProjectionLayer: view_resolution must be non-zero"); + } + if (config_.color_format != PixelFormat::kRGBA8) + { + throw std::invalid_argument("ProjectionLayer: color_format must be kRGBA8"); + } + if (config_.depth_format.has_value() && config_.depth_format.value() != PixelFormat::kD32F) + { + throw std::invalid_argument("ProjectionLayer: depth_format must be kD32F or nullopt"); + } + if (!ctx.is_initialized()) + { + throw std::invalid_argument("ProjectionLayer: VkContext not initialized"); + } + view_count_ = config_.stereo ? 2u : 1u; + has_depth_ = config_.depth_format.has_value(); + for (auto& slot : in_use_) + { + slot.store(kSlotNone, std::memory_order_relaxed); + } + init(); +} + +ProjectionLayer::~ProjectionLayer() +{ + destroy(); +} + +void ProjectionLayer::init() +{ + // The only resources this layer owns are the per-eye (color, depth) + // mailbox DeviceImages. There is no render pipeline — the backend + // copies these images straight to the swapchains. + try + { + for (uint32_t s = 0; s < kSlotCount; ++s) + { + slots_color_[s].reserve(view_count_); + for (uint32_t v = 0; v < view_count_; ++v) + { + slots_color_[s].push_back(DeviceImage::create(*ctx_, config_.view_resolution, config_.color_format, 1)); + } + if (has_depth_) + { + slots_depth_[s].reserve(view_count_); + for (uint32_t v = 0; v < view_count_; ++v) + { + slots_depth_[s].push_back( + DeviceImage::create(*ctx_, config_.view_resolution, *config_.depth_format, 1)); + } + } + } + } + catch (...) + { + destroy(); + throw; + } +} + +void ProjectionLayer::destroy() +{ + // Drain pending GPU work before freeing the images the compositor's + // command buffers reference. + if (ctx_ != nullptr && ctx_->device() != VK_NULL_HANDLE) + { + (void)vkDeviceWaitIdle(ctx_->device()); + } + for (uint32_t s = 0; s < kSlotCount; ++s) + { + slots_color_[s].clear(); + slots_depth_[s].clear(); + } +} + +// ─── Submit ────────────────────────────────────────────────────────── + +void ProjectionLayer::validate_submit_buffer(const VizBuffer& buf, PixelFormat expected_format, const char* label) const +{ + if (buf.data == nullptr) + { + throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": data is null"); + } + if (buf.space != MemorySpace::kDevice) + { + throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": MemorySpace must be kDevice"); + } + if (buf.format != expected_format) + { + throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": pixel format mismatch"); + } + if (buf.width != config_.view_resolution.width || buf.height != config_.view_resolution.height) + { + throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": resolution mismatch"); + } +} + +void ProjectionLayer::enqueue_copy(const VizBuffer& src, DeviceImage& dst, cudaStream_t stream) const +{ + const size_t row_bytes = static_cast(src.width) * bytes_per_pixel(src.format); + const size_t src_pitch = (src.pitch != 0) ? src.pitch : row_bytes; + check_cuda(cudaMemcpy2DToArrayAsync(dst.cuda_array(), + /*wOffset=*/0, + /*hOffset=*/0, src.data, src_pitch, row_bytes, src.height, + cudaMemcpyDeviceToDevice, stream), + "cudaMemcpy2DToArrayAsync"); +} + +uint8_t ProjectionLayer::pick_free_slot() const noexcept +{ + const uint8_t latest = latest_.load(std::memory_order_acquire); + for (uint8_t s = 0; s < static_cast(kSlotCount); ++s) + { + if (s == latest) + { + continue; + } + bool used = false; + for (const auto& a : in_use_) + { + if (a.load(std::memory_order_acquire) == s) + { + used = true; + break; + } + } + if (!used) + { + return s; + } + } + return kSlotNone; +} + +void ProjectionLayer::submit(const VizBuffer& left_color, + const VizBuffer* left_depth, + const VizBuffer* right_color, + const VizBuffer* right_depth, + cudaStream_t stream) +{ + // ── Validate config / call shape ───────────────────────────────── + validate_submit_buffer(left_color, config_.color_format, "submit(left_color)"); + + const bool stereo = config_.stereo; + if (stereo) + { + if (right_color == nullptr) + { + throw std::invalid_argument("ProjectionLayer: stereo layer requires right_color"); + } + validate_submit_buffer(*right_color, config_.color_format, "submit(right_color)"); + } + else + { + if (right_color != nullptr || right_depth != nullptr) + { + throw std::invalid_argument("ProjectionLayer: mono layer must not pass right buffers"); + } + } + + if (has_depth_) + { + if (left_depth == nullptr) + { + throw std::invalid_argument("ProjectionLayer: depth-enabled layer requires left_depth"); + } + validate_submit_buffer(*left_depth, PixelFormat::kD32F, "submit(left_depth)"); + if (stereo) + { + if (right_depth == nullptr) + { + throw std::invalid_argument("ProjectionLayer: stereo + depth requires right_depth"); + } + validate_submit_buffer(*right_depth, PixelFormat::kD32F, "submit(right_depth)"); + } + } + else + { + if (left_depth != nullptr || right_depth != nullptr) + { + throw std::invalid_argument("ProjectionLayer: depth-disabled layer must not pass depth buffers"); + } + } + + // ── Pick a free slot ───────────────────────────────────────────── + const uint8_t slot = pick_free_slot(); + if (slot == kSlotNone) + { + // Should be unreachable given the kSlotCount invariant + // (kMaxFramesInFlight + 2 ≥ worst-case forbidden set + 1). + throw std::runtime_error("ProjectionLayer: no free mailbox slot — sizing invariant violated"); + } + + // ── Copy + signal ──────────────────────────────────────────────── + enqueue_copy(left_color, *slots_color_[slot][0], stream); + if (has_depth_) + { + enqueue_copy(*left_depth, *slots_depth_[slot][0], stream); + } + if (stereo) + { + enqueue_copy(*right_color, *slots_color_[slot][1], stream); + if (has_depth_) + { + enqueue_copy(*right_depth, *slots_depth_[slot][1], stream); + } + } + + // One semaphore signal per CUDA-mapped image we wrote. The compositor + // waits on the in-use slot's set of cuda_done_writing values before + // the backend copies them (get_wait_semaphores). + slots_color_[slot][0]->cuda_signal_write_done(stream); + if (has_depth_) + { + slots_depth_[slot][0]->cuda_signal_write_done(stream); + } + if (stereo) + { + slots_color_[slot][1]->cuda_signal_write_done(stream); + if (has_depth_) + { + slots_depth_[slot][1]->cuda_signal_write_done(stream); + } + } + + // BLOCK on stream completion so the caller can re-use src buffers + // immediately. Same contract as QuadLayer::submit. ~sub-ms cost. + check_cuda(cudaStreamSynchronize(stream), "cudaStreamSynchronize"); + + latest_.store(slot, std::memory_order_release); + submitted_this_frame_.store(true, std::memory_order_release); +} + +// ─── Frame state / direct present ──────────────────────────────────── + +void ProjectionLayer::on_frame_begin() +{ + // VizSession's begin_frame calls this on every layer. Clearing the + // flag here means a layer that fails to submit between begin_frame and + // end_frame is skipped (kXr) at acquire_direct_views() time. + submitted_this_frame_.store(false, std::memory_order_release); +} + +std::vector ProjectionLayer::get_wait_semaphores() const +{ + std::vector waits; + const uint8_t cur = last_in_use_slot_.load(std::memory_order_acquire); + if (cur == kSlotNone) + { + return waits; + } + const auto add = [&](const DeviceImage& img) + { + const uint64_t value = img.cuda_done_writing_value(); + if (value == 0) + { + return; + } + WaitSemaphore w{}; + w.semaphore = img.cuda_done_writing(); + w.value = value; + // The backend copies these images (vkCmdCopyImage), so gate the + // CUDA-done wait at the transfer stage, not the fragment stage. + w.wait_stage = VK_PIPELINE_STAGE_TRANSFER_BIT; + waits.push_back(w); + }; + for (uint32_t v = 0; v < view_count_; ++v) + { + if (slots_color_[cur].size() > v && slots_color_[cur][v]) + { + add(*slots_color_[cur][v]); + } + if (has_depth_ && slots_depth_[cur].size() > v && slots_depth_[cur][v]) + { + add(*slots_depth_[cur][v]); + } + } + return waits; +} + +std::vector ProjectionLayer::acquire_direct_views(uint32_t in_flight_slot) +{ + if (in_flight_slot >= kMaxFramesInFlight) + { + throw std::logic_error("ProjectionLayer: in_flight_slot exceeds kMaxFramesInFlight"); + } + + const bool xr_mode = session() != nullptr && session()->is_xr_mode(); + const uint8_t latest = latest_.load(std::memory_order_acquire); + + // Freshness gate: in kXr never hand the runtime stale RGBD under this + // frame's pose; nothing-published is the same skip. On skip leave no + // in-use slot so get_wait_semaphores() is empty and the backend clears + // the swapchains. + const bool skip = latest == kSlotNone || (xr_mode && !submitted_this_frame_.load(std::memory_order_acquire)); + if (skip) + { + in_use_[in_flight_slot].store(kSlotNone, std::memory_order_release); + last_in_use_slot_.store(kSlotNone, std::memory_order_release); + return {}; + } + + // Promote the latest publish so get_wait_semaphores() waits on its + // cuda_done_writing before the backend copies it. + in_use_[in_flight_slot].store(latest, std::memory_order_release); + last_in_use_slot_.store(latest, std::memory_order_release); + + std::vector views; + views.reserve(view_count_); + for (uint32_t v = 0; v < view_count_; ++v) + { + DirectPresentView dv{}; + dv.extent = config_.view_resolution; + if (slots_color_[latest].size() > v && slots_color_[latest][v]) + { + dv.color = slots_color_[latest][v]->vk_image(); + } + if (has_depth_ && slots_depth_[latest].size() > v && slots_depth_[latest][v]) + { + dv.depth = slots_depth_[latest][v]->vk_image(); + } + views.push_back(dv); + } + return views; +} + +// ─── Accessors ─────────────────────────────────────────────────────── + +void ProjectionLayer::validate_backend_compatibility(Resolution recommended_view_resolution, + uint32_t backend_view_count, + uint32_t backend_image_count) const +{ + // Images are copied 1:1 into the swapchain, so sizes must match. + if (config_.view_resolution.width != recommended_view_resolution.width || + config_.view_resolution.height != recommended_view_resolution.height) + { + throw std::invalid_argument( + "ProjectionLayer: view_resolution (" + std::to_string(config_.view_resolution.width) + "x" + + std::to_string(config_.view_resolution.height) + ") must equal the display's recommended per-view size (" + + std::to_string(recommended_view_resolution.width) + "x" + std::to_string(recommended_view_resolution.height) + + "); use VizSession::get_recommended_resolution() to size the layer."); + } + // A stereo display needs >= that many views, else an eye is blank (a + // stereo layer on a mono display is fine — the left eye is used). + if (view_count_ < backend_view_count) + { + throw std::invalid_argument("ProjectionLayer: a mono layer cannot drive a " + std::to_string(backend_view_count) + + "-view (stereo) display; construct with Config::stereo = true."); + } + // The in-use slot is the backend's in-flight image index; fail at attach. + if (backend_image_count > kMaxFramesInFlight) + { + throw std::invalid_argument( + "ProjectionLayer: backend cycles " + std::to_string(backend_image_count) + + " in-flight images, exceeding kMaxFramesInFlight=" + std::to_string(kMaxFramesInFlight) + "."); + } +} + +Resolution ProjectionLayer::view_resolution() const noexcept +{ + return config_.view_resolution; +} + +PixelFormat ProjectionLayer::color_format() const noexcept +{ + return config_.color_format; +} + +std::optional ProjectionLayer::depth_format() const noexcept +{ + return config_.depth_format; +} + +bool ProjectionLayer::is_stereo() const noexcept +{ + return config_.stereo; +} + +uint32_t ProjectionLayer::view_count() const noexcept +{ + return view_count_; +} + +const DeviceImage* ProjectionLayer::color_image(uint32_t slot, uint32_t view) const noexcept +{ + if (slot >= kSlotCount || view >= view_count_ || slots_color_[slot].size() <= view) + { + return nullptr; + } + return slots_color_[slot][view].get(); +} + +const DeviceImage* ProjectionLayer::depth_image(uint32_t slot, uint32_t view) const noexcept +{ + if (!has_depth_ || slot >= kSlotCount || view >= view_count_ || slots_depth_[slot].size() <= view) + { + return nullptr; + } + return slots_depth_[slot][view].get(); +} + +} // namespace viz diff --git a/src/viz/layers_tests/cpp/CMakeLists.txt b/src/viz/layers_tests/cpp/CMakeLists.txt index 1eb371a92..a0ba72133 100644 --- a/src/viz/layers_tests/cpp/CMakeLists.txt +++ b/src/viz/layers_tests/cpp/CMakeLists.txt @@ -7,6 +7,7 @@ cmake_minimum_required(VERSION 3.20) # the production layers in viz::layers (QuadLayer, ...). add_executable(viz_layers_tests test_clear_rect_layer.cpp + test_projection_layer.cpp test_quad_layer.cpp test_throwing_layer.cpp ) diff --git a/src/viz/layers_tests/cpp/test_projection_layer.cpp b/src/viz/layers_tests/cpp/test_projection_layer.cpp new file mode 100644 index 000000000..c13329ab8 --- /dev/null +++ b/src/viz/layers_tests/cpp/test_projection_layer.cpp @@ -0,0 +1,497 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +// Tests for ProjectionLayer: config validation (unit-level) and the +// CUDA-Vulkan interop mailbox + submit (gpu-level). ProjectionLayer is +// direct-present-only (no render pipeline); end-to-end copy-to-swapchain +// + readback lives in viz_session_tests where the full backend exists. + +#include "test_helpers.hpp" + +#include +#include +#include +#include +#include + +#include +#include +#include + +using viz::DeviceImage; +using viz::PixelFormat; +using viz::ProjectionLayer; +using viz::VizBuffer; +using viz::VkContext; + +using viz::testing::is_gpu_available; + +namespace +{ + +struct CudaFreeGuard +{ + void* p = nullptr; + ~CudaFreeGuard() + { + if (p != nullptr) + { + cudaFree(p); + } + } +}; + +} // namespace + +// ── Unit: config validation without GPU ───────────────────────────── +// +// Config is validated BEFORE the VkContext, so these run with an +// uninitialized context. The message matchers pin each test to the +// config check it targets — without them an uninitialized-context throw +// would satisfy CHECK_THROWS_AS for the wrong reason. + +using Catch::Matchers::ContainsSubstring; + +TEST_CASE("ProjectionLayer ctor rejects non-RGBA8 color format", "[unit][projection_layer]") +{ + VkContext ctx; + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + cfg.color_format = PixelFormat::kD32F; + CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("color_format")); +} + +TEST_CASE("ProjectionLayer ctor rejects non-D32F depth format", "[unit][projection_layer]") +{ + VkContext ctx; + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + cfg.depth_format = PixelFormat::kRGBA8; + CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("depth_format")); +} + +TEST_CASE("ProjectionLayer ctor rejects zero view_resolution", "[unit][projection_layer]") +{ + VkContext ctx; + ProjectionLayer::Config cfg; + cfg.view_resolution = { 0, 64 }; + CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("view_resolution")); +} + +// ── GPU: backend-compatibility validation ─────────────────────────── + +TEST_CASE("ProjectionLayer validate_backend_compatibility enforces the direct-present contract", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + ProjectionLayer mono(ctx, cfg); + + // Matching mono display: ok. + CHECK_NOTHROW(mono.validate_backend_compatibility({ 64, 64 }, 1, 3)); + // Resolution mismatch (would make the 1:1 swapchain copy out-of-bounds). + CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 128, 128 }, 1, 3), ContainsSubstring("view_resolution")); + // Mono layer can't drive a 2-view (stereo) display. + CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 64, 64 }, 2, 3), ContainsSubstring("stereo")); + // Backend cycles more in-flight images than the mailbox can hold. + CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 64, 64 }, 1, ProjectionLayer::kMaxFramesInFlight + 1), + ContainsSubstring("kMaxFramesInFlight")); + + // A stereo layer is allowed on a mono display (the left eye is used). + cfg.stereo = true; + ProjectionLayer stereo(ctx, cfg); + CHECK_NOTHROW(stereo.validate_backend_compatibility({ 64, 64 }, 1, 3)); + CHECK_NOTHROW(stereo.validate_backend_compatibility({ 64, 64 }, 2, 3)); +} + +// ── GPU: construction + accessors ─────────────────────────────────── + +TEST_CASE("ProjectionLayer mono+depth creates valid handles for every slot+view", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + ProjectionLayer layer(ctx, cfg); + + CHECK(layer.name() == "ProjectionLayer"); + CHECK(layer.view_count() == 1); + CHECK_FALSE(layer.is_stereo()); + CHECK(layer.color_format() == PixelFormat::kRGBA8); + CHECK(layer.depth_format().has_value()); + CHECK(*layer.depth_format() == PixelFormat::kD32F); + + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + REQUIRE(layer.color_image(s, 0) != nullptr); + CHECK(layer.color_image(s, 0)->vk_image() != VK_NULL_HANDLE); + CHECK(layer.color_image(s, 0)->cuda_array() != nullptr); + REQUIRE(layer.depth_image(s, 0) != nullptr); + CHECK(layer.depth_image(s, 0)->vk_image() != VK_NULL_HANDLE); + CHECK(layer.depth_image(s, 0)->cuda_array() != nullptr); + // View index out of range returns nullptr. + CHECK(layer.color_image(s, 1) == nullptr); + CHECK(layer.depth_image(s, 1) == nullptr); + } + CHECK(layer.color_image(ProjectionLayer::kSlotCount, 0) == nullptr); +} + +TEST_CASE("ProjectionLayer stereo allocates per-eye storage", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + cfg.stereo = true; + ProjectionLayer layer(ctx, cfg); + + CHECK(layer.view_count() == 2); + CHECK(layer.is_stereo()); + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + REQUIRE(layer.color_image(s, 0) != nullptr); + REQUIRE(layer.color_image(s, 1) != nullptr); + REQUIRE(layer.depth_image(s, 0) != nullptr); + REQUIRE(layer.depth_image(s, 1) != nullptr); + CHECK(layer.color_image(s, 0)->vk_image() != layer.color_image(s, 1)->vk_image()); + } +} + +TEST_CASE("ProjectionLayer no-depth skips depth allocation", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 32, 32 }; + cfg.depth_format = std::nullopt; + ProjectionLayer layer(ctx, cfg); + + CHECK_FALSE(layer.depth_format().has_value()); + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + REQUIRE(layer.color_image(s, 0) != nullptr); + CHECK(layer.depth_image(s, 0) == nullptr); + } +} + +TEST_CASE("ProjectionLayer destroy is idempotent", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 32, 32 }; + ProjectionLayer layer(ctx, cfg); + + layer.destroy(); + layer.destroy(); // second call must be a no-op +} + +// ── GPU: submit validation ────────────────────────────────────────── + +TEST_CASE("ProjectionLayer::submit rejects bad call shapes", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + cfg.stereo = false; + ProjectionLayer layer(ctx, cfg); + + void* color_dev = nullptr; + void* depth_dev = nullptr; + REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess); + REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess); + CudaFreeGuard cg{ color_dev }; + CudaFreeGuard dg{ depth_dev }; + + VizBuffer color{}; + color.data = color_dev; + color.width = 64; + color.height = 64; + color.format = PixelFormat::kRGBA8; + color.space = viz::MemorySpace::kDevice; + + VizBuffer depth{}; + depth.data = depth_dev; + depth.width = 64; + depth.height = 64; + depth.format = PixelFormat::kD32F; + depth.space = viz::MemorySpace::kDevice; + + SECTION("missing depth on depth-enabled layer") + { + CHECK_THROWS_AS(layer.submit(color), std::invalid_argument); + } + SECTION("mono layer rejects right-eye buffers") + { + CHECK_THROWS_AS(layer.submit(color, &depth, &color, &depth), std::invalid_argument); + } + SECTION("dimension mismatch rejected") + { + VizBuffer bad = color; + bad.width = 32; + CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument); + } + SECTION("color format mismatch rejected") + { + VizBuffer bad = color; + bad.format = PixelFormat::kD32F; + CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument); + } + SECTION("kHost rejected") + { + VizBuffer bad = color; + bad.space = viz::MemorySpace::kHost; + CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument); + } +} + +TEST_CASE("ProjectionLayer::submit mono+depth advances mailbox + signals semaphores", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + ProjectionLayer layer(ctx, cfg); + + void* color_dev = nullptr; + void* depth_dev = nullptr; + REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess); + REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess); + CudaFreeGuard cg{ color_dev }; + CudaFreeGuard dg{ depth_dev }; + + // Initialize to known patterns so we can verify the layer actually + // received our content. cudaMemset is sync-on-default-stream. + REQUIRE(cudaMemset(color_dev, 0x7F, 64 * 64 * 4) == cudaSuccess); + REQUIRE(cudaMemset(depth_dev, 0x40, 64 * 64 * 4) == cudaSuccess); + + VizBuffer color{}; + color.data = color_dev; + color.width = 64; + color.height = 64; + color.format = PixelFormat::kRGBA8; + color.space = viz::MemorySpace::kDevice; + + VizBuffer depth{}; + depth.data = depth_dev; + depth.width = 64; + depth.height = 64; + depth.format = PixelFormat::kD32F; + depth.space = viz::MemorySpace::kDevice; + + // Pre-submit: no semaphore has been signaled. + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + CHECK(layer.color_image(s, 0)->cuda_done_writing_value() == 0); + CHECK(layer.depth_image(s, 0)->cuda_done_writing_value() == 0); + } + + // First submit lands in some slot; that slot's color + depth + // semaphores both advance to 1. + layer.submit(color, &depth); + + uint32_t signaled = 0; + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + const uint64_t cval = layer.color_image(s, 0)->cuda_done_writing_value(); + const uint64_t dval = layer.depth_image(s, 0)->cuda_done_writing_value(); + if (cval > 0 && dval > 0) + { + ++signaled; + } + } + CHECK(signaled == 1); +} + +TEST_CASE("ProjectionLayer::submit stereo requires both eyes", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 64, 64 }; + cfg.stereo = true; + ProjectionLayer layer(ctx, cfg); + + void* color_dev = nullptr; + void* depth_dev = nullptr; + REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess); + REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess); + CudaFreeGuard cg{ color_dev }; + CudaFreeGuard dg{ depth_dev }; + + VizBuffer color{}; + color.data = color_dev; + color.width = 64; + color.height = 64; + color.format = PixelFormat::kRGBA8; + color.space = viz::MemorySpace::kDevice; + + VizBuffer depth{}; + depth.data = depth_dev; + depth.width = 64; + depth.height = 64; + depth.format = PixelFormat::kD32F; + depth.space = viz::MemorySpace::kDevice; + + // Stereo without right buffers throws. + CHECK_THROWS_AS(layer.submit(color, &depth), std::invalid_argument); + + // Stereo with both eyes succeeds. + layer.submit(color, &depth, &color, &depth); + uint32_t signaled = 0; + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + const bool left = layer.color_image(s, 0)->cuda_done_writing_value() > 0 && + layer.depth_image(s, 0)->cuda_done_writing_value() > 0; + const bool right = layer.color_image(s, 1)->cuda_done_writing_value() > 0 && + layer.depth_image(s, 1)->cuda_done_writing_value() > 0; + if (left && right) + { + ++signaled; + } + } + CHECK(signaled == 1); +} + +TEST_CASE("ProjectionLayer::submit no-depth path accepts color only", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 32, 32 }; + cfg.depth_format = std::nullopt; + ProjectionLayer layer(ctx, cfg); + + void* color_dev = nullptr; + REQUIRE(cudaMalloc(&color_dev, 32 * 32 * 4) == cudaSuccess); + CudaFreeGuard cg{ color_dev }; + + VizBuffer color{}; + color.data = color_dev; + color.width = 32; + color.height = 32; + color.format = PixelFormat::kRGBA8; + color.space = viz::MemorySpace::kDevice; + + // depth-disabled layer must NOT accept a depth buffer. + VizBuffer fake_depth = color; + fake_depth.format = PixelFormat::kD32F; + CHECK_THROWS_AS(layer.submit(color, &fake_depth), std::invalid_argument); + + // Without depth, submit succeeds. + layer.submit(color); + + uint32_t signaled = 0; + for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s) + { + if (layer.color_image(s, 0)->cuda_done_writing_value() > 0) + { + ++signaled; + } + } + CHECK(signaled == 1); +} + +TEST_CASE("ProjectionLayer acquire_direct_views returns latest slot images", "[gpu][projection_layer]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + VkContext ctx; + ctx.init({}); + + ProjectionLayer::Config cfg; + cfg.view_resolution = { 32, 32 }; + ProjectionLayer layer(ctx, cfg); + + // Nothing published yet → no direct views. + CHECK(layer.acquire_direct_views(0).empty()); + + void* color_dev = nullptr; + void* depth_dev = nullptr; + REQUIRE(cudaMalloc(&color_dev, 32 * 32 * 4) == cudaSuccess); + REQUIRE(cudaMalloc(&depth_dev, 32 * 32 * 4) == cudaSuccess); + CudaFreeGuard cg{ color_dev }; + CudaFreeGuard dg{ depth_dev }; + + VizBuffer color{}; + color.data = color_dev; + color.width = 32; + color.height = 32; + color.format = PixelFormat::kRGBA8; + color.space = viz::MemorySpace::kDevice; + VizBuffer depth{}; + depth.data = depth_dev; + depth.width = 32; + depth.height = 32; + depth.format = PixelFormat::kD32F; + depth.space = viz::MemorySpace::kDevice; + + layer.on_frame_begin(); + layer.submit(color, &depth); + + // Offscreen (no session attached → not XR), so the freshness gate is + // off and the latest publish is returned. + auto views = layer.acquire_direct_views(0); + REQUIRE(views.size() == 1); + CHECK(views[0].color != VK_NULL_HANDLE); + CHECK(views[0].depth != VK_NULL_HANDLE); + CHECK(views[0].extent.width == 32); + CHECK(views[0].extent.height == 32); + + // get_wait_semaphores now waits on the promoted slot at TRANSFER stage. + const auto waits = layer.get_wait_semaphores(); + REQUIRE(waits.size() == 2); // color + depth + for (const auto& w : waits) + { + CHECK(w.wait_stage == VK_PIPELINE_STAGE_TRANSFER_BIT); + } +} diff --git a/src/viz/python/core_bindings.cpp b/src/viz/python/core_bindings.cpp index 5a1eb129f..7bbcb424e 100644 --- a/src/viz/python/core_bindings.cpp +++ b/src/viz/python/core_bindings.cpp @@ -120,6 +120,21 @@ orientation : (w, x, y, z) quaternion (identity = (1, 0, 0, 0)) .def_readwrite("angle_up", &viz::Fov::angle_up) .def_readwrite("angle_down", &viz::Fov::angle_down); + py::class_(m, "ViewInfo", + R"doc( +Per-eye render target metadata returned in ``FrameInfo.views``. In XR +mode (kXr stereo), 2 entries — one per eye. In window / offscreen, 1 +entry with identity pose. + +Exposes ``viewport``, ``fov`` and ``pose`` (all from the predicted- +display-time XR data for this frame). Renderers build their own view / +projection matrices from ``pose`` + ``fov``. +)doc") + .def(py::init<>()) + .def_readonly("viewport", &viz::ViewInfo::viewport) + .def_readonly("fov", &viz::ViewInfo::fov) + .def_readonly("pose", &viz::ViewInfo::pose); + // ── VizBuffer (with cuda/numpy interface) ────────────────────────── py::class_(m, "VizBuffer", diff --git a/src/viz/python/layers_bindings.cpp b/src/viz/python/layers_bindings.cpp index dba90c76c..ddf9ffa2b 100644 --- a/src/viz/python/layers_bindings.cpp +++ b/src/viz/python/layers_bindings.cpp @@ -13,10 +13,13 @@ #include #include #include +#include #include #include #include +#include +#include namespace viz_py { @@ -134,6 +137,120 @@ numpy on a CUDA device pointer); the binding converts it on the fly. .def("set_visible", &viz::QuadLayer::set_visible, "visible"_a) .def("is_visible", &viz::QuadLayer::is_visible) .def_property_readonly("name", [](const viz::QuadLayer& l) { return l.name(); }); + + // ── ProjectionLayer ──────────────────────────────────────────────── + + py::class_(m, "ProjectionLayerConfig") + .def(py::init<>()) + .def_readwrite("name", &viz::ProjectionLayer::Config::name) + .def_readwrite("view_resolution", &viz::ProjectionLayer::Config::view_resolution) + .def_readwrite("color_format", &viz::ProjectionLayer::Config::color_format) + .def_readwrite("depth_format", &viz::ProjectionLayer::Config::depth_format, + "PixelFormat.D32F for depth output (Z-composite with QuadLayer); None to disable.") + .def_readwrite("stereo", &viz::ProjectionLayer::Config::stereo, + "Per-eye paired storage. When True, submit() requires both eyes' buffers; " + "in kXr view 0 → left, view 1 → right."); + + py::class_>(m, "ProjectionLayer", + R"doc( +Full-view RGBD layer. Owned by VizSession; the Python handle is +non-owning (don't keep it around past the session). + +Designed for renderers (gsplat, nvblox, neural reconstruction) that +produce per-view (color, depth) buffers. The renderer runs IN-LOOP with +the OpenXR frame loop — `submit()` must be called between +`session.begin_frame()` and `session.end_frame()`, and the renderer +must render against `info.views[i].pose` from the FrameInfo returned by +`begin_frame()`. + +Typical pattern:: + + while running: + info = session.begin_frame() + color, depth = renderer.render(info.views) + layer.submit(color, depth=depth) + session.end_frame() + +If the renderer is slower than display rate, the runtime / CloudXR +paces the application via xrWaitFrame and reprojects the last submitted +frame at display rate. In `kXr`, a visible ProjectionLayer that fails +to submit for the current frame is skipped at record time so stale RGBD +isn't composited under a new projection-layer pose. + +Each buffer is a VizBuffer or any __cuda_array_interface__ object +(cupy / torch / numba). submit() does one CUDA→CUDA copy per buffer on +the supplied stream and BLOCKS on cudaStreamSynchronize so the caller +can re-use ``color`` / ``depth`` immediately. +)doc") + .def( + "submit", + [](viz::ProjectionLayer& self, py::object left_color, py::object left_depth, py::object right_color, + py::object right_depth, uintptr_t stream) + { + auto to_buf = [&self](py::object obj, viz::PixelFormat fmt, const char* label) -> viz::VizBuffer + { + if (py::isinstance(obj)) + { + return obj.cast(); + } + return cuda_array_to_viz_buffer(obj, fmt, self.view_resolution(), label); + }; + + // Materialize each buffer (or std::nullopt). View slots + // that aren't provided pass nullptr through to submit. + std::optional lc; + std::optional ld; + std::optional rc; + std::optional rd; + if (!left_color.is_none()) + { + lc = to_buf(left_color, self.color_format(), "ProjectionLayer.submit(left_color)"); + } + else + { + throw std::runtime_error("ProjectionLayer.submit: left_color is required"); + } + if (!left_depth.is_none()) + { + ld = to_buf(left_depth, viz::PixelFormat::kD32F, "ProjectionLayer.submit(left_depth)"); + } + if (!right_color.is_none()) + { + rc = to_buf(right_color, self.color_format(), "ProjectionLayer.submit(right_color)"); + } + if (!right_depth.is_none()) + { + rd = to_buf(right_depth, viz::PixelFormat::kD32F, "ProjectionLayer.submit(right_depth)"); + } + + py::gil_scoped_release release; + try + { + self.submit(*lc, ld.has_value() ? &*ld : nullptr, rc.has_value() ? &*rc : nullptr, + rd.has_value() ? &*rd : nullptr, reinterpret_cast(stream)); + } + catch (const std::invalid_argument& e) + { + // C++ submit reports bad call shapes as invalid_argument + // (→ ValueError); re-raise as runtime_error so it surfaces + // as RuntimeError, matching the buffer-conversion errors. + throw std::runtime_error(e.what()); + } + }, + "left_color"_a, "left_depth"_a = py::none(), "right_color"_a = py::none(), "right_depth"_a = py::none(), + "stream"_a = 0, + "Submit a frame. Each arg is a VizBuffer or any __cuda_array_interface__ object. " + "Mono: only ``left_color`` (+ ``left_depth`` if depth-enabled). " + "Stereo: pair with ``right_color`` (+ depths). Buffers must match view_resolution " + "and the layer's pixel formats. Releases the GIL across the copy + sync.") + .def_property_readonly("view_resolution", &viz::ProjectionLayer::view_resolution) + .def_property_readonly("color_format", &viz::ProjectionLayer::color_format) + .def_property_readonly("depth_format", &viz::ProjectionLayer::depth_format) + .def_property_readonly("stereo", &viz::ProjectionLayer::is_stereo) + .def_property_readonly("view_count", &viz::ProjectionLayer::view_count) + .def("set_visible", &viz::ProjectionLayer::set_visible, "visible"_a) + .def("is_visible", &viz::ProjectionLayer::is_visible) + .def_property_readonly("name", [](const viz::ProjectionLayer& l) { return l.name(); }); } } // namespace viz_py diff --git a/src/viz/python/session_bindings.cpp b/src/viz/python/session_bindings.cpp index 8698cced4..387f96a6f 100644 --- a/src/viz/python/session_bindings.cpp +++ b/src/viz/python/session_bindings.cpp @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -35,7 +36,10 @@ void bind_session(py::module_& m) .def_readonly("predicted_display_time", &viz::FrameInfo::predicted_display_time) .def_readonly("delta_time", &viz::FrameInfo::delta_time) .def_readonly("should_render", &viz::FrameInfo::should_render) - .def_readonly("resolution", &viz::FrameInfo::resolution); + .def_readonly("resolution", &viz::FrameInfo::resolution) + .def_readonly("views", &viz::FrameInfo::views, + "Per-eye render target metadata. 2 entries in XR stereo, 1 in window/offscreen. " + "Renderers should render against ``views[i].pose`` + ``views[i].fov``."); py::class_(m, "FrameTimingStats") .def(py::init<>()) @@ -106,6 +110,20 @@ Construct via ``VizSession.create(config)``. Add layers with }, "config"_a, py::return_value_policy::reference_internal, "Construct + register a QuadLayer. Returns a non-owning handle.") + .def( + "add_projection_layer", + [](viz::VizSession& self, viz::ProjectionLayer::Config config) -> viz::ProjectionLayer* + { + const auto* ctx = self.get_vk_context(); + if (ctx == nullptr) + { + throw std::runtime_error("VizSession: cannot add layer before session is initialized"); + } + // ProjectionLayer is direct-present-only — no render pass. + return self.add_layer(*ctx, std::move(config)); + }, + "config"_a, py::return_value_policy::reference_internal, + "Construct + register a ProjectionLayer. Returns a non-owning handle.") .def("render", &viz::VizSession::render, py::call_guard(), "Wait + composite + present in one call. Returns FrameInfo.") .def("begin_frame", &viz::VizSession::begin_frame, py::call_guard()) diff --git a/src/viz/python/viz_init.py b/src/viz/python/viz_init.py index c07923b0d..0ed6252a6 100644 --- a/src/viz/python/viz_init.py +++ b/src/viz/python/viz_init.py @@ -40,12 +40,15 @@ MemorySpace, PixelFormat, Pose3D, + ProjectionLayer, + ProjectionLayerConfig, QuadLayer, QuadLayerConfig, QuadLayerPlacement, Rect2D, Resolution, SessionState, + ViewInfo, VizBuffer, VizSession, VizSessionConfig, @@ -62,12 +65,15 @@ "MemorySpace", "PixelFormat", "Pose3D", + "ProjectionLayer", + "ProjectionLayerConfig", "QuadLayer", "QuadLayerConfig", "QuadLayerPlacement", "Rect2D", "Resolution", "SessionState", + "ViewInfo", "VizBuffer", "VizSession", "VizSessionConfig", diff --git a/src/viz/python_tests/test_projection_layer.py b/src/viz/python_tests/test_projection_layer.py new file mode 100644 index 000000000..a2da89b8c --- /dev/null +++ b/src/viz/python_tests/test_projection_layer.py @@ -0,0 +1,261 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +"""End-to-end ProjectionLayer tests via Python bindings. + +Covers: config plumbing, add_projection_layer, submit shape validation, +mono+depth round-trip render, stereo + no-depth variants. GPU-gated. +""" + +from __future__ import annotations + +import numpy as np +import pytest + +import isaacteleop.viz as viz + + +def _gpu_available() -> bool: + cfg = viz.VizSessionConfig() + cfg.mode = viz.DisplayMode.kOffscreen + cfg.window_width = 64 + cfg.window_height = 64 + s = None + try: + s = viz.VizSession.create(cfg) + except RuntimeError: + return False + finally: + if s is not None: + s.destroy() + return True + + +pytestmark = pytest.mark.skipif( + not _gpu_available(), reason="no Vulkan/CUDA-capable GPU" +) + + +def _need_cupy(): + cp = pytest.importorskip("cupy") + try: + if cp.cuda.runtime.getDeviceCount() == 0: + pytest.skip("no CUDA device") + except cp.cuda.runtime.CUDARuntimeError: + pytest.skip("no CUDA device") + return cp + + +def _make_session(width=64, height=64): + cfg = viz.VizSessionConfig() + cfg.mode = viz.DisplayMode.kOffscreen + cfg.window_width = width + cfg.window_height = height + cfg.clear_color = (0.0, 0.0, 0.0, 1.0) + return viz.VizSession.create(cfg) + + +def test_projection_layer_config_roundtrip(): + cfg = viz.ProjectionLayerConfig() + cfg.name = "test" + cfg.view_resolution = viz.Resolution(128, 64) + cfg.color_format = viz.PixelFormat.kRGBA8 + cfg.depth_format = viz.PixelFormat.kD32F + cfg.stereo = True + + assert cfg.name == "test" + assert cfg.view_resolution.width == 128 + assert cfg.view_resolution.height == 64 + assert cfg.depth_format == viz.PixelFormat.kD32F + assert cfg.stereo is True + + # depth_format can be None + cfg.depth_format = None + assert cfg.depth_format is None + + +def test_add_projection_layer_mono_depth(): + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.name = "proj" + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer = session.add_projection_layer(layer_cfg) + + assert layer.name == "proj" + assert layer.is_visible() is True + assert layer.view_resolution.width == 32 + assert layer.view_resolution.height == 32 + assert layer.color_format == viz.PixelFormat.kRGBA8 + assert layer.depth_format == viz.PixelFormat.kD32F + assert layer.stereo is False + assert layer.view_count == 1 + + # Submit mono + depth via cupy. + host_color = np.zeros((32, 32, 4), dtype=np.uint8) + host_color[..., 2] = 200 # blue channel + host_color[..., 3] = 255 + host_depth = np.full((32, 32), 0.5, dtype=np.float32) + color = cp.asarray(host_color) + depth = cp.asarray(host_depth) + + layer.submit(color, depth) + + info = session.render() + assert info.frame_index == 0 + + img = session.readback_to_host() + arr = np.asarray(img) + # Predominantly blue at the center; ProjectionLayer covers the + # whole framebuffer. + r, g, b, _a = arr[32, 32] + assert b > r and b > g + finally: + session.destroy() + + +def test_submit_rejects_missing_depth_on_depth_layer(): + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer = session.add_projection_layer(layer_cfg) + + color = cp.asarray(np.zeros((32, 32, 4), dtype=np.uint8)) + with pytest.raises(RuntimeError, match="left_depth"): + layer.submit(color) + finally: + session.destroy() + + +def test_submit_rejects_dimension_mismatch(): + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer = session.add_projection_layer(layer_cfg) + + # Wrong width. + wrong_color = cp.asarray(np.zeros((32, 16, 4), dtype=np.uint8)) + depth = cp.asarray(np.zeros((32, 32), dtype=np.float32)) + with pytest.raises(RuntimeError, match="resolution"): + layer.submit(wrong_color, depth) + finally: + session.destroy() + + +def test_stereo_round_trip(): + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer_cfg.stereo = True + layer = session.add_projection_layer(layer_cfg) + assert layer.stereo is True + assert layer.view_count == 2 + + host_lc = np.zeros((32, 32, 4), dtype=np.uint8) + host_lc[..., 0] = 200 # red for LEFT + host_lc[..., 3] = 255 + host_rc = np.zeros((32, 32, 4), dtype=np.uint8) + host_rc[..., 1] = 200 # green for RIGHT + host_rc[..., 3] = 255 + host_d = np.full((32, 32), 0.5, dtype=np.float32) + lc = cp.asarray(host_lc) + rc = cp.asarray(host_rc) + ld = cp.asarray(host_d) + rd = cp.asarray(host_d) + + # Stereo without right eye → must throw. + with pytest.raises(RuntimeError, match="right_color"): + layer.submit(lc, ld) + + # Stereo with both eyes. + layer.submit(lc, ld, rc, rd) + session.render() + # In offscreen (single-view), the LEFT buffer is sampled — so the + # readback should be predominantly red. + arr = np.asarray(session.readback_to_host()) + r, g, b, _a = arr[32, 32] + assert r > g and r > b + finally: + session.destroy() + + +def test_no_depth_layer(): + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer_cfg.depth_format = None + layer = session.add_projection_layer(layer_cfg) + assert layer.depth_format is None + + host_color = np.zeros((32, 32, 4), dtype=np.uint8) + host_color[..., 0] = 255 # red + host_color[..., 3] = 255 + color = cp.asarray(host_color) + + # Depth-disabled layer must reject any depth buffer. + depth = cp.asarray(np.zeros((32, 32), dtype=np.float32)) + with pytest.raises(RuntimeError, match="depth-disabled"): + layer.submit(color, depth) + + layer.submit(color) + session.render() + arr = np.asarray(session.readback_to_host()) + r, g, b, _a = arr[32, 32] + assert r > g and r > b + finally: + session.destroy() + + +def test_begin_frame_returns_views_for_renderer(): + """``session.begin_frame()`` is the source of truth for poses the + renderer should render against. In offscreen mode the backend + returns a single identity-pose ViewInfo.""" + session = _make_session() + try: + info = session.begin_frame() + assert len(info.views) >= 1 + session.end_frame() + finally: + session.destroy() + + +def test_inloop_submit_pattern(): + """The supported pattern: begin_frame → submit (against this frame's + views) → end_frame, all in one tick. Window/offscreen modes have no + XR freshness gate, so the layer renders on every frame that submits.""" + cp = _need_cupy() + session = _make_session() + try: + layer_cfg = viz.ProjectionLayerConfig() + layer_cfg.view_resolution = viz.Resolution(32, 32) + layer = session.add_projection_layer(layer_cfg) + + host_color = np.zeros((32, 32, 4), dtype=np.uint8) + host_color[..., 2] = 200 # blue + host_color[..., 3] = 255 + host_depth = np.full((32, 32), 0.5, dtype=np.float32) + + for _ in range(3): + info = session.begin_frame() + assert info.should_render + # In a real renderer we'd pass info.views to the GPU side; here + # the buffers are static. + color = cp.asarray(host_color) + depth = cp.asarray(host_depth) + layer.submit(color, depth) + session.end_frame() + + # Final readback shows the submitted color. + arr = np.asarray(session.readback_to_host()) + r, g, b, _a = arr[32, 32] + assert b > r and b > g + finally: + session.destroy() diff --git a/src/viz/session/cpp/inc/viz/session/display_backend.hpp b/src/viz/session/cpp/inc/viz/session/display_backend.hpp index 9f74827c5..977753f40 100644 --- a/src/viz/session/cpp/inc/viz/session/display_backend.hpp +++ b/src/viz/session/cpp/inc/viz/session/display_backend.hpp @@ -6,6 +6,7 @@ #include #include #include +#include // DirectPresentView #include #include @@ -94,6 +95,13 @@ class DisplayBackend // Backend-private bookkeeping round-tripped to record_post_* / // end_frame (e.g. swapchain image_index, predicted_display_time). uint64_t backend_token = 0; + + // OpenXR predicted display time in nanoseconds (from + // xrWaitFrame's XrFrameState.predictedDisplayTime), exposed + // through FrameInfo so renderers can use it for time-aware + // content (e.g. animation timestamps that match the runtime's + // prediction). 0 outside kXr. + int64_t predicted_display_time_ns = 0; }; // Acquire the next frame target. nullopt = skip this frame. @@ -109,6 +117,36 @@ class DisplayBackend { } + // True when the backend implements the direct-present path + // (record_direct). The compositor uses it together with a layer's + // supports_direct_present() to choose direct vs. composited. + virtual bool supports_direct() const noexcept + { + return false; + } + + // Direct-present path: copy a direct layer's per-view (color, depth) + // images STRAIGHT into the presentation swapchains, replacing the + // render-pass + record_post_render_pass for this frame. ``views`` has + // one entry per backend view (1 window/offscreen, 2 kXr stereo); the + // source images are in SHADER_READ_ONLY_OPTIMAL with extent equal to + // the swapchain per-view size. Empty ``views`` → clear the swapchains. + // The compositor still threads the layer's CUDA-done wait semaphores + // (TRANSFER stage) into the submit. Default: unsupported. + virtual void record_direct(VkCommandBuffer /*cmd*/, + const Frame& /*frame*/, + const std::vector& /*views*/) + { + } + + // Per-view resolution a direct layer should render at so its copy to + // the swapchain is 1:1 (kXr: per-eye; window/offscreen: the full + // target). Default: the (single-view) render-target extent. + virtual Resolution recommended_view_resolution() const + { + return current_extent(); + } + // Called after a successful submit. The host has NOT waited on the // in-flight fence (multi-frame-in-flight: that wait happens at the // top of render() for this slot's NEXT use), so the GPU may still diff --git a/src/viz/session/cpp/inc/viz/session/layer_base.hpp b/src/viz/session/cpp/inc/viz/session/layer_base.hpp index 8f17a66dd..78e443abb 100644 --- a/src/viz/session/cpp/inc/viz/session/layer_base.hpp +++ b/src/viz/session/cpp/inc/viz/session/layer_base.hpp @@ -16,6 +16,7 @@ namespace viz class RenderTarget; class VizSession; +class VkContext; // Maps ViewInfo::viewport → vkCmdSetViewport (origin top-left, depth // [0,1], no y-flip). Layers call this once per view before drawing. @@ -32,6 +33,20 @@ inline void bind_view_viewport(VkCommandBuffer cmd, const ViewInfo& view) vkCmdSetViewport(cmd, 0, 1, &vp); } +// Per-view source images for the direct-present path: a layer whose +// content is already a full-view (color, depth) image pair the backend +// can copy STRAIGHT into the presentation swapchains, bypassing the +// shared render target + render pass. This mirrors holohub xr_gsplat: +// the renderer's depth lands in the XR depth swapchain verbatim (no +// gl_FragDepth round-trip), so CloudXR reprojection gets exact depth. +// 1 entry for window/offscreen, 2 for kXr stereo. +struct DirectPresentView +{ + VkImage color = VK_NULL_HANDLE; // resting layout SHADER_READ_ONLY_OPTIMAL + VkImage depth = VK_NULL_HANDLE; // VK_NULL_HANDLE when the layer has no depth + Resolution extent{}; // must equal the swapchain per-view size +}; + // Abstract layer drawn into the compositor's render pass (RGBA8_SRGB // color + D32_SFLOAT depth, single-sample). record() issues draw calls; // it must NOT end the render pass or submit. Resource lifetime is the @@ -56,6 +71,16 @@ class LayerBase { } + // Called from ``VizSession::begin_frame`` for EVERY registered layer + // (visible or not) before the new frame's FrameInfo is returned. + // Lets layers clear per-frame state (e.g. ProjectionLayer's + // submitted-this-frame flag). Default = no-op. Must NOT touch GPU + // state — the backend's begin_frame has already run, and the + // compositor's per-slot fence wait hasn't happened yet. + virtual void on_frame_begin() + { + } + // Issue draws inside the active render pass. // views: 1 entry in window/offscreen, 2 in kXr stereo. Each // entry's viewport is this layer's rect for that view — @@ -86,6 +111,51 @@ class LayerBase return {}; } + // True only for ProjectionLayer. VizSession uses it to enforce the + // single-projection XOR multi-quad invariant, and the compositor uses + // it to pick the direct-present path. + virtual bool is_projection_layer() const noexcept + { + return false; + } + + // The VkContext this layer's GPU resources came from (nullptr if none). + // add_layer rejects a layer whose context isn't the session's — its + // images/semaphores would be used on the wrong device/queue. + virtual const VkContext* vk_context() const noexcept + { + return nullptr; + } + + // Direct-present support (see DirectPresentView). When true, the + // compositor — for a session whose only layer is this one — skips the + // render pass and asks the backend to copy these images straight to + // the swapchains. Default: not supported (composited via the RT). + virtual bool supports_direct_present() const noexcept + { + return false; + } + + // Promote this frame's content into ``in_flight_slot`` (same slot the + // compositor passes to record()/get_wait_semaphores) and return the + // per-view source images to copy. Empty vector = nothing fresh to + // present this frame (backend clears the swapchains). Called instead + // of record_pre_render_pass()/record() on the direct path. + virtual std::vector acquire_direct_views(uint32_t /*in_flight_slot*/) + { + return {}; + } + + // Let a layer reject a backend it can't run on. Called once by add_layer + // with the backend's per-view recommended resolution, view count (1 + // window/offscreen, 2 kXr stereo), and in-flight image count; throws + // std::invalid_argument on mismatch. Default: no requirements. + virtual void validate_backend_compatibility(Resolution /*recommended_view_resolution*/, + uint32_t /*backend_view_count*/, + uint32_t /*backend_image_count*/) const + { + } + // Window-mode aspect-fit hint. nullopt = fill the tile; kXr ignores. virtual std::optional aspect_ratio() const noexcept { diff --git a/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp b/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp index 2d5ace885..9afbcb618 100644 --- a/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp +++ b/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp @@ -29,6 +29,15 @@ class OffscreenBackend final : public DisplayBackend return 1; } + // Direct-present: copy a ProjectionLayer's color image into the RT's + // color attachment (leaving it in TRANSFER_SRC) so readback_to_host + // works unchanged. Depth isn't read back. Empty views → clear. + bool supports_direct() const noexcept override + { + return true; + } + void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector& views) override; + // Synchronous tightly-packed RGBA8 copy of the RT's color attachment. HostImage readback_to_host() override; diff --git a/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp b/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp index 6a1050c11..b5ff16e80 100644 --- a/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp +++ b/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -55,15 +56,22 @@ class VizCompositor VizCompositor(VizCompositor&&) = delete; VizCompositor& operator=(VizCompositor&&) = delete; - // Records and submits one frame. Multi-frame-in-flight: one - // FrameSync per backend image slot. render() waits on the slot's - // fence at entry (signaled by its previous use), submits with the - // same fence as signal target, and returns without host-waiting - // on completion. CPU pacing is the caller's responsibility — the - // window backend prefers MAILBOX (no vsync block), so a hot loop - // would burn a core; camera_viz drives this from an event-driven - // condition variable that wakes per producer publish. - void render(const std::vector& layers); + // Records and submits one frame against the backend ``Frame`` + // already acquired by VizSession::begin_frame. Multi-frame-in- + // flight: one FrameSync per backend image slot. render() waits on + // the slot's fence at entry (signaled by its previous use), + // submits with the same fence as signal target, and returns + // without host-waiting on completion. CPU pacing is the caller's + // responsibility — the window backend prefers MAILBOX (no vsync + // block), so a hot loop would burn a core; camera_viz drives this + // from an event-driven condition variable that wakes per producer + // publish. + // + // Owns end_frame / abort_frame protocol balance for the supplied + // ``frame``: on successful submit, calls backend->end_frame; on + // exception, calls backend->abort_frame via RAII guard before + // re-throwing. + void render(const DisplayBackend::Frame& frame, const std::vector& layers); HostImage readback_to_host(); diff --git a/src/viz/session/cpp/inc/viz/session/viz_session.hpp b/src/viz/session/cpp/inc/viz/session/viz_session.hpp index 30a43b208..0e0fbed56 100644 --- a/src/viz/session/cpp/inc/viz/session/viz_session.hpp +++ b/src/viz/session/cpp/inc/viz/session/viz_session.hpp @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -106,11 +107,37 @@ class VizSession // content updates / set_visible(). The session owns the layer's // lifetime. add_layer / remove_layer must run on the same thread // as the frame loop; only LayerBase::set_visible() is atomic. + // + // Layer-mode invariant (for now): a session holds EITHER one + // ProjectionLayer OR any number of QuadLayers, never both. The + // ProjectionLayer is direct-present (copied straight to the swapchain), + // while QuadLayers composite into the shared render target — the two + // paths don't coexist yet. Violations throw std::invalid_argument. template L* add_layer(Args&&... args) { auto layer = std::make_unique(std::forward(args)...); L* raw = layer.get(); + + bool have_projection = false; + for (const auto& l : layers_) + { + have_projection = have_projection || l->is_projection_layer(); + } + if (raw->is_projection_layer() && !layers_.empty()) + { + throw std::invalid_argument("VizSession: a ProjectionLayer must be the session's only layer"); + } + if (!raw->is_projection_layer() && have_projection) + { + throw std::invalid_argument("VizSession: cannot add another layer alongside a ProjectionLayer"); + } + + // Let the layer reject a backend it can't run on (e.g. a direct-present + // layer whose resolution/stereo/slot count doesn't match). Throws + // before the layer is registered, so a rejected add leaves no state. + validate_layer_against_backend_(raw); + raw->attach_to_session_(this); layers_.push_back(std::move(layer)); return raw; @@ -182,6 +209,10 @@ class VizSession void init(); const VkContext& ctx() const noexcept; + // Ask a freshly-constructed layer to validate itself against the active + // backend (resolution / stereo / in-flight count). No-op until the + // backend is initialized. Throws std::invalid_argument on mismatch. + void validate_layer_against_backend_(LayerBase* layer) const; void update_timing_stats(float frame_time_seconds); // Poll backend events + handle resize. Called by render() and // begin_frame() so explicit-loop users get the same behavior. @@ -207,6 +238,11 @@ class VizSession bool first_frame_ = true; bool frame_in_progress_ = false; FrameInfo current_frame_info_{}; + // The backend-acquired frame for the in-progress begin/end pair. + // Acquired by begin_frame, consumed by end_frame. nullopt outside + // a begin/end window or when the backend skipped this frame + // (e.g. XR runtime shouldRender=0). + std::optional current_backend_frame_; FrameTimingStats timing_stats_{}; }; diff --git a/src/viz/session/cpp/inc/viz/session/window_backend.hpp b/src/viz/session/cpp/inc/viz/session/window_backend.hpp index ac61058e8..070c8a288 100644 --- a/src/viz/session/cpp/inc/viz/session/window_backend.hpp +++ b/src/viz/session/cpp/inc/viz/session/window_backend.hpp @@ -38,6 +38,16 @@ class WindowBackend final : public DisplayBackend std::optional begin_frame(int64_t predicted_display_time) override; const RenderTarget& render_target() const override; void record_post_render_pass(VkCommandBuffer cmd, const Frame& frame) override; + + // Direct-present: blit a ProjectionLayer's single color image straight + // to the window swapchain, skipping the shared RT. Depth is unused in + // window mode. Empty views → clear to black. + bool supports_direct() const noexcept override + { + return true; + } + void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector& views) override; + void end_frame(const Frame& frame) override; void abort_frame(const Frame& frame) override; diff --git a/src/viz/session/cpp/inc/viz/session/xr_backend.hpp b/src/viz/session/cpp/inc/viz/session/xr_backend.hpp index 7e977820b..4a95a6c2e 100644 --- a/src/viz/session/cpp/inc/viz/session/xr_backend.hpp +++ b/src/viz/session/cpp/inc/viz/session/xr_backend.hpp @@ -67,6 +67,17 @@ class XrBackend final : public DisplayBackend void end_frame(const Frame& frame) override; void abort_frame(const Frame& frame) override; + // Direct-present: copy a ProjectionLayer's per-eye color/depth straight + // into the per-eye color + depth swapchains (vkCmdCopyImage, verbatim), + // skipping the shared RT — so the renderer's depth reaches CloudXR + // exactly. Per-eye recommended size keeps the copy 1:1. + bool supports_direct() const noexcept override + { + return true; + } + void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector& views) override; + Resolution recommended_view_resolution() const override; + void poll_events() override; bool should_close() const override; Resolution current_extent() const override; @@ -116,6 +127,13 @@ class XrBackend final : public DisplayBackend void destroy_swapchains(); void create_intermediate(); + // Per-eye staging buffers that bridge a direct ProjectionLayer's depth + // (stored as R32_SFLOAT — CUDA can't interop a depth-format image) into + // the D32_SFLOAT depth swapchain via image->buffer->image. The float bits + // copy verbatim. Allocated only when depth submission is enabled. + void create_depth_staging(); + void destroy_depth_staging() noexcept; + // Release every swapchain currently flagged `acquired`. void release_acquired_swapchains() noexcept; // Submit an empty xrEndFrame to balance an outstanding xrBeginFrame. @@ -139,6 +157,15 @@ class XrBackend final : public DisplayBackend std::vector depth_swapchains_; bool depth_layer_enabled_ = false; + // Per-eye R32_SFLOAT->D32_SFLOAT bridge buffers (see create_depth_staging). + struct DepthStaging + { + VkBuffer buffer = VK_NULL_HANDLE; + VkDeviceMemory memory = VK_NULL_HANDLE; + VkDeviceSize size = 0; + }; + std::vector depth_staging_; + // Per-frame state — valid only while frame_began_ == true. XrFrameState last_frame_state_{ XR_TYPE_FRAME_STATE }; XrViewState last_view_state_{ XR_TYPE_VIEW_STATE }; diff --git a/src/viz/session/cpp/offscreen_backend.cpp b/src/viz/session/cpp/offscreen_backend.cpp index 0cb0d3b10..236dc6478 100644 --- a/src/viz/session/cpp/offscreen_backend.cpp +++ b/src/viz/session/cpp/offscreen_backend.cpp @@ -37,6 +37,30 @@ uint32_t find_memory_type(VkPhysicalDevice physical_device, uint32_t type_bits, throw std::runtime_error("OffscreenBackend: no memory type matches readback requirements"); } +void transition_image(VkCommandBuffer cmd, + VkImage image, + VkImageLayout old_layout, + VkImageLayout new_layout, + VkAccessFlags src_access, + VkAccessFlags dst_access, + VkPipelineStageFlags src_stage, + VkPipelineStageFlags dst_stage) +{ + VkImageMemoryBarrier b{}; + b.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER; + b.oldLayout = old_layout; + b.newLayout = new_layout; + b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; + b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; + b.image = image; + b.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + b.subresourceRange.levelCount = 1; + b.subresourceRange.layerCount = 1; + b.srcAccessMask = src_access; + b.dstAccessMask = dst_access; + vkCmdPipelineBarrier(cmd, src_stage, dst_stage, 0, 0, nullptr, 0, nullptr, 1, &b); +} + } // namespace OffscreenBackend::OffscreenBackend() = default; @@ -141,6 +165,58 @@ HostImage OffscreenBackend::readback_to_host() return result; } +void OffscreenBackend::record_direct(VkCommandBuffer cmd, + const Frame& /*frame*/, + const std::vector& views) +{ + if (render_target_ == nullptr) + { + return; + } + const VkImage dst = render_target_->color_image(); + const VkImage src = views.empty() ? VK_NULL_HANDLE : views[0].color; + + // Discard prior RT contents (UNDEFINED) — the copy/clear fully + // overwrites — and leave the RT in TRANSFER_SRC so readback_to_host's + // image→buffer copy works exactly as it does after the render pass. + transition_image(cmd, dst, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + if (src != VK_NULL_HANDLE) + { + transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, + VK_PIPELINE_STAGE_TRANSFER_BIT); + + VkImageCopy region{}; + region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.srcSubresource.layerCount = 1; + region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.dstSubresource.layerCount = 1; + // 1:1 copy — add_layer guarantees source extent == target extent. + region.extent = { extent_.width, extent_.height, 1 }; + vkCmdCopyImage( + cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, ®ion); + + transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, + VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT); + } + else + { + VkClearColorValue clear{}; + VkImageSubresourceRange range{}; + range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + range.levelCount = 1; + range.layerCount = 1; + vkCmdClearColorImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range); + } + + transition_image(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_TRANSFER_BIT); +} + void OffscreenBackend::create_readback_staging() { readback_byte_size_ = diff --git a/src/viz/session/cpp/swapchain.cpp b/src/viz/session/cpp/swapchain.cpp index 271579df6..887eefcdf 100644 --- a/src/viz/session/cpp/swapchain.cpp +++ b/src/viz/session/cpp/swapchain.cpp @@ -31,21 +31,23 @@ void check_vk(VkResult r, const char* what) } } -// Pick a surface format. Prefer B8G8R8A8_SRGB (common Linux default, -// matches our intermediate framebuffer's sRGB color space). Fall back -// to any *_SRGB format. Else accept whatever the runtime offers first. +// Pick a surface format. Prefer R8G8B8A8_SRGB so the direct-present path's +// vkCmdCopyImage (raw bytes, no channel reorder) from our RGBA layer images +// lands in the right channel order; a BGRA swapchain would swap R/B. Fall +// back to B8G8R8A8_SRGB (common Linux default — the composited path blits, so +// channel order is handled there), then any *_SRGB, then whatever's offered. VkSurfaceFormatKHR pick_surface_format(const std::vector& formats) { for (const auto& f : formats) { - if (f.format == VK_FORMAT_B8G8R8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) + if (f.format == VK_FORMAT_R8G8B8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) { return f; } } for (const auto& f : formats) { - if (f.format == VK_FORMAT_R8G8B8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) + if (f.format == VK_FORMAT_B8G8R8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) { return f; } diff --git a/src/viz/session/cpp/viz_compositor.cpp b/src/viz/session/cpp/viz_compositor.cpp index 4a73c73ec..9ac7095e1 100644 --- a/src/viz/session/cpp/viz_compositor.cpp +++ b/src/viz/session/cpp/viz_compositor.cpp @@ -191,7 +191,7 @@ void VizCompositor::submit_or_signal_fence(const VkSubmitInfo& info, const char* throw std::runtime_error(std::string("VizCompositor: ") + what + " failed: VkResult=" + std::to_string(r)); } -void VizCompositor::render(const std::vector& layers) +void VizCompositor::render(const DisplayBackend::Frame& frame, const std::vector& layers) { // Snapshot visible layers once — is_visible() is atomic, and // reading it twice could record a draw without the matching wait. @@ -205,19 +205,12 @@ void VizCompositor::render(const std::vector& layers) } } - auto frame = backend_->begin_frame(/*predicted_display_time=*/0); - if (!frame.has_value()) - { - // Backend skipped; all fences stay signaled, next wait() won't deadlock. - return; - } - // Catch swapchain recreates whose image_count differs from the one - // we sized per-slot state for. Runs AFTER begin_frame because - // WindowBackend::begin_frame may itself recreate (OUT_OF_DATE etc.). - // Wrapped so a failed rebuild balances the backend protocol — we've - // already acquired a swapchain image and FrameGuard isn't set up - // yet, so a raw throw would leak the acquire. + // we sized per-slot state for. Runs first because the backend's + // begin_frame (run by VizSession) may itself recreate (OUT_OF_DATE + // etc.). Wrapped so a failed rebuild balances the backend protocol — + // the frame is already acquired and FrameGuard isn't set up yet, so + // a raw throw would leak it. try { ensure_slot_count_matches_backend(); @@ -226,7 +219,7 @@ void VizCompositor::render(const std::vector& layers) { try { - backend_->abort_frame(*frame); + backend_->abort_frame(frame); } catch (...) { @@ -242,14 +235,14 @@ void VizCompositor::render(const std::vector& layers) // threw; reaching here means logic drift. Bail rather than UB. try { - backend_->abort_frame(*frame); + backend_->abort_frame(frame); } catch (...) { } throw std::runtime_error("VizCompositor: slot_count == 0 after ensure_slot_count_matches_backend"); } - const uint32_t slot = static_cast(frame->backend_token) % slot_count; + const uint32_t slot = static_cast(frame.backend_token) % slot_count; FrameSync& slot_sync = *frame_syncs_[slot]; VkCommandBuffer command_buffer = command_buffers_[slot]; @@ -299,18 +292,24 @@ void VizCompositor::render(const std::vector& layers) } } } - } frame_guard{ backend_, &*frame }; + } frame_guard{ backend_, &frame }; const RenderTarget& rt = backend_->render_target(); const Resolution rt_extent = rt.resolution(); - // XR: per-eye viewports come from frame->views. tile layout is + // XR: per-eye viewports come from frame.views. tile layout is // window/offscreen letterboxing only. const bool xr_mode = backend_->is_xr(); - // Per-layer aspect-fit tiles (window/offscreen only). + // Direct-present: a single ProjectionLayer copied straight to the + // swapchain (no shared render pass). VizSession's add_layer guarantees + // a direct layer is the session's only layer, so size()==1 suffices. + const bool direct_mode = + visible_layers.size() == 1 && visible_layers[0]->supports_direct_present() && backend_->supports_direct(); + + // Per-layer aspect-fit tiles (window/offscreen composited path only). std::vector tiles; - if (!xr_mode && !visible_layers.empty()) + if (!xr_mode && !direct_mode && !visible_layers.empty()) { const float fb_aspect = static_cast(rt_extent.width) / static_cast(rt_extent.height); std::vector aspects; @@ -335,68 +334,97 @@ void VizCompositor::render(const std::vector& layers) vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 0); } - // Pre-pass hook for transfer/compute work that can't run inside a - // render pass (e.g. QuadLayer mip-chain blits). Ordering: all - // layers' pre-pass run BEFORE any record(), so a layer can rely on - // its own pre-pass results inside record(). - for (LayerBase* layer : visible_layers) + if (direct_mode) { - layer->record_pre_render_pass(command_buffer, slot); - } + // Promote the layer's latest published (color, depth) for this slot + // and copy straight into the swapchain — no render pass, no shared + // RT. acquire_direct_views must precede the wait-semaphore gather + // below so get_wait_semaphores reflects the promoted slot. + const std::vector direct_views = visible_layers[0]->acquire_direct_views(slot); - std::array clears{}; - clears[0].color = config_.clear_color; - clears[1].depthStencil = { 1.0f, 0 }; - - VkRenderPassBeginInfo rp{}; - rp.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO; - rp.renderPass = rt.render_pass(); - rp.framebuffer = rt.framebuffer(); - rp.renderArea.offset = { 0, 0 }; - rp.renderArea.extent = { rt_extent.width, rt_extent.height }; - rp.clearValueCount = static_cast(clears.size()); - rp.pClearValues = clears.data(); + // ts1: nothing rendered into the RT; mark the point for symmetry. + if (gpu_timestamp_pool_ != VK_NULL_HANDLE) + { + vkCmdWriteTimestamp( + command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1); + } - vkCmdBeginRenderPass(command_buffer, &rp, VK_SUBPASS_CONTENTS_INLINE); + backend_->record_direct(command_buffer, frame, direct_views); - // Window/offscreen: scissor=tile.outer + view[0].viewport=tile.content - // for aspect-fit letterboxing. - if (xr_mode) - { - const VkRect2D rt_full{ { 0, 0 }, { rt_extent.width, rt_extent.height } }; - vkCmdSetScissor(command_buffer, 0, 1, &rt_full); + // ts2: end of direct copy (ts2-ts1 = copy/transition cost). + if (gpu_timestamp_pool_ != VK_NULL_HANDLE) + { + vkCmdWriteTimestamp( + command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2); + } } - for (size_t i = 0; i < visible_layers.size(); ++i) + else { - std::vector layer_views = frame->views; - if (layer_views.empty()) + // Pre-pass hook for transfer/compute work that can't run inside a + // render pass (e.g. QuadLayer mip-chain blits). Ordering: all + // layers' pre-pass run BEFORE any record(), so a layer can rely on + // its own pre-pass results inside record(). + for (LayerBase* layer : visible_layers) + { + layer->record_pre_render_pass(command_buffer, slot); + } + + std::array clears{}; + clears[0].color = config_.clear_color; + clears[1].depthStencil = { 1.0f, 0 }; + + VkRenderPassBeginInfo rp{}; + rp.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO; + rp.renderPass = rt.render_pass(); + rp.framebuffer = rt.framebuffer(); + rp.renderArea.offset = { 0, 0 }; + rp.renderArea.extent = { rt_extent.width, rt_extent.height }; + rp.clearValueCount = static_cast(clears.size()); + rp.pClearValues = clears.data(); + + vkCmdBeginRenderPass(command_buffer, &rp, VK_SUBPASS_CONTENTS_INLINE); + + // Window/offscreen: scissor=tile.outer + view[0].viewport=tile.content + // for aspect-fit letterboxing. + if (xr_mode) { - layer_views.push_back(ViewInfo{}); + const VkRect2D rt_full{ { 0, 0 }, { rt_extent.width, rt_extent.height } }; + vkCmdSetScissor(command_buffer, 0, 1, &rt_full); } - if (!xr_mode) + for (size_t i = 0; i < visible_layers.size(); ++i) { - const VkRect2D scissor_rect = tiles[i].outer; - const VkRect2D viewport_rect = tiles[i].content; - vkCmdSetScissor(command_buffer, 0, 1, &scissor_rect); - layer_views[0].viewport = to_rect2d(viewport_rect); + std::vector layer_views = frame.views; + if (layer_views.empty()) + { + layer_views.push_back(ViewInfo{}); + } + if (!xr_mode) + { + const VkRect2D scissor_rect = tiles[i].outer; + const VkRect2D viewport_rect = tiles[i].content; + vkCmdSetScissor(command_buffer, 0, 1, &scissor_rect); + layer_views[0].viewport = to_rect2d(viewport_rect); + } + visible_layers[i]->record(command_buffer, layer_views, rt, slot); } - visible_layers[i]->record(command_buffer, layer_views, rt, slot); - } - vkCmdEndRenderPass(command_buffer); + vkCmdEndRenderPass(command_buffer); - // ts1: end of render pass. - if (gpu_timestamp_pool_ != VK_NULL_HANDLE) - { - vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1); - } + // ts1: end of render pass. + if (gpu_timestamp_pool_ != VK_NULL_HANDLE) + { + vkCmdWriteTimestamp( + command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1); + } - backend_->record_post_render_pass(command_buffer, *frame); + backend_->record_post_render_pass(command_buffer, frame); - // ts2: end of backend post-pass (ts2-ts1 = blit/transition cost). - if (gpu_timestamp_pool_ != VK_NULL_HANDLE) - { - vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2); + // ts2: end of backend post-pass (ts2-ts1 = blit/transition cost). + if (gpu_timestamp_pool_ != VK_NULL_HANDLE) + { + vkCmdWriteTimestamp( + command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2); + } } // ts3: cmd-buffer-end (total = ts3-ts0). @@ -423,18 +451,18 @@ void VizCompositor::render(const std::vector& layers) } } } - if (frame->wait_before_render != VK_NULL_HANDLE) + if (frame.wait_before_render != VK_NULL_HANDLE) { - wait_semaphores.push_back(frame->wait_before_render); + wait_semaphores.push_back(frame.wait_before_render); wait_values.push_back(0); - wait_stages.push_back(frame->wait_stage); + wait_stages.push_back(frame.wait_stage); } std::vector signal_semaphores; std::vector signal_values; - if (frame->signal_after_render != VK_NULL_HANDLE) + if (frame.signal_after_render != VK_NULL_HANDLE) { - signal_semaphores.push_back(frame->signal_after_render); + signal_semaphores.push_back(frame.signal_after_render); signal_values.push_back(0); } @@ -463,9 +491,15 @@ void VizCompositor::render(const std::vector& layers) submit_or_signal_fence(submit, "vkQueueSubmit", slot_sync.in_flight_fence()); cmd_guard.released = true; - // No trailing host wait — the slot's fence is gated by the next - // render that targets this slot. GPU timing forces a synchronous - // wait to read query results; opt-in only. + // No trailing host wait: the slot's fence gates the NEXT render into this + // slot (cmd buffer + depth staging reuse), and OpenXR guarantees the + // runtime reads the swapchain only after our queue-submitted writes + // complete (queue ordering — the same VkQueue we bound to the session). + // Host-waiting here would block xrBeginFrame..xrEndFrame for a full GPU + // frame, pushing xrEndFrame past predictedDisplayTime and forcing the + // runtime to reproject every frame (visible tearing/judder under motion). + // holohub's xr_gsplat pipelines the same way. The gpu_timing path still + // waits — it has to read back the timestamp queries. if (gpu_timestamp_pool_ != VK_NULL_HANDLE) { slot_sync.wait(); @@ -488,7 +522,7 @@ void VizCompositor::render(const std::vector& layers) } } - backend_->end_frame(*frame); + backend_->end_frame(frame); frame_guard.released = true; } diff --git a/src/viz/session/cpp/viz_session.cpp b/src/viz/session/cpp/viz_session.cpp index d4796e8a8..36840c31b 100644 --- a/src/viz/session/cpp/viz_session.cpp +++ b/src/viz/session/cpp/viz_session.cpp @@ -211,11 +211,47 @@ FrameInfo VizSession::begin_frame() last_frame_time_ = now; current_frame_info_.frame_index = frame_index_; - current_frame_info_.predicted_display_time = 0; // XR-only; 0 in offscreen - current_frame_info_.should_render = (state_ == SessionState::kRunning); current_frame_info_.resolution = compositor_ ? compositor_->resolution() : Resolution{}; - // Identity placeholder; backends fill per-view info inside render(). - current_frame_info_.views.assign(1, ViewInfo{}); + current_backend_frame_.reset(); + + // Acquire the backend frame BEFORE returning so renderers calling + // submit() against the returned FrameInfo's views are working with + // the same per-eye poses xrEndFrame will submit later. Skip the + // acquire when state isn't kRunning (kStopping/kLost paths submit + // empty xrEndFrames internally) or when the backend itself returns + // nullopt (XR shouldRender=0, swapchain skip). + if (state_ == SessionState::kRunning && backend_) + { + current_backend_frame_ = backend_->begin_frame(/*ignored=*/0); + } + + if (current_backend_frame_.has_value()) + { + current_frame_info_.should_render = true; + current_frame_info_.predicted_display_time = current_backend_frame_->predicted_display_time_ns; + current_frame_info_.views = current_backend_frame_->views; + if (current_frame_info_.views.empty()) + { + current_frame_info_.views.assign(1, ViewInfo{}); + } + } + else + { + current_frame_info_.should_render = false; + current_frame_info_.predicted_display_time = 0; + current_frame_info_.views.assign(1, ViewInfo{}); + } + + // Notify layers a new frame has begun. Lets ProjectionLayer-style + // layers clear per-frame freshness flags so a stale mailbox slot + // doesn't get composited under a new pose. + for (const auto& layer : layers_) + { + if (layer != nullptr) + { + layer->on_frame_begin(); + } + } frame_in_progress_ = true; return current_frame_info_; @@ -227,31 +263,27 @@ void VizSession::end_frame() { throw std::logic_error("VizSession: end_frame called without a matching begin_frame"); } - if (state_ != SessionState::kRunning) - { - frame_in_progress_ = false; - return; - } struct ClearGuard { bool* flag; + std::optional* frame_slot; ~ClearGuard() { *flag = false; + frame_slot->reset(); } - } guard{ &frame_in_progress_ }; - - std::vector raw_layers; - raw_layers.reserve(layers_.size()); - for (const auto& l : layers_) - { - raw_layers.push_back(l.get()); - } + } guard{ &frame_in_progress_, ¤t_backend_frame_ }; - if (current_frame_info_.should_render) + if (current_backend_frame_.has_value()) { - compositor_->render(raw_layers); + std::vector raw_layers; + raw_layers.reserve(layers_.size()); + for (const auto& l : layers_) + { + raw_layers.push_back(l.get()); + } + compositor_->render(*current_backend_frame_, raw_layers); } update_timing_stats(current_frame_info_.delta_time); @@ -278,11 +310,34 @@ void VizSession::update_timing_stats(float frame_time_seconds) (timing_stats_.avg_frame_time_ms > 0.0f) ? 1000.0f / timing_stats_.avg_frame_time_ms : 0.0f; } +void VizSession::validate_layer_against_backend_(LayerBase* layer) const +{ + // No backend yet (layers may be built standalone in tests) → nothing to check. + if (layer == nullptr || backend_ == nullptr) + { + return; + } + // Context affinity: a foreign context's images/semaphores would be used on + // this session's queue — invalid cross-device usage. + if (layer->vk_context() != nullptr && layer->vk_context() != &ctx()) + { + throw std::invalid_argument( + "VizSession: layer was created from a different VkContext than the session's; " + "build layers with VizSession::get_vk_context()."); + } + const uint32_t backend_view_count = backend_->is_xr() ? 2u : 1u; + layer->validate_backend_compatibility( + backend_->recommended_view_resolution(), backend_view_count, backend_->image_count()); +} + Resolution VizSession::get_recommended_resolution() const noexcept { - if (compositor_) + // Per-view size a direct ProjectionLayer should render at (kXr: per-eye, + // so its copy to the swapchain is 1:1). Window/offscreen report the + // single-view target extent. + if (backend_) { - return compositor_->resolution(); + return backend_->recommended_view_resolution(); } return Resolution{ config_.window_width, config_.window_height }; } diff --git a/src/viz/session/cpp/window_backend.cpp b/src/viz/session/cpp/window_backend.cpp index 10b62af2f..e6fe6ab5a 100644 --- a/src/viz/session/cpp/window_backend.cpp +++ b/src/viz/session/cpp/window_backend.cpp @@ -205,6 +205,68 @@ void WindowBackend::record_post_render_pass(VkCommandBuffer cmd, const Frame& fr VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT); } +void WindowBackend::record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector& views) +{ + if (swapchain_ == nullptr || render_target_ == nullptr) + { + return; + } + // Fill the intermediate RT (R8G8B8A8_SRGB) with the layer's color, then + // reuse the normal RT->swapchain blit. That blit decodes the sRGB RT to + // linear and re-encodes to the swapchain format — which both round-trips + // the gsplat sRGB bytes correctly AND reorders channels for a BGRA surface + // AND scales. A direct copy to the swapchain can't reorder channels (wrong + // RGB on BGRA surfaces) and a direct blit double-encodes sRGB. + const VkImage rt = render_target_->color_image(); + const Resolution rt_extent = render_target_->resolution(); + const VkImage src = views.empty() ? VK_NULL_HANDLE : views[0].color; + + transition_image(cmd, rt, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + if (src != VK_NULL_HANDLE) + { + // Raw copy: source UNORM bytes (already sRGB-encoded) land verbatim in + // the SRGB RT. Sizes match (layer renders at the RT/window size). + transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, + VK_PIPELINE_STAGE_TRANSFER_BIT); + + const Resolution se = views[0].extent; + const uint32_t cw = std::min(se.width, rt_extent.width); + const uint32_t ch = std::min(se.height, rt_extent.height); + VkImageCopy region{}; + region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.srcSubresource.layerCount = 1; + region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.dstSubresource.layerCount = 1; + region.extent = { cw, ch, 1 }; + vkCmdCopyImage( + cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, ®ion); + + transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, + VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT); + } + else + { + VkClearColorValue clear{}; + VkImageSubresourceRange range{}; + range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + range.levelCount = 1; + range.layerCount = 1; + vkCmdClearColorImage(cmd, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range); + } + + // Leave the RT in TRANSFER_SRC (the layout record_post_render_pass expects + // after the render pass), then run the standard RT->swapchain blit. + transition_image(cmd, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_TRANSFER_BIT); + + record_post_render_pass(cmd, frame); +} + void WindowBackend::end_frame(const Frame& frame) { if (swapchain_ == nullptr) diff --git a/src/viz/session/cpp/xr_backend.cpp b/src/viz/session/cpp/xr_backend.cpp index 42dd68c7c..9f8efe70b 100644 --- a/src/viz/session/cpp/xr_backend.cpp +++ b/src/viz/session/cpp/xr_backend.cpp @@ -28,6 +28,28 @@ void check_xr(XrResult r, const char* what) } } +void check_vk(VkResult r, const char* what) +{ + if (r != VK_SUCCESS) + { + throw std::runtime_error(std::string("XrBackend: ") + what + " failed: VkResult=" + std::to_string(r)); + } +} + +uint32_t find_memory_type(VkPhysicalDevice physical_device, uint32_t type_bits, VkMemoryPropertyFlags properties) +{ + VkPhysicalDeviceMemoryProperties mem_props; + vkGetPhysicalDeviceMemoryProperties(physical_device, &mem_props); + for (uint32_t i = 0; i < mem_props.memoryTypeCount; ++i) + { + if ((type_bits & (1u << i)) != 0 && (mem_props.memoryTypes[i].propertyFlags & properties) == properties) + { + return i; + } + } + throw std::runtime_error("XrBackend: no memory type matches depth staging requirements"); +} + void transition_image(VkCommandBuffer cmd, VkImage image, VkImageLayout old_layout, @@ -104,6 +126,7 @@ void XrBackend::init(const VkContext& ctx, Resolution /*preferred_size*/) if (depth_layer_enabled_) { create_depth_swapchains(); + create_depth_staging(); } create_intermediate(); } @@ -123,6 +146,7 @@ void XrBackend::destroy() // Order: rendering resources → session. Runtime owns swapchain // images, so xrDestroySwapchain is enough (no vkDestroyImage). render_target_.reset(); + destroy_depth_staging(); destroy_swapchains(); session_.reset(); ctx_ = nullptr; @@ -282,6 +306,58 @@ void XrBackend::create_depth_swapchains() } } +void XrBackend::create_depth_staging() +{ + const VkDevice device = ctx_->device(); + depth_staging_.assign(depth_swapchains_.size(), DepthStaging{}); + for (size_t i = 0; i < depth_swapchains_.size(); ++i) + { + const VkDeviceSize size = static_cast(depth_swapchains_[i].width) * depth_swapchains_[i].height * 4; + + VkBufferCreateInfo bi{}; + bi.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO; + bi.size = size; + bi.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT; + bi.sharingMode = VK_SHARING_MODE_EXCLUSIVE; + check_vk(vkCreateBuffer(device, &bi, nullptr, &depth_staging_[i].buffer), "vkCreateBuffer(depth staging)"); + + VkMemoryRequirements reqs; + vkGetBufferMemoryRequirements(device, depth_staging_[i].buffer, &reqs); + + VkMemoryAllocateInfo ai{}; + ai.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO; + ai.allocationSize = reqs.size; + ai.memoryTypeIndex = + find_memory_type(ctx_->physical_device(), reqs.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT); + check_vk(vkAllocateMemory(device, &ai, nullptr, &depth_staging_[i].memory), "vkAllocateMemory(depth staging)"); + check_vk(vkBindBufferMemory(device, depth_staging_[i].buffer, depth_staging_[i].memory, 0), + "vkBindBufferMemory(depth staging)"); + depth_staging_[i].size = size; + } +} + +void XrBackend::destroy_depth_staging() noexcept +{ + if (ctx_ == nullptr) + { + return; + } + const VkDevice device = ctx_->device(); + for (auto& s : depth_staging_) + { + if (s.buffer != VK_NULL_HANDLE) + { + vkDestroyBuffer(device, s.buffer, nullptr); + } + if (s.memory != VK_NULL_HANDLE) + { + vkFreeMemory(device, s.memory, nullptr); + } + s = DepthStaging{}; + } + depth_staging_.clear(); +} + void XrBackend::create_swapchains() { const auto& views = session_->view_configuration_views(); @@ -492,6 +568,10 @@ std::optional XrBackend::begin_frame(int64_t /*ignored*/) // invariant holds if image_count ever grows past 1. const uint32_t slots = image_count(); f.backend_token = (slots == 0) ? 0u : (frame_index_++ % slots); + // Predicted display time forwarded to FrameInfo so renderers can + // use it for time-aware content (e.g. animation timestamps that + // line up with the runtime's prediction). + f.predicted_display_time_ns = static_cast(last_frame_state_.predictedDisplayTime); // Hand protocol-balance off to the compositor's FrameGuard. in_flight_guard.dismissed = true; return f; @@ -606,14 +686,173 @@ void XrBackend::record_post_render_pass(VkCommandBuffer cmd, const Frame& frame) vkCmdCopyImage(cmd, depth_src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, ®ion); - // DEPTH_STENCIL_READ_ONLY is the conventional "runtime - // samples this" layout for the depth-info subImage. + // CloudXR follows the same handoff convention as Holohub's + // XrSwapchainCuda: depth swapchain images are returned in + // DEPTH_STENCIL_ATTACHMENT_OPTIMAL before xrEndFrame. Leaving + // them in READ_ONLY can make the runtime consume stale/invalid + // depth even though the XrCompositionLayerDepthInfoKHR metadata + // is otherwise correct. transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, - VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT, - VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, - VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT); + VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT, 0, + VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT); + } + } +} + +void XrBackend::record_direct(VkCommandBuffer cmd, const Frame& /*frame*/, const std::vector& views) +{ + if (!frame_renderable_) + { + return; + } + + // Direct-present: copy a ProjectionLayer's per-eye (color, depth) + // straight into the per-eye swapchains — no wide intermediate, no + // blit. recommended_view_resolution() sized the layer's images to the + // per-eye swapchain, so each copy is 1:1 and the renderer's depth + // reaches CloudXR byte-for-byte (matching Holohub's xr_gsplat path). + for (size_t i = 0; i < view_swapchains_.size(); ++i) + { + const auto& sw = view_swapchains_[i]; + const VkImage dst = sw.images[sw.current_image_index]; + const VkImage src = (i < views.size()) ? views[i].color : VK_NULL_HANDLE; + + transition_image(cmd, dst, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + if (src != VK_NULL_HANDLE) + { + // Source DeviceImage rests in SHADER_READ_ONLY_OPTIMAL; move it + // to TRANSFER_SRC for the copy, then restore so the next frame's + // copy sees the same resting layout. + transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + // Verbatim copy (size-compatible UNORM ↔ SRGB) — no color-space + // conversion, unlike a blit. The runtime treats the swapchain + // format's encoding, so the raw bytes pass through unchanged. + VkImageCopy region{}; + region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.srcSubresource.layerCount = 1; + region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + region.dstSubresource.layerCount = 1; + // 1:1 copy — add_layer guarantees source extent == per-eye swapchain size. + region.extent = { sw.width, sw.height, 1 }; + vkCmdCopyImage( + cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, ®ion); + + transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, + VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT); + } + else + { + // No image submitted this frame: clear so the runtime never + // composites stale swapchain contents. + VkClearColorValue clear{}; + VkImageSubresourceRange range{}; + range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + range.levelCount = 1; + range.layerCount = 1; + vkCmdClearColorImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range); + } + + // OpenXR expects COLOR_ATTACHMENT_OPTIMAL at xrEndFrame. + transition_image(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, + VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_COLOR_ATTACHMENT_READ_BIT, + VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT); + } + + if (!depth_layer_enabled_) + { + return; + } + + // Per-eye depth into XR_KHR_composition_layer_depth. The layer's depth + // image is R32_SFLOAT (color) — CUDA can't interop a depth-format image — + // so we bridge through a staging buffer: image(R32_SFLOAT,color) -> buffer + // -> image(D32_SFLOAT,depth). The 32-bit float bits copy verbatim. + for (size_t i = 0; i < depth_swapchains_.size(); ++i) + { + const auto& sw = depth_swapchains_[i]; + const VkImage dst = sw.images[sw.current_image_index]; + const VkImage src = (i < views.size()) ? views[i].depth : VK_NULL_HANDLE; + + transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_UNDEFINED, + VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, VK_ACCESS_TRANSFER_WRITE_BIT, + VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + if (src != VK_NULL_HANDLE && i < depth_staging_.size()) + { + const VkBuffer staging = depth_staging_[i].buffer; + + // R32_SFLOAT source rests in SHADER_READ_ONLY (COLOR aspect). + transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, + VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT); + + // 1:1 copy — add_layer guarantees source extent == per-eye swapchain size. + VkBufferImageCopy to_buf{}; + to_buf.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; + to_buf.imageSubresource.layerCount = 1; + to_buf.imageExtent = { sw.width, sw.height, 1 }; + vkCmdCopyImageToBuffer(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging, 1, &to_buf); + + // Order the buffer write before the buffer read. + VkBufferMemoryBarrier bb{}; + bb.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER; + bb.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; + bb.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT; + bb.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; + bb.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; + bb.buffer = staging; + bb.offset = 0; + bb.size = VK_WHOLE_SIZE; + vkCmdPipelineBarrier( + cmd, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 1, &bb, 0, nullptr); + + VkBufferImageCopy to_img{}; + to_img.imageSubresource.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT; + to_img.imageSubresource.layerCount = 1; + to_img.imageExtent = { sw.width, sw.height, 1 }; + vkCmdCopyBufferToImage(cmd, staging, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &to_img); + + transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, + VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, + VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT); + } + else + { + // Clear to the far plane (1.0) so reprojection treats the empty + // frame as "nothing in front". + VkClearDepthStencilValue clear{ 1.0f, 0 }; + VkImageSubresourceRange range{}; + range.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT; + range.levelCount = 1; + range.layerCount = 1; + vkCmdClearDepthStencilImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range); + } + + // Match record_post_render_pass: depth swapchains are returned in + // DEPTH_STENCIL_ATTACHMENT_OPTIMAL before xrEndFrame. + transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, + VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT, 0, + VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT); + } +} + +Resolution XrBackend::recommended_view_resolution() const +{ + if (session_) + { + const auto& views = session_->view_configuration_views(); + if (!views.empty()) + { + return Resolution{ views[0].recommendedImageRectWidth, views[0].recommendedImageRectHeight }; } } + return current_extent(); } void XrBackend::end_frame(const Frame& /*frame*/) diff --git a/src/viz/session_tests/cpp/test_viz_session.cpp b/src/viz/session_tests/cpp/test_viz_session.cpp index 037262ba4..adf3b3288 100644 --- a/src/viz/session_tests/cpp/test_viz_session.cpp +++ b/src/viz/session_tests/cpp/test_viz_session.cpp @@ -4,15 +4,24 @@ // VizSession config + lifecycle tests. #include +#include +#include +#include +#include #include #include #include #include +using Catch::Matchers::ContainsSubstring; using viz::DisplayMode; +using viz::ProjectionLayer; +using viz::QuadLayer; +using viz::Resolution; using viz::SessionState; using viz::VizSession; +using viz::VkContext; using viz::testing::is_gpu_available; TEST_CASE("VizSession::create rejects zero window dimensions", "[unit][viz_session]") @@ -75,3 +84,90 @@ TEST_CASE("VizSession::create kXr fails on hosts without an OpenXR runtime", "[u cfg_xr.mode = DisplayMode::kXr; CHECK_THROWS_AS(VizSession::create(cfg_xr), std::runtime_error); } + +// ── add_layer invariant / affinity rejection (failure-path) ────────── + +namespace +{ +VizSession::Config offscreen_cfg(uint32_t side = 64) +{ + VizSession::Config cfg{}; + cfg.mode = DisplayMode::kOffscreen; + cfg.window_width = side; + cfg.window_height = side; + return cfg; +} +} // namespace + +TEST_CASE("VizSession rejects a ProjectionLayer sized off the recommended resolution", "[gpu][viz_session]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + auto session = VizSession::create(offscreen_cfg()); + REQUIRE(session != nullptr); + + const Resolution rec = session->get_recommended_resolution(); + ProjectionLayer::Config pcfg; + pcfg.view_resolution = { rec.width + 16, rec.height }; // deliberately wrong + // Built from the session's own context, so only the resolution check fires. + CHECK_THROWS_WITH( + session->add_layer(*session->get_vk_context(), pcfg), ContainsSubstring("view_resolution")); +} + +TEST_CASE("VizSession enforces single ProjectionLayer XOR QuadLayers", "[gpu][viz_session]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + const Resolution rec{ 64, 64 }; // offscreen recommended == window extent + + SECTION("ProjectionLayer rejected when a QuadLayer is already present") + { + auto session = VizSession::create(offscreen_cfg()); + QuadLayer::Config qcfg; + qcfg.name = "quad"; + qcfg.resolution = { 64, 64 }; + REQUIRE(session->add_layer(*session->get_vk_context(), session->get_render_pass(), qcfg) != nullptr); + + ProjectionLayer::Config pcfg; + pcfg.view_resolution = rec; + CHECK_THROWS_WITH( + session->add_layer(*session->get_vk_context(), pcfg), ContainsSubstring("only layer")); + } + + SECTION("QuadLayer rejected when a ProjectionLayer is already present") + { + auto session = VizSession::create(offscreen_cfg()); + ProjectionLayer::Config pcfg; + pcfg.view_resolution = rec; + REQUIRE(session->add_layer(*session->get_vk_context(), pcfg) != nullptr); + + QuadLayer::Config qcfg; + qcfg.name = "quad"; + qcfg.resolution = { 64, 64 }; + CHECK_THROWS_WITH(session->add_layer(*session->get_vk_context(), session->get_render_pass(), qcfg), + ContainsSubstring("alongside a ProjectionLayer")); + } +} + +TEST_CASE("VizSession rejects a layer built from a foreign VkContext", "[gpu][viz_session]") +{ + if (!is_gpu_available()) + { + SKIP("No Vulkan-capable GPU available"); + } + auto session = VizSession::create(offscreen_cfg()); + REQUIRE(session != nullptr); + + // A second, independent context/device — its images + semaphores would be + // used on the session's queue, which is invalid cross-device usage. + VkContext foreign; + foreign.init({}); + + ProjectionLayer::Config pcfg; + pcfg.view_resolution = session->get_recommended_resolution(); // correct size, wrong context + CHECK_THROWS_WITH(session->add_layer(foreign, pcfg), ContainsSubstring("different VkContext")); +} diff --git a/src/viz/shaders/cpp/CMakeLists.txt b/src/viz/shaders/cpp/CMakeLists.txt index 0f934aaf7..0d641b3f9 100644 --- a/src/viz/shaders/cpp/CMakeLists.txt +++ b/src/viz/shaders/cpp/CMakeLists.txt @@ -45,10 +45,12 @@ function(compile_shader GLSL_PATH VAR_NAME) endfunction() # Shader programs: -# textured_quad — fullscreen quad sampling a combined image sampler. -# Used by QuadLayer to display a CUDA-fed texture. -compile_shader(textured_quad.vert kTexturedQuadVertSpv) -compile_shader(textured_quad.frag kTexturedQuadFragSpv) +# textured_quad — fullscreen / placed quad sampling a combined image +# sampler. Used by QuadLayer. +# (ProjectionLayer is direct-present-only — it copies its color/depth +# images straight to the swapchains, so it owns no shaders.) +compile_shader(textured_quad.vert kTexturedQuadVertSpv) +compile_shader(textured_quad.frag kTexturedQuadFragSpv) # INTERFACE library exposing the generated headers + a phony custom # target that ensures the headers exist before any consumer compiles.