diff --git a/docs/source/getting_started/televiz.rst b/docs/source/getting_started/televiz.rst
index 95080b014..99fbe4477 100644
--- a/docs/source/getting_started/televiz.rst
+++ b/docs/source/getting_started/televiz.rst
@@ -5,8 +5,9 @@ Televiz
 =======
 
 Televiz (``isaacteleop.viz``) is a lightweight compositor for Isaac Teleop. It composites camera and
-sensor feeds — with 3D rendered content coming soon — into an XR headset, a desktop window, or an
-offscreen buffer, integrating directly with the device-tracking and retargeting pipeline.
+sensor feeds, plus 3D rendered content (gsplat, nvblox, neural reconstruction), into an XR headset, a
+desktop window, or an offscreen buffer, integrating directly with the device-tracking and retargeting
+pipeline.
 
 It is a **compositor**, not a capture or streaming layer: it consumes GPU frames and assembles them
 into a final image. Camera capture, decode, and network transport live in the application (see
@@ -29,16 +30,17 @@ which owns the Vulkan context, the display target, the OpenXR session (in XR mod
 of **layers**. Content producers submit GPU buffers to layers; the session composites every layer
 into one frame each time you call ``render()``.
 
-The built-in layer type today is
-:code-file:`QuadLayer <src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp>` — a CUDA-fed 2D texture
-plane (mono or stereo), optionally placed in 3D space. Use it for camera feeds.
+Two layer types are available:
 
-.. note::
+* :code-file:`QuadLayer <src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp>` — a CUDA-fed 2D texture
+  plane (mono or stereo), optionally placed in 3D space. Use it for camera feeds.
+* :code-file:`ProjectionLayer <src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp>` — a full-view
+  RGBD layer for external renderers (gsplat, nvblox, neural reconstruction) that produce per-view
+  ``(color, depth)`` buffers. Use it to present a rendered 3D scene from the current head pose.
 
-   **Coming soon:** ``ProjectionLayer``, a full-view stereo RGBD layer for external renderers
-   (gsplat, nvblox, neural reconstruction) that produce per-view ``(color, depth)`` buffers,
-   Z-composited with quads. It is not yet available in this release — see `ProjectionLayer
-   (coming soon)`_ below.
+A session holds **either** one ``ProjectionLayer`` **or** any number of ``QuadLayer`` s, not both:
+quads composite into a shared render target, while a projection layer is presented directly (see
+`ProjectionLayer`_).
 
 All symbols are imported from the top-level module::
 
@@ -212,19 +214,63 @@ For a stereo layer both buffers are copied on the same stream and signaled toget
 never sees a half-matched pair. Lock-mode placement strategies (``world`` / ``head`` / ``lazy``) are
 **application policy** and ship in the sample, not in the module.
 
-ProjectionLayer (coming soon)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ProjectionLayer
+^^^^^^^^^^^^^^^
 
-.. note::
+A full-view RGBD layer for **in-loop** renderers — gsplat, nvblox, or neural reconstruction engines
+that produce per-view ``(color, depth)`` buffers. Configure it with ``ProjectionLayerConfig``:
 
-   ``ProjectionLayer`` is under active development and **not yet available in this release**. The
-   description below is a preview of the planned API and may change.
+.. list-table::
+   :header-rows: 1
+   :widths: 25 75
+
+   * - Field
+     - Description
+   * - ``name``
+     - Layer name.
+   * - ``view_resolution``
+     - Per-view render resolution. **Must equal** ``session.get_recommended_resolution()`` — the
+       layer's images are copied 1:1 into the presentation swapchains (per-eye in XR). A mismatch
+       is rejected by ``add_projection_layer``.
+   * - ``color_format``
+     - ``PixelFormat.kRGBA8``.
+   * - ``depth_format``
+     - ``PixelFormat.kD32F`` (default) so the depth reaches the XR runtime for positional
+       reprojection, or ``None`` to present color only.
+   * - ``stereo``
+     - ``True`` for per-eye buffers. A stereo (XR) display **requires** a stereo layer; a mono layer
+       is rejected at ``add_projection_layer``.
+
+Unlike ``QuadLayer``, a projection layer is **direct-present**: each view's ``(color, depth)`` is
+copied straight into the presentation swapchains (no shared render target). Because of that a session
+holds *either* one ``ProjectionLayer`` *or* any number of ``QuadLayer`` s, never both.
+
+The renderer runs **in-loop** with the frame loop: read the predicted view poses from the
+``FrameInfo`` returned by ``begin_frame()``, render against them, then ``submit()`` before
+``end_frame()``:
+
+.. code-block:: python
+
+   cfg = televiz.ProjectionLayerConfig()
+   cfg.view_resolution = session.get_recommended_resolution()
+   cfg.stereo = session.is_xr_mode()
+   layer = session.add_projection_layer(cfg)
+
+   while running:
+       info = session.begin_frame()
+       if info.should_render:
+           # Render against THIS frame's per-eye poses (info.views[i].pose + .fov).
+           color, depth = renderer.render(info.views)        # RGBA8 + D32F CUDA buffers
+           if layer.stereo:
+               layer.submit(left_color, left_depth, right_color, right_depth, stream=cuda_stream)
+           else:
+               layer.submit(color, depth, stream=cuda_stream)
+       session.end_frame()
 
-A planned full-view RGBD layer for in-loop renderers — gsplat, nvblox, or neural reconstruction
-engines that produce per-view ``(color, depth)`` buffers. Unlike ``QuadLayer``, the renderer will
-run **in-loop** with the XR frame loop: render against the predicted view poses from the current
-frame, then submit between ``begin_frame()`` and ``end_frame()``. Output is composited with depth,
-so it Z-combines with quad layers.
+If the renderer is slower than display rate, the runtime / CloudXR paces the app via ``xrWaitFrame``
+and reprojects the last submitted frame at display rate. In XR, a visible layer that does **not**
+submit for a frame presents nothing (the swapchains are cleared) rather than reproject stale RGBD
+under a new pose.
 
 Frame loop
 ----------
diff --git a/docs/source/overview/architecture.rst b/docs/source/overview/architecture.rst
index 8efcb5f54..ed78526a8 100644
--- a/docs/source/overview/architecture.rst
+++ b/docs/source/overview/architecture.rst
@@ -43,7 +43,7 @@ Visualization (Televiz)
 Televiz (``isaacteleop.viz``) is a lightweight compositor module for visualizing what the operator
 sees — camera and sensor feeds, plus 3D rendered content — in an XR headset or a desktop window.
 
-- Composites multiple sources into one view: 2D camera/sensor planes (``QuadLayer``) today, with full-view stereo RGBD (``ProjectionLayer``) for 3D rendered content coming soon
+- Composites multiple sources into one view: 2D camera/sensor planes (``QuadLayer``) and full-view stereo RGBD (``ProjectionLayer``) for 3D rendered content
 - Per-eye stereo rendering and 3D placement in XR; the same API drives windowed and offscreen output
 - Zero-copy submission of GPU frames straight from CuPy, PyTorch, or any CUDA memory object
 - Shares one OpenXR session with the teleop device trackers, so rendering and tracking can run over a single CloudXR connection
diff --git a/src/viz/core/cpp/device_image.cpp b/src/viz/core/cpp/device_image.cpp
index 0fbabc2c3..e64a09583 100644
--- a/src/viz/core/cpp/device_image.cpp
+++ b/src/viz/core/cpp/device_image.cpp
@@ -84,7 +84,14 @@ VkFormat to_vk_storage_format(PixelFormat format)
     case PixelFormat::kRGBA8:
         return VK_FORMAT_R8G8B8A8_UNORM;
     case PixelFormat::kD32F:
-        return VK_FORMAT_D32_SFLOAT;
+        // Single-channel float COLOR format, NOT VK_FORMAT_D32_SFLOAT. Depth
+        // formats use hardware depth compression in optimal tiling that CUDA
+        // external-memory array interop cannot interpret, so a CUDA-written
+        // D32_SFLOAT image reads back as garbage on the Vulkan side. R32_SFLOAT
+        // is bit-identical (IEEE float32) and interops exactly like the color
+        // images do; the bridge into the D32_SFLOAT XR depth swapchain happens
+        // via a staging buffer in the backend (float bits copy verbatim).
+        return VK_FORMAT_R32_SFLOAT;
     }
     throw std::runtime_error("DeviceImage: unsupported PixelFormat");
 }
@@ -96,7 +103,7 @@ VkFormat to_vk_view_format(PixelFormat format)
     case PixelFormat::kRGBA8:
         return VK_FORMAT_R8G8B8A8_SRGB;
     case PixelFormat::kD32F:
-        return VK_FORMAT_D32_SFLOAT;
+        return VK_FORMAT_R32_SFLOAT; // see to_vk_storage_format
     }
     throw std::runtime_error("DeviceImage: unsupported PixelFormat");
 }
@@ -128,13 +135,17 @@ std::unique_ptr<DeviceImage> DeviceImage::create(const VkContext& ctx,
     {
         throw std::invalid_argument("DeviceImage: resolution must be non-zero");
     }
-    if (format != PixelFormat::kRGBA8)
+    if (format != PixelFormat::kRGBA8 && format != PixelFormat::kD32F)
     {
-        // kD32F is reserved for ProjectionLayer's depth path. The
-        // CUDA-Vulkan interop contract for a depth image (sample
-        // semantics, layout transitions, color-space view) is not
-        // worked out yet, so refuse to half-build it.
-        throw std::invalid_argument("DeviceImage: only PixelFormat::kRGBA8 is supported");
+        throw std::invalid_argument("DeviceImage: unsupported PixelFormat");
+    }
+    if (format == PixelFormat::kD32F && mip_levels > 1)
+    {
+        // Depth + mip chain is meaningless (filtering depth between mip
+        // levels produces incorrect occlusion) and we'd have to
+        // special-case the blit-down pipeline. Reject explicitly rather
+        // than silently allocating the chain.
+        throw std::invalid_argument("DeviceImage: kD32F does not support mip_levels > 1");
     }
     // mip_levels == 0 -> auto-compute full chain to 1x1.
     if (mip_levels == 0)
@@ -335,8 +346,9 @@ void DeviceImage::create_vk_image_view()
     info.image = image_;
     info.viewType = VK_IMAGE_VIEW_TYPE_2D;
     info.format = vk_format_;
-    info.subresourceRange.aspectMask =
-        (format_ == PixelFormat::kD32F) ? VK_IMAGE_ASPECT_DEPTH_BIT : VK_IMAGE_ASPECT_COLOR_BIT;
+    // Always COLOR: kD32F is stored as R32_SFLOAT (a color format), not a
+    // depth format — see to_vk_storage_format.
+    info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
     info.subresourceRange.baseMipLevel = 0;
     info.subresourceRange.levelCount = mip_levels_;
     info.subresourceRange.baseArrayLayer = 0;
@@ -517,8 +529,8 @@ void DeviceImage::run_one_shot_layout_transition(VkImageLayout old_layout,
     barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
     barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
     barrier.image = image_;
-    barrier.subresourceRange.aspectMask =
-        (format_ == PixelFormat::kD32F) ? VK_IMAGE_ASPECT_DEPTH_BIT : VK_IMAGE_ASPECT_COLOR_BIT;
+    // kD32F is stored as R32_SFLOAT (color format), so always COLOR aspect.
+    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
     barrier.subresourceRange.baseMipLevel = 0;
     barrier.subresourceRange.levelCount = mip_levels_;
     barrier.subresourceRange.baseArrayLayer = 0;
diff --git a/src/viz/layers/cpp/CMakeLists.txt b/src/viz/layers/cpp/CMakeLists.txt
index 02c92d88f..e48096162 100644
--- a/src/viz/layers/cpp/CMakeLists.txt
+++ b/src/viz/layers/cpp/CMakeLists.txt
@@ -10,7 +10,9 @@ cmake_minimum_required(VERSION 3.20)
 # viz/layers_tests/.
 add_library(viz_layers STATIC
     quad_layer.cpp
+    projection_layer.cpp
     inc/viz/layers/quad_layer.hpp
+    inc/viz/layers/projection_layer.hpp
 )
 
 target_include_directories(viz_layers
diff --git a/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp b/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp
new file mode 100644
index 000000000..35f217f90
--- /dev/null
+++ b/src/viz/layers/cpp/inc/viz/layers/projection_layer.hpp
@@ -0,0 +1,215 @@
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+#pragma once
+
+#include <viz/core/device_image.hpp>
+#include <viz/core/viz_buffer.hpp>
+#include <viz/core/viz_types.hpp>
+#include <viz/session/layer_base.hpp>
+#include <vulkan/vulkan.h>
+
+#include <array>
+#include <atomic>
+#include <cstdint>
+#include <cuda_runtime.h>
+#include <memory>
+#include <optional>
+#include <string>
+#include <vector>
+
+namespace viz
+{
+
+class VkContext;
+
+// ProjectionLayer: a full-view RGBD layer for renderers (gsplat, nvblox,
+// neural reconstruction) that produce (color, depth) buffers per frame.
+//
+// DIRECT-PRESENT ONLY. Unlike QuadLayer, this layer is never composited
+// into the shared render target. Its per-eye (color, depth) images are
+// copied STRAIGHT into the presentation swapchains by the backend
+// (vkCmdCopyImage, verbatim) — exactly like holohub xr_gsplat. In kXr
+// that means the renderer's depth lands in the XR depth swapchain with no
+// gl_FragDepth round-trip, so CloudXR positional reprojection gets exact
+// depth. Consequently a VizSession holds EITHER one ProjectionLayer OR
+// any number of QuadLayers, never both (enforced by VizSession). Because
+// the copy is 1:1, ``view_resolution`` MUST equal the swapchain per-view
+// size (use VizSession::get_recommended_resolution()).
+//
+// Frame loop contract — IMPORTANT:
+//
+//     info = session.begin_frame()                    // xrLocateViews
+//     color, depth = renderer.render(info.views)      // render against THIS frame's views
+//     layer.submit(color, depth)                      // publish for THIS frame
+//     session.end_frame()                             // copy to swapchain + xrEndFrame
+//
+// ``submit()`` MUST be called between ``begin_frame()`` and
+// ``end_frame()``. The renderer MUST render against ``info.views[i].pose``
+// (the predicted-display-time pose for this frame). The runtime / CloudXR
+// paces the app via xrWaitFrame and reprojects the last submitted frame
+// at display rate if the renderer is slower.
+//
+// In ``kXr``, a visible ProjectionLayer that does NOT receive a
+// ``submit()`` for the current frame presents nothing (the backend clears
+// the swapchains) rather than hand the runtime yesterday's RGBD under
+// today's pose. In ``kWindow`` / ``kOffscreen`` the freshness gate is off
+// — the most recent publish stays on screen until replaced.
+//
+// Mailbox: kSlotCount per-eye (color, depth) DeviceImage pairs. submit()
+// picks a slot that's neither ``latest_`` nor in any ``in_use_`` entry,
+// memcpys + signals cuda_done_writing on the caller's stream, blocks on
+// cudaStreamSynchronize so the caller can re-use source buffers
+// immediately, then atomically promotes the slot to ``latest_``.
+// acquire_direct_views() promotes ``latest_`` to ``in_use_[slot]`` and
+// returns that slot's images for the backend copy.
+//
+// Stereo: when Config::stereo is true, the layer allocates paired
+// (left, right) storage per slot. submit() must ship both eyes on a
+// single CUDA stream; stream ordering keeps the pair atomic. In kXr
+// view 0 (left eye) → left buffer, view 1 (right eye) → right.
+//
+// Memory (per layer):
+//   mono   1024² RGBA8+D32F: 7 slots × 1024² × 8 B    ≈  56 MB
+//   stereo 1024² RGBA8+D32F:                          ≈ 112 MB
+//   stereo 2048² RGBA8+D32F:                          ≈ 448 MB
+class ProjectionLayer : public LayerBase
+{
+public:
+    // Sized to cover backend image counts up to 5, leave one free slot.
+    static constexpr uint32_t kMaxFramesInFlight = 5;
+    static constexpr uint32_t kSlotCount = kMaxFramesInFlight + 2;
+
+    struct Config
+    {
+        std::string name = "ProjectionLayer";
+        Resolution view_resolution{};
+        PixelFormat color_format = PixelFormat::kRGBA8;
+
+        // nullopt → no depth buffer allocated; ProjectionLayer always
+        // writes gl_FragDepth = 1.0 (far). Without depth, this layer
+        // loses Z-compositing with QuadLayer. Useful for renderers that
+        // genuinely have no depth (sky / background fills).
+        std::optional<PixelFormat> depth_format = PixelFormat::kD32F;
+
+        // true → per-eye paired storage. submit MUST ship both eyes.
+        // In kWindow / kOffscreen the LEFT buffer is sampled; in kXr
+        // view 0 → LEFT, view 1 → RIGHT.
+        bool stereo = false;
+    };
+
+    ProjectionLayer(const VkContext& ctx, Config config);
+    ~ProjectionLayer() override;
+    void destroy();
+
+    ProjectionLayer(const ProjectionLayer&) = delete;
+    ProjectionLayer& operator=(const ProjectionLayer&) = delete;
+
+    // Publish a frame. Each buffer is a CUDA-linear VizBuffer (kDevice
+    // space) matching the layer's resolution and the relevant format
+    // (color → color_format, depth → kD32F). Validated against the
+    // config; mismatch throws std::invalid_argument.
+    //
+    // Mono no-depth:           submit(color)
+    // Mono with depth:         submit(color, &depth)
+    // Stereo no-depth:         submit(left_color, nullptr, &right_color, nullptr)
+    // Stereo with depth:       submit(left_color, &left_depth, &right_color, &right_depth)
+    //
+    // submit() does one cudaMemcpy2DToArrayAsync per provided buffer
+    // on ``stream``, signals cuda_done_writing on the same stream, then
+    // BLOCKS on cudaStreamSynchronize so the caller can re-use source
+    // buffers immediately. Cost: ~0.5 ms / 1024² color + depth on a
+    // discrete GPU.
+    //
+    // Marks the layer "fresh for this frame" so record() will draw it.
+    // VizSession::begin_frame clears the flag at the start of each
+    // frame; a renderer that doesn't submit will see its content
+    // skipped in kXr.
+    //
+    // GIL: pybind binding releases the GIL across this whole call.
+    void submit(const VizBuffer& left_color,
+                const VizBuffer* left_depth = nullptr,
+                const VizBuffer* right_color = nullptr,
+                const VizBuffer* right_depth = nullptr,
+                cudaStream_t stream = 0);
+
+    // LayerBase contract.
+    void on_frame_begin() override; // clears submitted_this_frame_ flag
+
+    // Direct-present-only: never drawn into the shared render pass, so
+    // record() is a no-op. The compositor always takes the direct path
+    // (acquire_direct_views) for this layer.
+    void record(VkCommandBuffer /*cmd*/,
+                const std::vector<ViewInfo>& /*views*/,
+                const RenderTarget& /*target*/,
+                uint32_t /*in_flight_slot*/) override
+    {
+    }
+
+    // cuda_done_writing waits (TRANSFER stage — the backend copies these
+    // images) for color + depth of every active view in the in-use slot.
+    // kSlotNone → empty vector.
+    std::vector<LayerBase::WaitSemaphore> get_wait_semaphores() const override;
+
+    bool is_projection_layer() const noexcept override
+    {
+        return true;
+    }
+    bool supports_direct_present() const noexcept override
+    {
+        return true;
+    }
+    const VkContext* vk_context() const noexcept override
+    {
+        return ctx_;
+    }
+    std::vector<DirectPresentView> acquire_direct_views(uint32_t in_flight_slot) override;
+
+    // Direct-present requires a 1:1 swapchain copy, so view_resolution must
+    // equal the backend's per-view size, a stereo display needs a stereo
+    // (>= view-count) layer, and the backend's in-flight image count must fit
+    // the mailbox. Throws std::invalid_argument on any mismatch.
+    void validate_backend_compatibility(Resolution recommended_view_resolution,
+                                        uint32_t backend_view_count,
+                                        uint32_t backend_image_count) const override;
+
+    // Accessors.
+    Resolution view_resolution() const noexcept;
+    PixelFormat color_format() const noexcept;
+    std::optional<PixelFormat> depth_format() const noexcept;
+    bool is_stereo() const noexcept;
+    uint32_t view_count() const noexcept;
+
+    // Diagnostic — nullptr outside valid ranges.
+    const DeviceImage* color_image(uint32_t slot, uint32_t view) const noexcept;
+    const DeviceImage* depth_image(uint32_t slot, uint32_t view) const noexcept;
+
+private:
+    static constexpr uint8_t kSlotNone = 0xFF;
+
+    void init();
+    uint8_t pick_free_slot() const noexcept;
+    void validate_submit_buffer(const VizBuffer& buf, PixelFormat expected_format, const char* label) const;
+    void enqueue_copy(const VizBuffer& src, DeviceImage& dst, cudaStream_t stream) const;
+
+    const VkContext* ctx_ = nullptr;
+    Config config_;
+    uint32_t view_count_ = 1;
+    bool has_depth_ = true;
+
+    // Per-eye (color, depth) mailbox storage. CUDA-mapped DeviceImages the
+    // backend copies straight to the swapchains.
+    std::array<std::vector<std::unique_ptr<DeviceImage>>, kSlotCount> slots_color_;
+    std::array<std::vector<std::unique_ptr<DeviceImage>>, kSlotCount> slots_depth_;
+
+    // Mailbox.
+    std::atomic<uint8_t> latest_{ kSlotNone };
+    std::array<std::atomic<uint8_t>, kMaxFramesInFlight> in_use_{};
+    std::atomic<uint8_t> last_in_use_slot_{ kSlotNone };
+
+    // Set by submit(), cleared by on_frame_begin(). acquire_direct_views()
+    // consults this in kXr to gate stale-RGBD-under-new-pose presents.
+    std::atomic<bool> submitted_this_frame_{ false };
+};
+
+} // namespace viz
diff --git a/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp b/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp
index 41254162f..8d51668ef 100644
--- a/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp
+++ b/src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp
@@ -192,6 +192,11 @@ class QuadLayer : public LayerBase
     // Drives aspect-fit letterbox in window mode; ignored in kXr.
     std::optional<float> aspect_ratio() const noexcept override;
 
+    const VkContext* vk_context() const noexcept override
+    {
+        return ctx_;
+    }
+
     Resolution resolution() const noexcept;
     PixelFormat format() const noexcept;
 
diff --git a/src/viz/layers/cpp/projection_layer.cpp b/src/viz/layers/cpp/projection_layer.cpp
new file mode 100644
index 000000000..a1cc52407
--- /dev/null
+++ b/src/viz/layers/cpp/projection_layer.cpp
@@ -0,0 +1,437 @@
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+#include <viz/core/vk_context.hpp>
+#include <viz/layers/projection_layer.hpp>
+#include <viz/session/viz_session.hpp>
+
+#include <stdexcept>
+#include <string>
+#include <utility>
+
+namespace viz
+{
+
+namespace
+{
+
+void check_cuda(cudaError_t result, const char* what)
+{
+    if (result != cudaSuccess)
+    {
+        throw std::runtime_error(std::string("ProjectionLayer: ") + what + " failed: " + cudaGetErrorString(result));
+    }
+}
+
+} // namespace
+
+ProjectionLayer::ProjectionLayer(const VkContext& ctx, Config config)
+    : LayerBase(config.name), ctx_(&ctx), config_(std::move(config))
+{
+    // Validate config first (cheap, no resources), then the context.
+    // Config checks don't depend on ctx, and ordering them first lets them
+    // be unit-tested without a GPU-initialized VkContext.
+    if (config_.view_resolution.width == 0 || config_.view_resolution.height == 0)
+    {
+        throw std::invalid_argument("ProjectionLayer: view_resolution must be non-zero");
+    }
+    if (config_.color_format != PixelFormat::kRGBA8)
+    {
+        throw std::invalid_argument("ProjectionLayer: color_format must be kRGBA8");
+    }
+    if (config_.depth_format.has_value() && config_.depth_format.value() != PixelFormat::kD32F)
+    {
+        throw std::invalid_argument("ProjectionLayer: depth_format must be kD32F or nullopt");
+    }
+    if (!ctx.is_initialized())
+    {
+        throw std::invalid_argument("ProjectionLayer: VkContext not initialized");
+    }
+    view_count_ = config_.stereo ? 2u : 1u;
+    has_depth_ = config_.depth_format.has_value();
+    for (auto& slot : in_use_)
+    {
+        slot.store(kSlotNone, std::memory_order_relaxed);
+    }
+    init();
+}
+
+ProjectionLayer::~ProjectionLayer()
+{
+    destroy();
+}
+
+void ProjectionLayer::init()
+{
+    // The only resources this layer owns are the per-eye (color, depth)
+    // mailbox DeviceImages. There is no render pipeline — the backend
+    // copies these images straight to the swapchains.
+    try
+    {
+        for (uint32_t s = 0; s < kSlotCount; ++s)
+        {
+            slots_color_[s].reserve(view_count_);
+            for (uint32_t v = 0; v < view_count_; ++v)
+            {
+                slots_color_[s].push_back(DeviceImage::create(*ctx_, config_.view_resolution, config_.color_format, 1));
+            }
+            if (has_depth_)
+            {
+                slots_depth_[s].reserve(view_count_);
+                for (uint32_t v = 0; v < view_count_; ++v)
+                {
+                    slots_depth_[s].push_back(
+                        DeviceImage::create(*ctx_, config_.view_resolution, *config_.depth_format, 1));
+                }
+            }
+        }
+    }
+    catch (...)
+    {
+        destroy();
+        throw;
+    }
+}
+
+void ProjectionLayer::destroy()
+{
+    // Drain pending GPU work before freeing the images the compositor's
+    // command buffers reference.
+    if (ctx_ != nullptr && ctx_->device() != VK_NULL_HANDLE)
+    {
+        (void)vkDeviceWaitIdle(ctx_->device());
+    }
+    for (uint32_t s = 0; s < kSlotCount; ++s)
+    {
+        slots_color_[s].clear();
+        slots_depth_[s].clear();
+    }
+}
+
+// ─── Submit ──────────────────────────────────────────────────────────
+
+void ProjectionLayer::validate_submit_buffer(const VizBuffer& buf, PixelFormat expected_format, const char* label) const
+{
+    if (buf.data == nullptr)
+    {
+        throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": data is null");
+    }
+    if (buf.space != MemorySpace::kDevice)
+    {
+        throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": MemorySpace must be kDevice");
+    }
+    if (buf.format != expected_format)
+    {
+        throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": pixel format mismatch");
+    }
+    if (buf.width != config_.view_resolution.width || buf.height != config_.view_resolution.height)
+    {
+        throw std::invalid_argument(std::string("ProjectionLayer: ") + label + ": resolution mismatch");
+    }
+}
+
+void ProjectionLayer::enqueue_copy(const VizBuffer& src, DeviceImage& dst, cudaStream_t stream) const
+{
+    const size_t row_bytes = static_cast<size_t>(src.width) * bytes_per_pixel(src.format);
+    const size_t src_pitch = (src.pitch != 0) ? src.pitch : row_bytes;
+    check_cuda(cudaMemcpy2DToArrayAsync(dst.cuda_array(),
+                                        /*wOffset=*/0,
+                                        /*hOffset=*/0, src.data, src_pitch, row_bytes, src.height,
+                                        cudaMemcpyDeviceToDevice, stream),
+               "cudaMemcpy2DToArrayAsync");
+}
+
+uint8_t ProjectionLayer::pick_free_slot() const noexcept
+{
+    const uint8_t latest = latest_.load(std::memory_order_acquire);
+    for (uint8_t s = 0; s < static_cast<uint8_t>(kSlotCount); ++s)
+    {
+        if (s == latest)
+        {
+            continue;
+        }
+        bool used = false;
+        for (const auto& a : in_use_)
+        {
+            if (a.load(std::memory_order_acquire) == s)
+            {
+                used = true;
+                break;
+            }
+        }
+        if (!used)
+        {
+            return s;
+        }
+    }
+    return kSlotNone;
+}
+
+void ProjectionLayer::submit(const VizBuffer& left_color,
+                             const VizBuffer* left_depth,
+                             const VizBuffer* right_color,
+                             const VizBuffer* right_depth,
+                             cudaStream_t stream)
+{
+    // ── Validate config / call shape ─────────────────────────────────
+    validate_submit_buffer(left_color, config_.color_format, "submit(left_color)");
+
+    const bool stereo = config_.stereo;
+    if (stereo)
+    {
+        if (right_color == nullptr)
+        {
+            throw std::invalid_argument("ProjectionLayer: stereo layer requires right_color");
+        }
+        validate_submit_buffer(*right_color, config_.color_format, "submit(right_color)");
+    }
+    else
+    {
+        if (right_color != nullptr || right_depth != nullptr)
+        {
+            throw std::invalid_argument("ProjectionLayer: mono layer must not pass right buffers");
+        }
+    }
+
+    if (has_depth_)
+    {
+        if (left_depth == nullptr)
+        {
+            throw std::invalid_argument("ProjectionLayer: depth-enabled layer requires left_depth");
+        }
+        validate_submit_buffer(*left_depth, PixelFormat::kD32F, "submit(left_depth)");
+        if (stereo)
+        {
+            if (right_depth == nullptr)
+            {
+                throw std::invalid_argument("ProjectionLayer: stereo + depth requires right_depth");
+            }
+            validate_submit_buffer(*right_depth, PixelFormat::kD32F, "submit(right_depth)");
+        }
+    }
+    else
+    {
+        if (left_depth != nullptr || right_depth != nullptr)
+        {
+            throw std::invalid_argument("ProjectionLayer: depth-disabled layer must not pass depth buffers");
+        }
+    }
+
+    // ── Pick a free slot ─────────────────────────────────────────────
+    const uint8_t slot = pick_free_slot();
+    if (slot == kSlotNone)
+    {
+        // Should be unreachable given the kSlotCount invariant
+        // (kMaxFramesInFlight + 2 ≥ worst-case forbidden set + 1).
+        throw std::runtime_error("ProjectionLayer: no free mailbox slot — sizing invariant violated");
+    }
+
+    // ── Copy + signal ────────────────────────────────────────────────
+    enqueue_copy(left_color, *slots_color_[slot][0], stream);
+    if (has_depth_)
+    {
+        enqueue_copy(*left_depth, *slots_depth_[slot][0], stream);
+    }
+    if (stereo)
+    {
+        enqueue_copy(*right_color, *slots_color_[slot][1], stream);
+        if (has_depth_)
+        {
+            enqueue_copy(*right_depth, *slots_depth_[slot][1], stream);
+        }
+    }
+
+    // One semaphore signal per CUDA-mapped image we wrote. The compositor
+    // waits on the in-use slot's set of cuda_done_writing values before
+    // the backend copies them (get_wait_semaphores).
+    slots_color_[slot][0]->cuda_signal_write_done(stream);
+    if (has_depth_)
+    {
+        slots_depth_[slot][0]->cuda_signal_write_done(stream);
+    }
+    if (stereo)
+    {
+        slots_color_[slot][1]->cuda_signal_write_done(stream);
+        if (has_depth_)
+        {
+            slots_depth_[slot][1]->cuda_signal_write_done(stream);
+        }
+    }
+
+    // BLOCK on stream completion so the caller can re-use src buffers
+    // immediately. Same contract as QuadLayer::submit. ~sub-ms cost.
+    check_cuda(cudaStreamSynchronize(stream), "cudaStreamSynchronize");
+
+    latest_.store(slot, std::memory_order_release);
+    submitted_this_frame_.store(true, std::memory_order_release);
+}
+
+// ─── Frame state / direct present ────────────────────────────────────
+
+void ProjectionLayer::on_frame_begin()
+{
+    // VizSession's begin_frame calls this on every layer. Clearing the
+    // flag here means a layer that fails to submit between begin_frame and
+    // end_frame is skipped (kXr) at acquire_direct_views() time.
+    submitted_this_frame_.store(false, std::memory_order_release);
+}
+
+std::vector<LayerBase::WaitSemaphore> ProjectionLayer::get_wait_semaphores() const
+{
+    std::vector<WaitSemaphore> waits;
+    const uint8_t cur = last_in_use_slot_.load(std::memory_order_acquire);
+    if (cur == kSlotNone)
+    {
+        return waits;
+    }
+    const auto add = [&](const DeviceImage& img)
+    {
+        const uint64_t value = img.cuda_done_writing_value();
+        if (value == 0)
+        {
+            return;
+        }
+        WaitSemaphore w{};
+        w.semaphore = img.cuda_done_writing();
+        w.value = value;
+        // The backend copies these images (vkCmdCopyImage), so gate the
+        // CUDA-done wait at the transfer stage, not the fragment stage.
+        w.wait_stage = VK_PIPELINE_STAGE_TRANSFER_BIT;
+        waits.push_back(w);
+    };
+    for (uint32_t v = 0; v < view_count_; ++v)
+    {
+        if (slots_color_[cur].size() > v && slots_color_[cur][v])
+        {
+            add(*slots_color_[cur][v]);
+        }
+        if (has_depth_ && slots_depth_[cur].size() > v && slots_depth_[cur][v])
+        {
+            add(*slots_depth_[cur][v]);
+        }
+    }
+    return waits;
+}
+
+std::vector<DirectPresentView> ProjectionLayer::acquire_direct_views(uint32_t in_flight_slot)
+{
+    if (in_flight_slot >= kMaxFramesInFlight)
+    {
+        throw std::logic_error("ProjectionLayer: in_flight_slot exceeds kMaxFramesInFlight");
+    }
+
+    const bool xr_mode = session() != nullptr && session()->is_xr_mode();
+    const uint8_t latest = latest_.load(std::memory_order_acquire);
+
+    // Freshness gate: in kXr never hand the runtime stale RGBD under this
+    // frame's pose; nothing-published is the same skip. On skip leave no
+    // in-use slot so get_wait_semaphores() is empty and the backend clears
+    // the swapchains.
+    const bool skip = latest == kSlotNone || (xr_mode && !submitted_this_frame_.load(std::memory_order_acquire));
+    if (skip)
+    {
+        in_use_[in_flight_slot].store(kSlotNone, std::memory_order_release);
+        last_in_use_slot_.store(kSlotNone, std::memory_order_release);
+        return {};
+    }
+
+    // Promote the latest publish so get_wait_semaphores() waits on its
+    // cuda_done_writing before the backend copies it.
+    in_use_[in_flight_slot].store(latest, std::memory_order_release);
+    last_in_use_slot_.store(latest, std::memory_order_release);
+
+    std::vector<DirectPresentView> views;
+    views.reserve(view_count_);
+    for (uint32_t v = 0; v < view_count_; ++v)
+    {
+        DirectPresentView dv{};
+        dv.extent = config_.view_resolution;
+        if (slots_color_[latest].size() > v && slots_color_[latest][v])
+        {
+            dv.color = slots_color_[latest][v]->vk_image();
+        }
+        if (has_depth_ && slots_depth_[latest].size() > v && slots_depth_[latest][v])
+        {
+            dv.depth = slots_depth_[latest][v]->vk_image();
+        }
+        views.push_back(dv);
+    }
+    return views;
+}
+
+// ─── Accessors ───────────────────────────────────────────────────────
+
+void ProjectionLayer::validate_backend_compatibility(Resolution recommended_view_resolution,
+                                                     uint32_t backend_view_count,
+                                                     uint32_t backend_image_count) const
+{
+    // Images are copied 1:1 into the swapchain, so sizes must match.
+    if (config_.view_resolution.width != recommended_view_resolution.width ||
+        config_.view_resolution.height != recommended_view_resolution.height)
+    {
+        throw std::invalid_argument(
+            "ProjectionLayer: view_resolution (" + std::to_string(config_.view_resolution.width) + "x" +
+            std::to_string(config_.view_resolution.height) + ") must equal the display's recommended per-view size (" +
+            std::to_string(recommended_view_resolution.width) + "x" + std::to_string(recommended_view_resolution.height) +
+            "); use VizSession::get_recommended_resolution() to size the layer.");
+    }
+    // A stereo display needs >= that many views, else an eye is blank (a
+    // stereo layer on a mono display is fine — the left eye is used).
+    if (view_count_ < backend_view_count)
+    {
+        throw std::invalid_argument("ProjectionLayer: a mono layer cannot drive a " + std::to_string(backend_view_count) +
+                                    "-view (stereo) display; construct with Config::stereo = true.");
+    }
+    // The in-use slot is the backend's in-flight image index; fail at attach.
+    if (backend_image_count > kMaxFramesInFlight)
+    {
+        throw std::invalid_argument(
+            "ProjectionLayer: backend cycles " + std::to_string(backend_image_count) +
+            " in-flight images, exceeding kMaxFramesInFlight=" + std::to_string(kMaxFramesInFlight) + ".");
+    }
+}
+
+Resolution ProjectionLayer::view_resolution() const noexcept
+{
+    return config_.view_resolution;
+}
+
+PixelFormat ProjectionLayer::color_format() const noexcept
+{
+    return config_.color_format;
+}
+
+std::optional<PixelFormat> ProjectionLayer::depth_format() const noexcept
+{
+    return config_.depth_format;
+}
+
+bool ProjectionLayer::is_stereo() const noexcept
+{
+    return config_.stereo;
+}
+
+uint32_t ProjectionLayer::view_count() const noexcept
+{
+    return view_count_;
+}
+
+const DeviceImage* ProjectionLayer::color_image(uint32_t slot, uint32_t view) const noexcept
+{
+    if (slot >= kSlotCount || view >= view_count_ || slots_color_[slot].size() <= view)
+    {
+        return nullptr;
+    }
+    return slots_color_[slot][view].get();
+}
+
+const DeviceImage* ProjectionLayer::depth_image(uint32_t slot, uint32_t view) const noexcept
+{
+    if (!has_depth_ || slot >= kSlotCount || view >= view_count_ || slots_depth_[slot].size() <= view)
+    {
+        return nullptr;
+    }
+    return slots_depth_[slot][view].get();
+}
+
+} // namespace viz
diff --git a/src/viz/layers_tests/cpp/CMakeLists.txt b/src/viz/layers_tests/cpp/CMakeLists.txt
index 1eb371a92..a0ba72133 100644
--- a/src/viz/layers_tests/cpp/CMakeLists.txt
+++ b/src/viz/layers_tests/cpp/CMakeLists.txt
@@ -7,6 +7,7 @@ cmake_minimum_required(VERSION 3.20)
 # the production layers in viz::layers (QuadLayer, ...).
 add_executable(viz_layers_tests
     test_clear_rect_layer.cpp
+    test_projection_layer.cpp
     test_quad_layer.cpp
     test_throwing_layer.cpp
 )
diff --git a/src/viz/layers_tests/cpp/test_projection_layer.cpp b/src/viz/layers_tests/cpp/test_projection_layer.cpp
new file mode 100644
index 000000000..c13329ab8
--- /dev/null
+++ b/src/viz/layers_tests/cpp/test_projection_layer.cpp
@@ -0,0 +1,497 @@
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+// Tests for ProjectionLayer: config validation (unit-level) and the
+// CUDA-Vulkan interop mailbox + submit (gpu-level). ProjectionLayer is
+// direct-present-only (no render pipeline); end-to-end copy-to-swapchain
+// + readback lives in viz_session_tests where the full backend exists.
+
+#include "test_helpers.hpp"
+
+#include <catch2/catch_test_macros.hpp>
+#include <catch2/matchers/catch_matchers_string.hpp>
+#include <viz/core/viz_buffer.hpp>
+#include <viz/core/vk_context.hpp>
+#include <viz/layers/projection_layer.hpp>
+
+#include <cstdint>
+#include <cuda_runtime.h>
+#include <stdexcept>
+
+using viz::DeviceImage;
+using viz::PixelFormat;
+using viz::ProjectionLayer;
+using viz::VizBuffer;
+using viz::VkContext;
+
+using viz::testing::is_gpu_available;
+
+namespace
+{
+
+struct CudaFreeGuard
+{
+    void* p = nullptr;
+    ~CudaFreeGuard()
+    {
+        if (p != nullptr)
+        {
+            cudaFree(p);
+        }
+    }
+};
+
+} // namespace
+
+// ── Unit: config validation without GPU ─────────────────────────────
+//
+// Config is validated BEFORE the VkContext, so these run with an
+// uninitialized context. The message matchers pin each test to the
+// config check it targets — without them an uninitialized-context throw
+// would satisfy CHECK_THROWS_AS for the wrong reason.
+
+using Catch::Matchers::ContainsSubstring;
+
+TEST_CASE("ProjectionLayer ctor rejects non-RGBA8 color format", "[unit][projection_layer]")
+{
+    VkContext ctx;
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    cfg.color_format = PixelFormat::kD32F;
+    CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("color_format"));
+}
+
+TEST_CASE("ProjectionLayer ctor rejects non-D32F depth format", "[unit][projection_layer]")
+{
+    VkContext ctx;
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    cfg.depth_format = PixelFormat::kRGBA8;
+    CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("depth_format"));
+}
+
+TEST_CASE("ProjectionLayer ctor rejects zero view_resolution", "[unit][projection_layer]")
+{
+    VkContext ctx;
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 0, 64 };
+    CHECK_THROWS_WITH(ProjectionLayer(ctx, cfg), ContainsSubstring("view_resolution"));
+}
+
+// ── GPU: backend-compatibility validation ───────────────────────────
+
+TEST_CASE("ProjectionLayer validate_backend_compatibility enforces the direct-present contract", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    ProjectionLayer mono(ctx, cfg);
+
+    // Matching mono display: ok.
+    CHECK_NOTHROW(mono.validate_backend_compatibility({ 64, 64 }, 1, 3));
+    // Resolution mismatch (would make the 1:1 swapchain copy out-of-bounds).
+    CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 128, 128 }, 1, 3), ContainsSubstring("view_resolution"));
+    // Mono layer can't drive a 2-view (stereo) display.
+    CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 64, 64 }, 2, 3), ContainsSubstring("stereo"));
+    // Backend cycles more in-flight images than the mailbox can hold.
+    CHECK_THROWS_WITH(mono.validate_backend_compatibility({ 64, 64 }, 1, ProjectionLayer::kMaxFramesInFlight + 1),
+                      ContainsSubstring("kMaxFramesInFlight"));
+
+    // A stereo layer is allowed on a mono display (the left eye is used).
+    cfg.stereo = true;
+    ProjectionLayer stereo(ctx, cfg);
+    CHECK_NOTHROW(stereo.validate_backend_compatibility({ 64, 64 }, 1, 3));
+    CHECK_NOTHROW(stereo.validate_backend_compatibility({ 64, 64 }, 2, 3));
+}
+
+// ── GPU: construction + accessors ───────────────────────────────────
+
+TEST_CASE("ProjectionLayer mono+depth creates valid handles for every slot+view", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    ProjectionLayer layer(ctx, cfg);
+
+    CHECK(layer.name() == "ProjectionLayer");
+    CHECK(layer.view_count() == 1);
+    CHECK_FALSE(layer.is_stereo());
+    CHECK(layer.color_format() == PixelFormat::kRGBA8);
+    CHECK(layer.depth_format().has_value());
+    CHECK(*layer.depth_format() == PixelFormat::kD32F);
+
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        REQUIRE(layer.color_image(s, 0) != nullptr);
+        CHECK(layer.color_image(s, 0)->vk_image() != VK_NULL_HANDLE);
+        CHECK(layer.color_image(s, 0)->cuda_array() != nullptr);
+        REQUIRE(layer.depth_image(s, 0) != nullptr);
+        CHECK(layer.depth_image(s, 0)->vk_image() != VK_NULL_HANDLE);
+        CHECK(layer.depth_image(s, 0)->cuda_array() != nullptr);
+        // View index out of range returns nullptr.
+        CHECK(layer.color_image(s, 1) == nullptr);
+        CHECK(layer.depth_image(s, 1) == nullptr);
+    }
+    CHECK(layer.color_image(ProjectionLayer::kSlotCount, 0) == nullptr);
+}
+
+TEST_CASE("ProjectionLayer stereo allocates per-eye storage", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    cfg.stereo = true;
+    ProjectionLayer layer(ctx, cfg);
+
+    CHECK(layer.view_count() == 2);
+    CHECK(layer.is_stereo());
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        REQUIRE(layer.color_image(s, 0) != nullptr);
+        REQUIRE(layer.color_image(s, 1) != nullptr);
+        REQUIRE(layer.depth_image(s, 0) != nullptr);
+        REQUIRE(layer.depth_image(s, 1) != nullptr);
+        CHECK(layer.color_image(s, 0)->vk_image() != layer.color_image(s, 1)->vk_image());
+    }
+}
+
+TEST_CASE("ProjectionLayer no-depth skips depth allocation", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 32, 32 };
+    cfg.depth_format = std::nullopt;
+    ProjectionLayer layer(ctx, cfg);
+
+    CHECK_FALSE(layer.depth_format().has_value());
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        REQUIRE(layer.color_image(s, 0) != nullptr);
+        CHECK(layer.depth_image(s, 0) == nullptr);
+    }
+}
+
+TEST_CASE("ProjectionLayer destroy is idempotent", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 32, 32 };
+    ProjectionLayer layer(ctx, cfg);
+
+    layer.destroy();
+    layer.destroy(); // second call must be a no-op
+}
+
+// ── GPU: submit validation ──────────────────────────────────────────
+
+TEST_CASE("ProjectionLayer::submit rejects bad call shapes", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    cfg.stereo = false;
+    ProjectionLayer layer(ctx, cfg);
+
+    void* color_dev = nullptr;
+    void* depth_dev = nullptr;
+    REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess);
+    REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess);
+    CudaFreeGuard cg{ color_dev };
+    CudaFreeGuard dg{ depth_dev };
+
+    VizBuffer color{};
+    color.data = color_dev;
+    color.width = 64;
+    color.height = 64;
+    color.format = PixelFormat::kRGBA8;
+    color.space = viz::MemorySpace::kDevice;
+
+    VizBuffer depth{};
+    depth.data = depth_dev;
+    depth.width = 64;
+    depth.height = 64;
+    depth.format = PixelFormat::kD32F;
+    depth.space = viz::MemorySpace::kDevice;
+
+    SECTION("missing depth on depth-enabled layer")
+    {
+        CHECK_THROWS_AS(layer.submit(color), std::invalid_argument);
+    }
+    SECTION("mono layer rejects right-eye buffers")
+    {
+        CHECK_THROWS_AS(layer.submit(color, &depth, &color, &depth), std::invalid_argument);
+    }
+    SECTION("dimension mismatch rejected")
+    {
+        VizBuffer bad = color;
+        bad.width = 32;
+        CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument);
+    }
+    SECTION("color format mismatch rejected")
+    {
+        VizBuffer bad = color;
+        bad.format = PixelFormat::kD32F;
+        CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument);
+    }
+    SECTION("kHost rejected")
+    {
+        VizBuffer bad = color;
+        bad.space = viz::MemorySpace::kHost;
+        CHECK_THROWS_AS(layer.submit(bad, &depth), std::invalid_argument);
+    }
+}
+
+TEST_CASE("ProjectionLayer::submit mono+depth advances mailbox + signals semaphores", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    ProjectionLayer layer(ctx, cfg);
+
+    void* color_dev = nullptr;
+    void* depth_dev = nullptr;
+    REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess);
+    REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess);
+    CudaFreeGuard cg{ color_dev };
+    CudaFreeGuard dg{ depth_dev };
+
+    // Initialize to known patterns so we can verify the layer actually
+    // received our content. cudaMemset is sync-on-default-stream.
+    REQUIRE(cudaMemset(color_dev, 0x7F, 64 * 64 * 4) == cudaSuccess);
+    REQUIRE(cudaMemset(depth_dev, 0x40, 64 * 64 * 4) == cudaSuccess);
+
+    VizBuffer color{};
+    color.data = color_dev;
+    color.width = 64;
+    color.height = 64;
+    color.format = PixelFormat::kRGBA8;
+    color.space = viz::MemorySpace::kDevice;
+
+    VizBuffer depth{};
+    depth.data = depth_dev;
+    depth.width = 64;
+    depth.height = 64;
+    depth.format = PixelFormat::kD32F;
+    depth.space = viz::MemorySpace::kDevice;
+
+    // Pre-submit: no semaphore has been signaled.
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        CHECK(layer.color_image(s, 0)->cuda_done_writing_value() == 0);
+        CHECK(layer.depth_image(s, 0)->cuda_done_writing_value() == 0);
+    }
+
+    // First submit lands in some slot; that slot's color + depth
+    // semaphores both advance to 1.
+    layer.submit(color, &depth);
+
+    uint32_t signaled = 0;
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        const uint64_t cval = layer.color_image(s, 0)->cuda_done_writing_value();
+        const uint64_t dval = layer.depth_image(s, 0)->cuda_done_writing_value();
+        if (cval > 0 && dval > 0)
+        {
+            ++signaled;
+        }
+    }
+    CHECK(signaled == 1);
+}
+
+TEST_CASE("ProjectionLayer::submit stereo requires both eyes", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 64, 64 };
+    cfg.stereo = true;
+    ProjectionLayer layer(ctx, cfg);
+
+    void* color_dev = nullptr;
+    void* depth_dev = nullptr;
+    REQUIRE(cudaMalloc(&color_dev, 64 * 64 * 4) == cudaSuccess);
+    REQUIRE(cudaMalloc(&depth_dev, 64 * 64 * 4) == cudaSuccess);
+    CudaFreeGuard cg{ color_dev };
+    CudaFreeGuard dg{ depth_dev };
+
+    VizBuffer color{};
+    color.data = color_dev;
+    color.width = 64;
+    color.height = 64;
+    color.format = PixelFormat::kRGBA8;
+    color.space = viz::MemorySpace::kDevice;
+
+    VizBuffer depth{};
+    depth.data = depth_dev;
+    depth.width = 64;
+    depth.height = 64;
+    depth.format = PixelFormat::kD32F;
+    depth.space = viz::MemorySpace::kDevice;
+
+    // Stereo without right buffers throws.
+    CHECK_THROWS_AS(layer.submit(color, &depth), std::invalid_argument);
+
+    // Stereo with both eyes succeeds.
+    layer.submit(color, &depth, &color, &depth);
+    uint32_t signaled = 0;
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        const bool left = layer.color_image(s, 0)->cuda_done_writing_value() > 0 &&
+                          layer.depth_image(s, 0)->cuda_done_writing_value() > 0;
+        const bool right = layer.color_image(s, 1)->cuda_done_writing_value() > 0 &&
+                           layer.depth_image(s, 1)->cuda_done_writing_value() > 0;
+        if (left && right)
+        {
+            ++signaled;
+        }
+    }
+    CHECK(signaled == 1);
+}
+
+TEST_CASE("ProjectionLayer::submit no-depth path accepts color only", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 32, 32 };
+    cfg.depth_format = std::nullopt;
+    ProjectionLayer layer(ctx, cfg);
+
+    void* color_dev = nullptr;
+    REQUIRE(cudaMalloc(&color_dev, 32 * 32 * 4) == cudaSuccess);
+    CudaFreeGuard cg{ color_dev };
+
+    VizBuffer color{};
+    color.data = color_dev;
+    color.width = 32;
+    color.height = 32;
+    color.format = PixelFormat::kRGBA8;
+    color.space = viz::MemorySpace::kDevice;
+
+    // depth-disabled layer must NOT accept a depth buffer.
+    VizBuffer fake_depth = color;
+    fake_depth.format = PixelFormat::kD32F;
+    CHECK_THROWS_AS(layer.submit(color, &fake_depth), std::invalid_argument);
+
+    // Without depth, submit succeeds.
+    layer.submit(color);
+
+    uint32_t signaled = 0;
+    for (uint32_t s = 0; s < ProjectionLayer::kSlotCount; ++s)
+    {
+        if (layer.color_image(s, 0)->cuda_done_writing_value() > 0)
+        {
+            ++signaled;
+        }
+    }
+    CHECK(signaled == 1);
+}
+
+TEST_CASE("ProjectionLayer acquire_direct_views returns latest slot images", "[gpu][projection_layer]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    VkContext ctx;
+    ctx.init({});
+
+    ProjectionLayer::Config cfg;
+    cfg.view_resolution = { 32, 32 };
+    ProjectionLayer layer(ctx, cfg);
+
+    // Nothing published yet → no direct views.
+    CHECK(layer.acquire_direct_views(0).empty());
+
+    void* color_dev = nullptr;
+    void* depth_dev = nullptr;
+    REQUIRE(cudaMalloc(&color_dev, 32 * 32 * 4) == cudaSuccess);
+    REQUIRE(cudaMalloc(&depth_dev, 32 * 32 * 4) == cudaSuccess);
+    CudaFreeGuard cg{ color_dev };
+    CudaFreeGuard dg{ depth_dev };
+
+    VizBuffer color{};
+    color.data = color_dev;
+    color.width = 32;
+    color.height = 32;
+    color.format = PixelFormat::kRGBA8;
+    color.space = viz::MemorySpace::kDevice;
+    VizBuffer depth{};
+    depth.data = depth_dev;
+    depth.width = 32;
+    depth.height = 32;
+    depth.format = PixelFormat::kD32F;
+    depth.space = viz::MemorySpace::kDevice;
+
+    layer.on_frame_begin();
+    layer.submit(color, &depth);
+
+    // Offscreen (no session attached → not XR), so the freshness gate is
+    // off and the latest publish is returned.
+    auto views = layer.acquire_direct_views(0);
+    REQUIRE(views.size() == 1);
+    CHECK(views[0].color != VK_NULL_HANDLE);
+    CHECK(views[0].depth != VK_NULL_HANDLE);
+    CHECK(views[0].extent.width == 32);
+    CHECK(views[0].extent.height == 32);
+
+    // get_wait_semaphores now waits on the promoted slot at TRANSFER stage.
+    const auto waits = layer.get_wait_semaphores();
+    REQUIRE(waits.size() == 2); // color + depth
+    for (const auto& w : waits)
+    {
+        CHECK(w.wait_stage == VK_PIPELINE_STAGE_TRANSFER_BIT);
+    }
+}
diff --git a/src/viz/python/core_bindings.cpp b/src/viz/python/core_bindings.cpp
index 5a1eb129f..7bbcb424e 100644
--- a/src/viz/python/core_bindings.cpp
+++ b/src/viz/python/core_bindings.cpp
@@ -120,6 +120,21 @@ orientation : (w, x, y, z) quaternion (identity = (1, 0, 0, 0))
         .def_readwrite("angle_up", &viz::Fov::angle_up)
         .def_readwrite("angle_down", &viz::Fov::angle_down);
 
+    py::class_<viz::ViewInfo>(m, "ViewInfo",
+                              R"doc(
+Per-eye render target metadata returned in ``FrameInfo.views``. In XR
+mode (kXr stereo), 2 entries — one per eye. In window / offscreen, 1
+entry with identity pose.
+
+Exposes ``viewport``, ``fov`` and ``pose`` (all from the predicted-
+display-time XR data for this frame). Renderers build their own view /
+projection matrices from ``pose`` + ``fov``.
+)doc")
+        .def(py::init<>())
+        .def_readonly("viewport", &viz::ViewInfo::viewport)
+        .def_readonly("fov", &viz::ViewInfo::fov)
+        .def_readonly("pose", &viz::ViewInfo::pose);
+
     // ── VizBuffer (with cuda/numpy interface) ──────────────────────────
 
     py::class_<viz::VizBuffer>(m, "VizBuffer",
diff --git a/src/viz/python/layers_bindings.cpp b/src/viz/python/layers_bindings.cpp
index dba90c76c..ddf9ffa2b 100644
--- a/src/viz/python/layers_bindings.cpp
+++ b/src/viz/python/layers_bindings.cpp
@@ -13,10 +13,13 @@
 #include <pybind11/pybind11.h>
 #include <pybind11/stl.h>
 #include <viz/core/viz_buffer.hpp>
+#include <viz/layers/projection_layer.hpp>
 #include <viz/layers/quad_layer.hpp>
 
 #include <cstdint>
 #include <memory>
+#include <optional>
+#include <stdexcept>
 
 namespace viz_py
 {
@@ -134,6 +137,120 @@ numpy on a CUDA device pointer); the binding converts it on the fly.
         .def("set_visible", &viz::QuadLayer::set_visible, "visible"_a)
         .def("is_visible", &viz::QuadLayer::is_visible)
         .def_property_readonly("name", [](const viz::QuadLayer& l) { return l.name(); });
+
+    // ── ProjectionLayer ────────────────────────────────────────────────
+
+    py::class_<viz::ProjectionLayer::Config>(m, "ProjectionLayerConfig")
+        .def(py::init<>())
+        .def_readwrite("name", &viz::ProjectionLayer::Config::name)
+        .def_readwrite("view_resolution", &viz::ProjectionLayer::Config::view_resolution)
+        .def_readwrite("color_format", &viz::ProjectionLayer::Config::color_format)
+        .def_readwrite("depth_format", &viz::ProjectionLayer::Config::depth_format,
+                       "PixelFormat.D32F for depth output (Z-composite with QuadLayer); None to disable.")
+        .def_readwrite("stereo", &viz::ProjectionLayer::Config::stereo,
+                       "Per-eye paired storage. When True, submit() requires both eyes' buffers; "
+                       "in kXr view 0 → left, view 1 → right.");
+
+    py::class_<viz::ProjectionLayer, std::unique_ptr<viz::ProjectionLayer, py::nodelete>>(m, "ProjectionLayer",
+                                                                                          R"doc(
+Full-view RGBD layer. Owned by VizSession; the Python handle is
+non-owning (don't keep it around past the session).
+
+Designed for renderers (gsplat, nvblox, neural reconstruction) that
+produce per-view (color, depth) buffers. The renderer runs IN-LOOP with
+the OpenXR frame loop — `submit()` must be called between
+`session.begin_frame()` and `session.end_frame()`, and the renderer
+must render against `info.views[i].pose` from the FrameInfo returned by
+`begin_frame()`.
+
+Typical pattern::
+
+    while running:
+        info = session.begin_frame()
+        color, depth = renderer.render(info.views)
+        layer.submit(color, depth=depth)
+        session.end_frame()
+
+If the renderer is slower than display rate, the runtime / CloudXR
+paces the application via xrWaitFrame and reprojects the last submitted
+frame at display rate. In `kXr`, a visible ProjectionLayer that fails
+to submit for the current frame is skipped at record time so stale RGBD
+isn't composited under a new projection-layer pose.
+
+Each buffer is a VizBuffer or any __cuda_array_interface__ object
+(cupy / torch / numba). submit() does one CUDA→CUDA copy per buffer on
+the supplied stream and BLOCKS on cudaStreamSynchronize so the caller
+can re-use ``color`` / ``depth`` immediately.
+)doc")
+        .def(
+            "submit",
+            [](viz::ProjectionLayer& self, py::object left_color, py::object left_depth, py::object right_color,
+               py::object right_depth, uintptr_t stream)
+            {
+                auto to_buf = [&self](py::object obj, viz::PixelFormat fmt, const char* label) -> viz::VizBuffer
+                {
+                    if (py::isinstance<viz::VizBuffer>(obj))
+                    {
+                        return obj.cast<viz::VizBuffer>();
+                    }
+                    return cuda_array_to_viz_buffer(obj, fmt, self.view_resolution(), label);
+                };
+
+                // Materialize each buffer (or std::nullopt). View slots
+                // that aren't provided pass nullptr through to submit.
+                std::optional<viz::VizBuffer> lc;
+                std::optional<viz::VizBuffer> ld;
+                std::optional<viz::VizBuffer> rc;
+                std::optional<viz::VizBuffer> rd;
+                if (!left_color.is_none())
+                {
+                    lc = to_buf(left_color, self.color_format(), "ProjectionLayer.submit(left_color)");
+                }
+                else
+                {
+                    throw std::runtime_error("ProjectionLayer.submit: left_color is required");
+                }
+                if (!left_depth.is_none())
+                {
+                    ld = to_buf(left_depth, viz::PixelFormat::kD32F, "ProjectionLayer.submit(left_depth)");
+                }
+                if (!right_color.is_none())
+                {
+                    rc = to_buf(right_color, self.color_format(), "ProjectionLayer.submit(right_color)");
+                }
+                if (!right_depth.is_none())
+                {
+                    rd = to_buf(right_depth, viz::PixelFormat::kD32F, "ProjectionLayer.submit(right_depth)");
+                }
+
+                py::gil_scoped_release release;
+                try
+                {
+                    self.submit(*lc, ld.has_value() ? &*ld : nullptr, rc.has_value() ? &*rc : nullptr,
+                                rd.has_value() ? &*rd : nullptr, reinterpret_cast<cudaStream_t>(stream));
+                }
+                catch (const std::invalid_argument& e)
+                {
+                    // C++ submit reports bad call shapes as invalid_argument
+                    // (→ ValueError); re-raise as runtime_error so it surfaces
+                    // as RuntimeError, matching the buffer-conversion errors.
+                    throw std::runtime_error(e.what());
+                }
+            },
+            "left_color"_a, "left_depth"_a = py::none(), "right_color"_a = py::none(), "right_depth"_a = py::none(),
+            "stream"_a = 0,
+            "Submit a frame. Each arg is a VizBuffer or any __cuda_array_interface__ object. "
+            "Mono: only ``left_color`` (+ ``left_depth`` if depth-enabled). "
+            "Stereo: pair with ``right_color`` (+ depths). Buffers must match view_resolution "
+            "and the layer's pixel formats. Releases the GIL across the copy + sync.")
+        .def_property_readonly("view_resolution", &viz::ProjectionLayer::view_resolution)
+        .def_property_readonly("color_format", &viz::ProjectionLayer::color_format)
+        .def_property_readonly("depth_format", &viz::ProjectionLayer::depth_format)
+        .def_property_readonly("stereo", &viz::ProjectionLayer::is_stereo)
+        .def_property_readonly("view_count", &viz::ProjectionLayer::view_count)
+        .def("set_visible", &viz::ProjectionLayer::set_visible, "visible"_a)
+        .def("is_visible", &viz::ProjectionLayer::is_visible)
+        .def_property_readonly("name", [](const viz::ProjectionLayer& l) { return l.name(); });
 }
 
 } // namespace viz_py
diff --git a/src/viz/python/session_bindings.cpp b/src/viz/python/session_bindings.cpp
index 8698cced4..387f96a6f 100644
--- a/src/viz/python/session_bindings.cpp
+++ b/src/viz/python/session_bindings.cpp
@@ -12,6 +12,7 @@
 #include <pybind11/pybind11.h>
 #include <pybind11/stl.h>
 #include <viz/core/vk_context.hpp>
+#include <viz/layers/projection_layer.hpp>
 #include <viz/layers/quad_layer.hpp>
 #include <viz/session/frame_info.hpp>
 #include <viz/session/viz_session.hpp>
@@ -35,7 +36,10 @@ void bind_session(py::module_& m)
         .def_readonly("predicted_display_time", &viz::FrameInfo::predicted_display_time)
         .def_readonly("delta_time", &viz::FrameInfo::delta_time)
         .def_readonly("should_render", &viz::FrameInfo::should_render)
-        .def_readonly("resolution", &viz::FrameInfo::resolution);
+        .def_readonly("resolution", &viz::FrameInfo::resolution)
+        .def_readonly("views", &viz::FrameInfo::views,
+                      "Per-eye render target metadata. 2 entries in XR stereo, 1 in window/offscreen. "
+                      "Renderers should render against ``views[i].pose`` + ``views[i].fov``.");
 
     py::class_<viz::FrameTimingStats>(m, "FrameTimingStats")
         .def(py::init<>())
@@ -106,6 +110,20 @@ Construct via ``VizSession.create(config)``. Add layers with
             },
             "config"_a, py::return_value_policy::reference_internal,
             "Construct + register a QuadLayer. Returns a non-owning handle.")
+        .def(
+            "add_projection_layer",
+            [](viz::VizSession& self, viz::ProjectionLayer::Config config) -> viz::ProjectionLayer*
+            {
+                const auto* ctx = self.get_vk_context();
+                if (ctx == nullptr)
+                {
+                    throw std::runtime_error("VizSession: cannot add layer before session is initialized");
+                }
+                // ProjectionLayer is direct-present-only — no render pass.
+                return self.add_layer<viz::ProjectionLayer>(*ctx, std::move(config));
+            },
+            "config"_a, py::return_value_policy::reference_internal,
+            "Construct + register a ProjectionLayer. Returns a non-owning handle.")
         .def("render", &viz::VizSession::render, py::call_guard<py::gil_scoped_release>(),
              "Wait + composite + present in one call. Returns FrameInfo.")
         .def("begin_frame", &viz::VizSession::begin_frame, py::call_guard<py::gil_scoped_release>())
diff --git a/src/viz/python/viz_init.py b/src/viz/python/viz_init.py
index c07923b0d..0ed6252a6 100644
--- a/src/viz/python/viz_init.py
+++ b/src/viz/python/viz_init.py
@@ -40,12 +40,15 @@
     MemorySpace,
     PixelFormat,
     Pose3D,
+    ProjectionLayer,
+    ProjectionLayerConfig,
     QuadLayer,
     QuadLayerConfig,
     QuadLayerPlacement,
     Rect2D,
     Resolution,
     SessionState,
+    ViewInfo,
     VizBuffer,
     VizSession,
     VizSessionConfig,
@@ -62,12 +65,15 @@
     "MemorySpace",
     "PixelFormat",
     "Pose3D",
+    "ProjectionLayer",
+    "ProjectionLayerConfig",
     "QuadLayer",
     "QuadLayerConfig",
     "QuadLayerPlacement",
     "Rect2D",
     "Resolution",
     "SessionState",
+    "ViewInfo",
     "VizBuffer",
     "VizSession",
     "VizSessionConfig",
diff --git a/src/viz/python_tests/test_projection_layer.py b/src/viz/python_tests/test_projection_layer.py
new file mode 100644
index 000000000..a2da89b8c
--- /dev/null
+++ b/src/viz/python_tests/test_projection_layer.py
@@ -0,0 +1,261 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+"""End-to-end ProjectionLayer tests via Python bindings.
+
+Covers: config plumbing, add_projection_layer, submit shape validation,
+mono+depth round-trip render, stereo + no-depth variants. GPU-gated.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+import pytest
+
+import isaacteleop.viz as viz
+
+
+def _gpu_available() -> bool:
+    cfg = viz.VizSessionConfig()
+    cfg.mode = viz.DisplayMode.kOffscreen
+    cfg.window_width = 64
+    cfg.window_height = 64
+    s = None
+    try:
+        s = viz.VizSession.create(cfg)
+    except RuntimeError:
+        return False
+    finally:
+        if s is not None:
+            s.destroy()
+    return True
+
+
+pytestmark = pytest.mark.skipif(
+    not _gpu_available(), reason="no Vulkan/CUDA-capable GPU"
+)
+
+
+def _need_cupy():
+    cp = pytest.importorskip("cupy")
+    try:
+        if cp.cuda.runtime.getDeviceCount() == 0:
+            pytest.skip("no CUDA device")
+    except cp.cuda.runtime.CUDARuntimeError:
+        pytest.skip("no CUDA device")
+    return cp
+
+
+def _make_session(width=64, height=64):
+    cfg = viz.VizSessionConfig()
+    cfg.mode = viz.DisplayMode.kOffscreen
+    cfg.window_width = width
+    cfg.window_height = height
+    cfg.clear_color = (0.0, 0.0, 0.0, 1.0)
+    return viz.VizSession.create(cfg)
+
+
+def test_projection_layer_config_roundtrip():
+    cfg = viz.ProjectionLayerConfig()
+    cfg.name = "test"
+    cfg.view_resolution = viz.Resolution(128, 64)
+    cfg.color_format = viz.PixelFormat.kRGBA8
+    cfg.depth_format = viz.PixelFormat.kD32F
+    cfg.stereo = True
+
+    assert cfg.name == "test"
+    assert cfg.view_resolution.width == 128
+    assert cfg.view_resolution.height == 64
+    assert cfg.depth_format == viz.PixelFormat.kD32F
+    assert cfg.stereo is True
+
+    # depth_format can be None
+    cfg.depth_format = None
+    assert cfg.depth_format is None
+
+
+def test_add_projection_layer_mono_depth():
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.name = "proj"
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer = session.add_projection_layer(layer_cfg)
+
+        assert layer.name == "proj"
+        assert layer.is_visible() is True
+        assert layer.view_resolution.width == 32
+        assert layer.view_resolution.height == 32
+        assert layer.color_format == viz.PixelFormat.kRGBA8
+        assert layer.depth_format == viz.PixelFormat.kD32F
+        assert layer.stereo is False
+        assert layer.view_count == 1
+
+        # Submit mono + depth via cupy.
+        host_color = np.zeros((32, 32, 4), dtype=np.uint8)
+        host_color[..., 2] = 200  # blue channel
+        host_color[..., 3] = 255
+        host_depth = np.full((32, 32), 0.5, dtype=np.float32)
+        color = cp.asarray(host_color)
+        depth = cp.asarray(host_depth)
+
+        layer.submit(color, depth)
+
+        info = session.render()
+        assert info.frame_index == 0
+
+        img = session.readback_to_host()
+        arr = np.asarray(img)
+        # Predominantly blue at the center; ProjectionLayer covers the
+        # whole framebuffer.
+        r, g, b, _a = arr[32, 32]
+        assert b > r and b > g
+    finally:
+        session.destroy()
+
+
+def test_submit_rejects_missing_depth_on_depth_layer():
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer = session.add_projection_layer(layer_cfg)
+
+        color = cp.asarray(np.zeros((32, 32, 4), dtype=np.uint8))
+        with pytest.raises(RuntimeError, match="left_depth"):
+            layer.submit(color)
+    finally:
+        session.destroy()
+
+
+def test_submit_rejects_dimension_mismatch():
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer = session.add_projection_layer(layer_cfg)
+
+        # Wrong width.
+        wrong_color = cp.asarray(np.zeros((32, 16, 4), dtype=np.uint8))
+        depth = cp.asarray(np.zeros((32, 32), dtype=np.float32))
+        with pytest.raises(RuntimeError, match="resolution"):
+            layer.submit(wrong_color, depth)
+    finally:
+        session.destroy()
+
+
+def test_stereo_round_trip():
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer_cfg.stereo = True
+        layer = session.add_projection_layer(layer_cfg)
+        assert layer.stereo is True
+        assert layer.view_count == 2
+
+        host_lc = np.zeros((32, 32, 4), dtype=np.uint8)
+        host_lc[..., 0] = 200  # red for LEFT
+        host_lc[..., 3] = 255
+        host_rc = np.zeros((32, 32, 4), dtype=np.uint8)
+        host_rc[..., 1] = 200  # green for RIGHT
+        host_rc[..., 3] = 255
+        host_d = np.full((32, 32), 0.5, dtype=np.float32)
+        lc = cp.asarray(host_lc)
+        rc = cp.asarray(host_rc)
+        ld = cp.asarray(host_d)
+        rd = cp.asarray(host_d)
+
+        # Stereo without right eye → must throw.
+        with pytest.raises(RuntimeError, match="right_color"):
+            layer.submit(lc, ld)
+
+        # Stereo with both eyes.
+        layer.submit(lc, ld, rc, rd)
+        session.render()
+        # In offscreen (single-view), the LEFT buffer is sampled — so the
+        # readback should be predominantly red.
+        arr = np.asarray(session.readback_to_host())
+        r, g, b, _a = arr[32, 32]
+        assert r > g and r > b
+    finally:
+        session.destroy()
+
+
+def test_no_depth_layer():
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer_cfg.depth_format = None
+        layer = session.add_projection_layer(layer_cfg)
+        assert layer.depth_format is None
+
+        host_color = np.zeros((32, 32, 4), dtype=np.uint8)
+        host_color[..., 0] = 255  # red
+        host_color[..., 3] = 255
+        color = cp.asarray(host_color)
+
+        # Depth-disabled layer must reject any depth buffer.
+        depth = cp.asarray(np.zeros((32, 32), dtype=np.float32))
+        with pytest.raises(RuntimeError, match="depth-disabled"):
+            layer.submit(color, depth)
+
+        layer.submit(color)
+        session.render()
+        arr = np.asarray(session.readback_to_host())
+        r, g, b, _a = arr[32, 32]
+        assert r > g and r > b
+    finally:
+        session.destroy()
+
+
+def test_begin_frame_returns_views_for_renderer():
+    """``session.begin_frame()`` is the source of truth for poses the
+    renderer should render against. In offscreen mode the backend
+    returns a single identity-pose ViewInfo."""
+    session = _make_session()
+    try:
+        info = session.begin_frame()
+        assert len(info.views) >= 1
+        session.end_frame()
+    finally:
+        session.destroy()
+
+
+def test_inloop_submit_pattern():
+    """The supported pattern: begin_frame → submit (against this frame's
+    views) → end_frame, all in one tick. Window/offscreen modes have no
+    XR freshness gate, so the layer renders on every frame that submits."""
+    cp = _need_cupy()
+    session = _make_session()
+    try:
+        layer_cfg = viz.ProjectionLayerConfig()
+        layer_cfg.view_resolution = viz.Resolution(32, 32)
+        layer = session.add_projection_layer(layer_cfg)
+
+        host_color = np.zeros((32, 32, 4), dtype=np.uint8)
+        host_color[..., 2] = 200  # blue
+        host_color[..., 3] = 255
+        host_depth = np.full((32, 32), 0.5, dtype=np.float32)
+
+        for _ in range(3):
+            info = session.begin_frame()
+            assert info.should_render
+            # In a real renderer we'd pass info.views to the GPU side; here
+            # the buffers are static.
+            color = cp.asarray(host_color)
+            depth = cp.asarray(host_depth)
+            layer.submit(color, depth)
+            session.end_frame()
+
+        # Final readback shows the submitted color.
+        arr = np.asarray(session.readback_to_host())
+        r, g, b, _a = arr[32, 32]
+        assert b > r and b > g
+    finally:
+        session.destroy()
diff --git a/src/viz/session/cpp/inc/viz/session/display_backend.hpp b/src/viz/session/cpp/inc/viz/session/display_backend.hpp
index 9f74827c5..977753f40 100644
--- a/src/viz/session/cpp/inc/viz/session/display_backend.hpp
+++ b/src/viz/session/cpp/inc/viz/session/display_backend.hpp
@@ -6,6 +6,7 @@
 #include <viz/core/host_image.hpp>
 #include <viz/core/render_target.hpp>
 #include <viz/core/viz_types.hpp>
+#include <viz/session/layer_base.hpp> // DirectPresentView
 #include <vulkan/vulkan.h>
 
 #include <cstdint>
@@ -94,6 +95,13 @@ class DisplayBackend
         // Backend-private bookkeeping round-tripped to record_post_* /
         // end_frame (e.g. swapchain image_index, predicted_display_time).
         uint64_t backend_token = 0;
+
+        // OpenXR predicted display time in nanoseconds (from
+        // xrWaitFrame's XrFrameState.predictedDisplayTime), exposed
+        // through FrameInfo so renderers can use it for time-aware
+        // content (e.g. animation timestamps that match the runtime's
+        // prediction). 0 outside kXr.
+        int64_t predicted_display_time_ns = 0;
     };
 
     // Acquire the next frame target. nullopt = skip this frame.
@@ -109,6 +117,36 @@ class DisplayBackend
     {
     }
 
+    // True when the backend implements the direct-present path
+    // (record_direct). The compositor uses it together with a layer's
+    // supports_direct_present() to choose direct vs. composited.
+    virtual bool supports_direct() const noexcept
+    {
+        return false;
+    }
+
+    // Direct-present path: copy a direct layer's per-view (color, depth)
+    // images STRAIGHT into the presentation swapchains, replacing the
+    // render-pass + record_post_render_pass for this frame. ``views`` has
+    // one entry per backend view (1 window/offscreen, 2 kXr stereo); the
+    // source images are in SHADER_READ_ONLY_OPTIMAL with extent equal to
+    // the swapchain per-view size. Empty ``views`` → clear the swapchains.
+    // The compositor still threads the layer's CUDA-done wait semaphores
+    // (TRANSFER stage) into the submit. Default: unsupported.
+    virtual void record_direct(VkCommandBuffer /*cmd*/,
+                               const Frame& /*frame*/,
+                               const std::vector<DirectPresentView>& /*views*/)
+    {
+    }
+
+    // Per-view resolution a direct layer should render at so its copy to
+    // the swapchain is 1:1 (kXr: per-eye; window/offscreen: the full
+    // target). Default: the (single-view) render-target extent.
+    virtual Resolution recommended_view_resolution() const
+    {
+        return current_extent();
+    }
+
     // Called after a successful submit. The host has NOT waited on the
     // in-flight fence (multi-frame-in-flight: that wait happens at the
     // top of render() for this slot's NEXT use), so the GPU may still
diff --git a/src/viz/session/cpp/inc/viz/session/layer_base.hpp b/src/viz/session/cpp/inc/viz/session/layer_base.hpp
index 8f17a66dd..78e443abb 100644
--- a/src/viz/session/cpp/inc/viz/session/layer_base.hpp
+++ b/src/viz/session/cpp/inc/viz/session/layer_base.hpp
@@ -16,6 +16,7 @@ namespace viz
 
 class RenderTarget;
 class VizSession;
+class VkContext;
 
 // Maps ViewInfo::viewport → vkCmdSetViewport (origin top-left, depth
 // [0,1], no y-flip). Layers call this once per view before drawing.
@@ -32,6 +33,20 @@ inline void bind_view_viewport(VkCommandBuffer cmd, const ViewInfo& view)
     vkCmdSetViewport(cmd, 0, 1, &vp);
 }
 
+// Per-view source images for the direct-present path: a layer whose
+// content is already a full-view (color, depth) image pair the backend
+// can copy STRAIGHT into the presentation swapchains, bypassing the
+// shared render target + render pass. This mirrors holohub xr_gsplat:
+// the renderer's depth lands in the XR depth swapchain verbatim (no
+// gl_FragDepth round-trip), so CloudXR reprojection gets exact depth.
+// 1 entry for window/offscreen, 2 for kXr stereo.
+struct DirectPresentView
+{
+    VkImage color = VK_NULL_HANDLE; // resting layout SHADER_READ_ONLY_OPTIMAL
+    VkImage depth = VK_NULL_HANDLE; // VK_NULL_HANDLE when the layer has no depth
+    Resolution extent{}; // must equal the swapchain per-view size
+};
+
 // Abstract layer drawn into the compositor's render pass (RGBA8_SRGB
 // color + D32_SFLOAT depth, single-sample). record() issues draw calls;
 // it must NOT end the render pass or submit. Resource lifetime is the
@@ -56,6 +71,16 @@ class LayerBase
     {
     }
 
+    // Called from ``VizSession::begin_frame`` for EVERY registered layer
+    // (visible or not) before the new frame's FrameInfo is returned.
+    // Lets layers clear per-frame state (e.g. ProjectionLayer's
+    // submitted-this-frame flag). Default = no-op. Must NOT touch GPU
+    // state — the backend's begin_frame has already run, and the
+    // compositor's per-slot fence wait hasn't happened yet.
+    virtual void on_frame_begin()
+    {
+    }
+
     // Issue draws inside the active render pass.
     //   views:    1 entry in window/offscreen, 2 in kXr stereo. Each
     //             entry's viewport is this layer's rect for that view —
@@ -86,6 +111,51 @@ class LayerBase
         return {};
     }
 
+    // True only for ProjectionLayer. VizSession uses it to enforce the
+    // single-projection XOR multi-quad invariant, and the compositor uses
+    // it to pick the direct-present path.
+    virtual bool is_projection_layer() const noexcept
+    {
+        return false;
+    }
+
+    // The VkContext this layer's GPU resources came from (nullptr if none).
+    // add_layer rejects a layer whose context isn't the session's — its
+    // images/semaphores would be used on the wrong device/queue.
+    virtual const VkContext* vk_context() const noexcept
+    {
+        return nullptr;
+    }
+
+    // Direct-present support (see DirectPresentView). When true, the
+    // compositor — for a session whose only layer is this one — skips the
+    // render pass and asks the backend to copy these images straight to
+    // the swapchains. Default: not supported (composited via the RT).
+    virtual bool supports_direct_present() const noexcept
+    {
+        return false;
+    }
+
+    // Promote this frame's content into ``in_flight_slot`` (same slot the
+    // compositor passes to record()/get_wait_semaphores) and return the
+    // per-view source images to copy. Empty vector = nothing fresh to
+    // present this frame (backend clears the swapchains). Called instead
+    // of record_pre_render_pass()/record() on the direct path.
+    virtual std::vector<DirectPresentView> acquire_direct_views(uint32_t /*in_flight_slot*/)
+    {
+        return {};
+    }
+
+    // Let a layer reject a backend it can't run on. Called once by add_layer
+    // with the backend's per-view recommended resolution, view count (1
+    // window/offscreen, 2 kXr stereo), and in-flight image count; throws
+    // std::invalid_argument on mismatch. Default: no requirements.
+    virtual void validate_backend_compatibility(Resolution /*recommended_view_resolution*/,
+                                                uint32_t /*backend_view_count*/,
+                                                uint32_t /*backend_image_count*/) const
+    {
+    }
+
     // Window-mode aspect-fit hint. nullopt = fill the tile; kXr ignores.
     virtual std::optional<float> aspect_ratio() const noexcept
     {
diff --git a/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp b/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp
index 2d5ace885..9afbcb618 100644
--- a/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp
+++ b/src/viz/session/cpp/inc/viz/session/offscreen_backend.hpp
@@ -29,6 +29,15 @@ class OffscreenBackend final : public DisplayBackend
         return 1;
     }
 
+    // Direct-present: copy a ProjectionLayer's color image into the RT's
+    // color attachment (leaving it in TRANSFER_SRC) so readback_to_host
+    // works unchanged. Depth isn't read back. Empty views → clear.
+    bool supports_direct() const noexcept override
+    {
+        return true;
+    }
+    void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector<DirectPresentView>& views) override;
+
     // Synchronous tightly-packed RGBA8 copy of the RT's color attachment.
     HostImage readback_to_host() override;
 
diff --git a/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp b/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp
index 6a1050c11..b5ff16e80 100644
--- a/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp
+++ b/src/viz/session/cpp/inc/viz/session/viz_compositor.hpp
@@ -6,6 +6,7 @@
 #include <viz/core/frame_sync.hpp>
 #include <viz/core/host_image.hpp>
 #include <viz/core/viz_types.hpp>
+#include <viz/session/display_backend.hpp>
 #include <vulkan/vulkan.h>
 
 #include <memory>
@@ -55,15 +56,22 @@ class VizCompositor
     VizCompositor(VizCompositor&&) = delete;
     VizCompositor& operator=(VizCompositor&&) = delete;
 
-    // Records and submits one frame. Multi-frame-in-flight: one
-    // FrameSync per backend image slot. render() waits on the slot's
-    // fence at entry (signaled by its previous use), submits with the
-    // same fence as signal target, and returns without host-waiting
-    // on completion. CPU pacing is the caller's responsibility — the
-    // window backend prefers MAILBOX (no vsync block), so a hot loop
-    // would burn a core; camera_viz drives this from an event-driven
-    // condition variable that wakes per producer publish.
-    void render(const std::vector<LayerBase*>& layers);
+    // Records and submits one frame against the backend ``Frame``
+    // already acquired by VizSession::begin_frame. Multi-frame-in-
+    // flight: one FrameSync per backend image slot. render() waits on
+    // the slot's fence at entry (signaled by its previous use),
+    // submits with the same fence as signal target, and returns
+    // without host-waiting on completion. CPU pacing is the caller's
+    // responsibility — the window backend prefers MAILBOX (no vsync
+    // block), so a hot loop would burn a core; camera_viz drives this
+    // from an event-driven condition variable that wakes per producer
+    // publish.
+    //
+    // Owns end_frame / abort_frame protocol balance for the supplied
+    // ``frame``: on successful submit, calls backend->end_frame; on
+    // exception, calls backend->abort_frame via RAII guard before
+    // re-throwing.
+    void render(const DisplayBackend::Frame& frame, const std::vector<LayerBase*>& layers);
 
     HostImage readback_to_host();
 
diff --git a/src/viz/session/cpp/inc/viz/session/viz_session.hpp b/src/viz/session/cpp/inc/viz/session/viz_session.hpp
index 30a43b208..0e0fbed56 100644
--- a/src/viz/session/cpp/inc/viz/session/viz_session.hpp
+++ b/src/viz/session/cpp/inc/viz/session/viz_session.hpp
@@ -16,6 +16,7 @@
 #include <chrono>
 #include <memory>
 #include <optional>
+#include <stdexcept>
 #include <string>
 #include <utility>
 #include <vector>
@@ -106,11 +107,37 @@ class VizSession
     // content updates / set_visible(). The session owns the layer's
     // lifetime. add_layer / remove_layer must run on the same thread
     // as the frame loop; only LayerBase::set_visible() is atomic.
+    //
+    // Layer-mode invariant (for now): a session holds EITHER one
+    // ProjectionLayer OR any number of QuadLayers, never both. The
+    // ProjectionLayer is direct-present (copied straight to the swapchain),
+    // while QuadLayers composite into the shared render target — the two
+    // paths don't coexist yet. Violations throw std::invalid_argument.
     template <typename L, typename... Args>
     L* add_layer(Args&&... args)
     {
         auto layer = std::make_unique<L>(std::forward<Args>(args)...);
         L* raw = layer.get();
+
+        bool have_projection = false;
+        for (const auto& l : layers_)
+        {
+            have_projection = have_projection || l->is_projection_layer();
+        }
+        if (raw->is_projection_layer() && !layers_.empty())
+        {
+            throw std::invalid_argument("VizSession: a ProjectionLayer must be the session's only layer");
+        }
+        if (!raw->is_projection_layer() && have_projection)
+        {
+            throw std::invalid_argument("VizSession: cannot add another layer alongside a ProjectionLayer");
+        }
+
+        // Let the layer reject a backend it can't run on (e.g. a direct-present
+        // layer whose resolution/stereo/slot count doesn't match). Throws
+        // before the layer is registered, so a rejected add leaves no state.
+        validate_layer_against_backend_(raw);
+
         raw->attach_to_session_(this);
         layers_.push_back(std::move(layer));
         return raw;
@@ -182,6 +209,10 @@ class VizSession
     void init();
 
     const VkContext& ctx() const noexcept;
+    // Ask a freshly-constructed layer to validate itself against the active
+    // backend (resolution / stereo / in-flight count). No-op until the
+    // backend is initialized. Throws std::invalid_argument on mismatch.
+    void validate_layer_against_backend_(LayerBase* layer) const;
     void update_timing_stats(float frame_time_seconds);
     // Poll backend events + handle resize. Called by render() and
     // begin_frame() so explicit-loop users get the same behavior.
@@ -207,6 +238,11 @@ class VizSession
     bool first_frame_ = true;
     bool frame_in_progress_ = false;
     FrameInfo current_frame_info_{};
+    // The backend-acquired frame for the in-progress begin/end pair.
+    // Acquired by begin_frame, consumed by end_frame. nullopt outside
+    // a begin/end window or when the backend skipped this frame
+    // (e.g. XR runtime shouldRender=0).
+    std::optional<DisplayBackend::Frame> current_backend_frame_;
     FrameTimingStats timing_stats_{};
 };
 
diff --git a/src/viz/session/cpp/inc/viz/session/window_backend.hpp b/src/viz/session/cpp/inc/viz/session/window_backend.hpp
index ac61058e8..070c8a288 100644
--- a/src/viz/session/cpp/inc/viz/session/window_backend.hpp
+++ b/src/viz/session/cpp/inc/viz/session/window_backend.hpp
@@ -38,6 +38,16 @@ class WindowBackend final : public DisplayBackend
     std::optional<Frame> begin_frame(int64_t predicted_display_time) override;
     const RenderTarget& render_target() const override;
     void record_post_render_pass(VkCommandBuffer cmd, const Frame& frame) override;
+
+    // Direct-present: blit a ProjectionLayer's single color image straight
+    // to the window swapchain, skipping the shared RT. Depth is unused in
+    // window mode. Empty views → clear to black.
+    bool supports_direct() const noexcept override
+    {
+        return true;
+    }
+    void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector<DirectPresentView>& views) override;
+
     void end_frame(const Frame& frame) override;
     void abort_frame(const Frame& frame) override;
 
diff --git a/src/viz/session/cpp/inc/viz/session/xr_backend.hpp b/src/viz/session/cpp/inc/viz/session/xr_backend.hpp
index 7e977820b..4a95a6c2e 100644
--- a/src/viz/session/cpp/inc/viz/session/xr_backend.hpp
+++ b/src/viz/session/cpp/inc/viz/session/xr_backend.hpp
@@ -67,6 +67,17 @@ class XrBackend final : public DisplayBackend
     void end_frame(const Frame& frame) override;
     void abort_frame(const Frame& frame) override;
 
+    // Direct-present: copy a ProjectionLayer's per-eye color/depth straight
+    // into the per-eye color + depth swapchains (vkCmdCopyImage, verbatim),
+    // skipping the shared RT — so the renderer's depth reaches CloudXR
+    // exactly. Per-eye recommended size keeps the copy 1:1.
+    bool supports_direct() const noexcept override
+    {
+        return true;
+    }
+    void record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector<DirectPresentView>& views) override;
+    Resolution recommended_view_resolution() const override;
+
     void poll_events() override;
     bool should_close() const override;
     Resolution current_extent() const override;
@@ -116,6 +127,13 @@ class XrBackend final : public DisplayBackend
     void destroy_swapchains();
     void create_intermediate();
 
+    // Per-eye staging buffers that bridge a direct ProjectionLayer's depth
+    // (stored as R32_SFLOAT — CUDA can't interop a depth-format image) into
+    // the D32_SFLOAT depth swapchain via image->buffer->image. The float bits
+    // copy verbatim. Allocated only when depth submission is enabled.
+    void create_depth_staging();
+    void destroy_depth_staging() noexcept;
+
     // Release every swapchain currently flagged `acquired`.
     void release_acquired_swapchains() noexcept;
     // Submit an empty xrEndFrame to balance an outstanding xrBeginFrame.
@@ -139,6 +157,15 @@ class XrBackend final : public DisplayBackend
     std::vector<ViewSwapchain> depth_swapchains_;
     bool depth_layer_enabled_ = false;
 
+    // Per-eye R32_SFLOAT->D32_SFLOAT bridge buffers (see create_depth_staging).
+    struct DepthStaging
+    {
+        VkBuffer buffer = VK_NULL_HANDLE;
+        VkDeviceMemory memory = VK_NULL_HANDLE;
+        VkDeviceSize size = 0;
+    };
+    std::vector<DepthStaging> depth_staging_;
+
     // Per-frame state — valid only while frame_began_ == true.
     XrFrameState last_frame_state_{ XR_TYPE_FRAME_STATE };
     XrViewState last_view_state_{ XR_TYPE_VIEW_STATE };
diff --git a/src/viz/session/cpp/offscreen_backend.cpp b/src/viz/session/cpp/offscreen_backend.cpp
index 0cb0d3b10..236dc6478 100644
--- a/src/viz/session/cpp/offscreen_backend.cpp
+++ b/src/viz/session/cpp/offscreen_backend.cpp
@@ -37,6 +37,30 @@ uint32_t find_memory_type(VkPhysicalDevice physical_device, uint32_t type_bits,
     throw std::runtime_error("OffscreenBackend: no memory type matches readback requirements");
 }
 
+void transition_image(VkCommandBuffer cmd,
+                      VkImage image,
+                      VkImageLayout old_layout,
+                      VkImageLayout new_layout,
+                      VkAccessFlags src_access,
+                      VkAccessFlags dst_access,
+                      VkPipelineStageFlags src_stage,
+                      VkPipelineStageFlags dst_stage)
+{
+    VkImageMemoryBarrier b{};
+    b.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
+    b.oldLayout = old_layout;
+    b.newLayout = new_layout;
+    b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
+    b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
+    b.image = image;
+    b.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+    b.subresourceRange.levelCount = 1;
+    b.subresourceRange.layerCount = 1;
+    b.srcAccessMask = src_access;
+    b.dstAccessMask = dst_access;
+    vkCmdPipelineBarrier(cmd, src_stage, dst_stage, 0, 0, nullptr, 0, nullptr, 1, &b);
+}
+
 } // namespace
 
 OffscreenBackend::OffscreenBackend() = default;
@@ -141,6 +165,58 @@ HostImage OffscreenBackend::readback_to_host()
     return result;
 }
 
+void OffscreenBackend::record_direct(VkCommandBuffer cmd,
+                                     const Frame& /*frame*/,
+                                     const std::vector<DirectPresentView>& views)
+{
+    if (render_target_ == nullptr)
+    {
+        return;
+    }
+    const VkImage dst = render_target_->color_image();
+    const VkImage src = views.empty() ? VK_NULL_HANDLE : views[0].color;
+
+    // Discard prior RT contents (UNDEFINED) — the copy/clear fully
+    // overwrites — and leave the RT in TRANSFER_SRC so readback_to_host's
+    // image→buffer copy works exactly as it does after the render pass.
+    transition_image(cmd, dst, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0,
+                     VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+    if (src != VK_NULL_HANDLE)
+    {
+        transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                         VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
+                         VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+        VkImageCopy region{};
+        region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        region.srcSubresource.layerCount = 1;
+        region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        region.dstSubresource.layerCount = 1;
+        // 1:1 copy — add_layer guarantees source extent == target extent.
+        region.extent = { extent_.width, extent_.height, 1 };
+        vkCmdCopyImage(
+            cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
+
+        transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
+                         VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
+    }
+    else
+    {
+        VkClearColorValue clear{};
+        VkImageSubresourceRange range{};
+        range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        range.levelCount = 1;
+        range.layerCount = 1;
+        vkCmdClearColorImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range);
+    }
+
+    transition_image(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                     VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                     VK_PIPELINE_STAGE_TRANSFER_BIT);
+}
+
 void OffscreenBackend::create_readback_staging()
 {
     readback_byte_size_ =
diff --git a/src/viz/session/cpp/swapchain.cpp b/src/viz/session/cpp/swapchain.cpp
index 271579df6..887eefcdf 100644
--- a/src/viz/session/cpp/swapchain.cpp
+++ b/src/viz/session/cpp/swapchain.cpp
@@ -31,21 +31,23 @@ void check_vk(VkResult r, const char* what)
     }
 }
 
-// Pick a surface format. Prefer B8G8R8A8_SRGB (common Linux default,
-// matches our intermediate framebuffer's sRGB color space). Fall back
-// to any *_SRGB format. Else accept whatever the runtime offers first.
+// Pick a surface format. Prefer R8G8B8A8_SRGB so the direct-present path's
+// vkCmdCopyImage (raw bytes, no channel reorder) from our RGBA layer images
+// lands in the right channel order; a BGRA swapchain would swap R/B. Fall
+// back to B8G8R8A8_SRGB (common Linux default — the composited path blits, so
+// channel order is handled there), then any *_SRGB, then whatever's offered.
 VkSurfaceFormatKHR pick_surface_format(const std::vector<VkSurfaceFormatKHR>& formats)
 {
     for (const auto& f : formats)
     {
-        if (f.format == VK_FORMAT_B8G8R8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
+        if (f.format == VK_FORMAT_R8G8B8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
         {
             return f;
         }
     }
     for (const auto& f : formats)
     {
-        if (f.format == VK_FORMAT_R8G8B8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
+        if (f.format == VK_FORMAT_B8G8R8A8_SRGB && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
         {
             return f;
         }
diff --git a/src/viz/session/cpp/viz_compositor.cpp b/src/viz/session/cpp/viz_compositor.cpp
index 4a73c73ec..9ac7095e1 100644
--- a/src/viz/session/cpp/viz_compositor.cpp
+++ b/src/viz/session/cpp/viz_compositor.cpp
@@ -191,7 +191,7 @@ void VizCompositor::submit_or_signal_fence(const VkSubmitInfo& info, const char*
     throw std::runtime_error(std::string("VizCompositor: ") + what + " failed: VkResult=" + std::to_string(r));
 }
 
-void VizCompositor::render(const std::vector<LayerBase*>& layers)
+void VizCompositor::render(const DisplayBackend::Frame& frame, const std::vector<LayerBase*>& layers)
 {
     // Snapshot visible layers once — is_visible() is atomic, and
     // reading it twice could record a draw without the matching wait.
@@ -205,19 +205,12 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
         }
     }
 
-    auto frame = backend_->begin_frame(/*predicted_display_time=*/0);
-    if (!frame.has_value())
-    {
-        // Backend skipped; all fences stay signaled, next wait() won't deadlock.
-        return;
-    }
-
     // Catch swapchain recreates whose image_count differs from the one
-    // we sized per-slot state for. Runs AFTER begin_frame because
-    // WindowBackend::begin_frame may itself recreate (OUT_OF_DATE etc.).
-    // Wrapped so a failed rebuild balances the backend protocol — we've
-    // already acquired a swapchain image and FrameGuard isn't set up
-    // yet, so a raw throw would leak the acquire.
+    // we sized per-slot state for. Runs first because the backend's
+    // begin_frame (run by VizSession) may itself recreate (OUT_OF_DATE
+    // etc.). Wrapped so a failed rebuild balances the backend protocol —
+    // the frame is already acquired and FrameGuard isn't set up yet, so
+    // a raw throw would leak it.
     try
     {
         ensure_slot_count_matches_backend();
@@ -226,7 +219,7 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
     {
         try
         {
-            backend_->abort_frame(*frame);
+            backend_->abort_frame(frame);
         }
         catch (...)
         {
@@ -242,14 +235,14 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
         // threw; reaching here means logic drift. Bail rather than UB.
         try
         {
-            backend_->abort_frame(*frame);
+            backend_->abort_frame(frame);
         }
         catch (...)
         {
         }
         throw std::runtime_error("VizCompositor: slot_count == 0 after ensure_slot_count_matches_backend");
     }
-    const uint32_t slot = static_cast<uint32_t>(frame->backend_token) % slot_count;
+    const uint32_t slot = static_cast<uint32_t>(frame.backend_token) % slot_count;
     FrameSync& slot_sync = *frame_syncs_[slot];
     VkCommandBuffer command_buffer = command_buffers_[slot];
 
@@ -299,18 +292,24 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
                 }
             }
         }
-    } frame_guard{ backend_, &*frame };
+    } frame_guard{ backend_, &frame };
 
     const RenderTarget& rt = backend_->render_target();
     const Resolution rt_extent = rt.resolution();
 
-    // XR: per-eye viewports come from frame->views. tile layout is
+    // XR: per-eye viewports come from frame.views. tile layout is
     // window/offscreen letterboxing only.
     const bool xr_mode = backend_->is_xr();
 
-    // Per-layer aspect-fit tiles (window/offscreen only).
+    // Direct-present: a single ProjectionLayer copied straight to the
+    // swapchain (no shared render pass). VizSession's add_layer guarantees
+    // a direct layer is the session's only layer, so size()==1 suffices.
+    const bool direct_mode =
+        visible_layers.size() == 1 && visible_layers[0]->supports_direct_present() && backend_->supports_direct();
+
+    // Per-layer aspect-fit tiles (window/offscreen composited path only).
     std::vector<TileSlot> tiles;
-    if (!xr_mode && !visible_layers.empty())
+    if (!xr_mode && !direct_mode && !visible_layers.empty())
     {
         const float fb_aspect = static_cast<float>(rt_extent.width) / static_cast<float>(rt_extent.height);
         std::vector<float> aspects;
@@ -335,68 +334,97 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
         vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 0);
     }
 
-    // Pre-pass hook for transfer/compute work that can't run inside a
-    // render pass (e.g. QuadLayer mip-chain blits). Ordering: all
-    // layers' pre-pass run BEFORE any record(), so a layer can rely on
-    // its own pre-pass results inside record().
-    for (LayerBase* layer : visible_layers)
+    if (direct_mode)
     {
-        layer->record_pre_render_pass(command_buffer, slot);
-    }
+        // Promote the layer's latest published (color, depth) for this slot
+        // and copy straight into the swapchain — no render pass, no shared
+        // RT. acquire_direct_views must precede the wait-semaphore gather
+        // below so get_wait_semaphores reflects the promoted slot.
+        const std::vector<DirectPresentView> direct_views = visible_layers[0]->acquire_direct_views(slot);
 
-    std::array<VkClearValue, 2> clears{};
-    clears[0].color = config_.clear_color;
-    clears[1].depthStencil = { 1.0f, 0 };
-
-    VkRenderPassBeginInfo rp{};
-    rp.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
-    rp.renderPass = rt.render_pass();
-    rp.framebuffer = rt.framebuffer();
-    rp.renderArea.offset = { 0, 0 };
-    rp.renderArea.extent = { rt_extent.width, rt_extent.height };
-    rp.clearValueCount = static_cast<uint32_t>(clears.size());
-    rp.pClearValues = clears.data();
+        // ts1: nothing rendered into the RT; mark the point for symmetry.
+        if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
+        {
+            vkCmdWriteTimestamp(
+                command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1);
+        }
 
-    vkCmdBeginRenderPass(command_buffer, &rp, VK_SUBPASS_CONTENTS_INLINE);
+        backend_->record_direct(command_buffer, frame, direct_views);
 
-    // Window/offscreen: scissor=tile.outer + view[0].viewport=tile.content
-    // for aspect-fit letterboxing.
-    if (xr_mode)
-    {
-        const VkRect2D rt_full{ { 0, 0 }, { rt_extent.width, rt_extent.height } };
-        vkCmdSetScissor(command_buffer, 0, 1, &rt_full);
+        // ts2: end of direct copy (ts2-ts1 = copy/transition cost).
+        if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
+        {
+            vkCmdWriteTimestamp(
+                command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2);
+        }
     }
-    for (size_t i = 0; i < visible_layers.size(); ++i)
+    else
     {
-        std::vector<ViewInfo> layer_views = frame->views;
-        if (layer_views.empty())
+        // Pre-pass hook for transfer/compute work that can't run inside a
+        // render pass (e.g. QuadLayer mip-chain blits). Ordering: all
+        // layers' pre-pass run BEFORE any record(), so a layer can rely on
+        // its own pre-pass results inside record().
+        for (LayerBase* layer : visible_layers)
+        {
+            layer->record_pre_render_pass(command_buffer, slot);
+        }
+
+        std::array<VkClearValue, 2> clears{};
+        clears[0].color = config_.clear_color;
+        clears[1].depthStencil = { 1.0f, 0 };
+
+        VkRenderPassBeginInfo rp{};
+        rp.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
+        rp.renderPass = rt.render_pass();
+        rp.framebuffer = rt.framebuffer();
+        rp.renderArea.offset = { 0, 0 };
+        rp.renderArea.extent = { rt_extent.width, rt_extent.height };
+        rp.clearValueCount = static_cast<uint32_t>(clears.size());
+        rp.pClearValues = clears.data();
+
+        vkCmdBeginRenderPass(command_buffer, &rp, VK_SUBPASS_CONTENTS_INLINE);
+
+        // Window/offscreen: scissor=tile.outer + view[0].viewport=tile.content
+        // for aspect-fit letterboxing.
+        if (xr_mode)
         {
-            layer_views.push_back(ViewInfo{});
+            const VkRect2D rt_full{ { 0, 0 }, { rt_extent.width, rt_extent.height } };
+            vkCmdSetScissor(command_buffer, 0, 1, &rt_full);
         }
-        if (!xr_mode)
+        for (size_t i = 0; i < visible_layers.size(); ++i)
         {
-            const VkRect2D scissor_rect = tiles[i].outer;
-            const VkRect2D viewport_rect = tiles[i].content;
-            vkCmdSetScissor(command_buffer, 0, 1, &scissor_rect);
-            layer_views[0].viewport = to_rect2d(viewport_rect);
+            std::vector<ViewInfo> layer_views = frame.views;
+            if (layer_views.empty())
+            {
+                layer_views.push_back(ViewInfo{});
+            }
+            if (!xr_mode)
+            {
+                const VkRect2D scissor_rect = tiles[i].outer;
+                const VkRect2D viewport_rect = tiles[i].content;
+                vkCmdSetScissor(command_buffer, 0, 1, &scissor_rect);
+                layer_views[0].viewport = to_rect2d(viewport_rect);
+            }
+            visible_layers[i]->record(command_buffer, layer_views, rt, slot);
         }
-        visible_layers[i]->record(command_buffer, layer_views, rt, slot);
-    }
 
-    vkCmdEndRenderPass(command_buffer);
+        vkCmdEndRenderPass(command_buffer);
 
-    // ts1: end of render pass.
-    if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
-    {
-        vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1);
-    }
+        // ts1: end of render pass.
+        if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
+        {
+            vkCmdWriteTimestamp(
+                command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 1);
+        }
 
-    backend_->record_post_render_pass(command_buffer, *frame);
+        backend_->record_post_render_pass(command_buffer, frame);
 
-    // ts2: end of backend post-pass (ts2-ts1 = blit/transition cost).
-    if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
-    {
-        vkCmdWriteTimestamp(command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2);
+        // ts2: end of backend post-pass (ts2-ts1 = blit/transition cost).
+        if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
+        {
+            vkCmdWriteTimestamp(
+                command_buffer, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, gpu_timestamp_pool_, query_base + 2);
+        }
     }
 
     // ts3: cmd-buffer-end (total = ts3-ts0).
@@ -423,18 +451,18 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
             }
         }
     }
-    if (frame->wait_before_render != VK_NULL_HANDLE)
+    if (frame.wait_before_render != VK_NULL_HANDLE)
     {
-        wait_semaphores.push_back(frame->wait_before_render);
+        wait_semaphores.push_back(frame.wait_before_render);
         wait_values.push_back(0);
-        wait_stages.push_back(frame->wait_stage);
+        wait_stages.push_back(frame.wait_stage);
     }
 
     std::vector<VkSemaphore> signal_semaphores;
     std::vector<uint64_t> signal_values;
-    if (frame->signal_after_render != VK_NULL_HANDLE)
+    if (frame.signal_after_render != VK_NULL_HANDLE)
     {
-        signal_semaphores.push_back(frame->signal_after_render);
+        signal_semaphores.push_back(frame.signal_after_render);
         signal_values.push_back(0);
     }
 
@@ -463,9 +491,15 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
     submit_or_signal_fence(submit, "vkQueueSubmit", slot_sync.in_flight_fence());
     cmd_guard.released = true;
 
-    // No trailing host wait — the slot's fence is gated by the next
-    // render that targets this slot. GPU timing forces a synchronous
-    // wait to read query results; opt-in only.
+    // No trailing host wait: the slot's fence gates the NEXT render into this
+    // slot (cmd buffer + depth staging reuse), and OpenXR guarantees the
+    // runtime reads the swapchain only after our queue-submitted writes
+    // complete (queue ordering — the same VkQueue we bound to the session).
+    // Host-waiting here would block xrBeginFrame..xrEndFrame for a full GPU
+    // frame, pushing xrEndFrame past predictedDisplayTime and forcing the
+    // runtime to reproject every frame (visible tearing/judder under motion).
+    // holohub's xr_gsplat pipelines the same way. The gpu_timing path still
+    // waits — it has to read back the timestamp queries.
     if (gpu_timestamp_pool_ != VK_NULL_HANDLE)
     {
         slot_sync.wait();
@@ -488,7 +522,7 @@ void VizCompositor::render(const std::vector<LayerBase*>& layers)
         }
     }
 
-    backend_->end_frame(*frame);
+    backend_->end_frame(frame);
     frame_guard.released = true;
 }
 
diff --git a/src/viz/session/cpp/viz_session.cpp b/src/viz/session/cpp/viz_session.cpp
index d4796e8a8..36840c31b 100644
--- a/src/viz/session/cpp/viz_session.cpp
+++ b/src/viz/session/cpp/viz_session.cpp
@@ -211,11 +211,47 @@ FrameInfo VizSession::begin_frame()
     last_frame_time_ = now;
 
     current_frame_info_.frame_index = frame_index_;
-    current_frame_info_.predicted_display_time = 0; // XR-only; 0 in offscreen
-    current_frame_info_.should_render = (state_ == SessionState::kRunning);
     current_frame_info_.resolution = compositor_ ? compositor_->resolution() : Resolution{};
-    // Identity placeholder; backends fill per-view info inside render().
-    current_frame_info_.views.assign(1, ViewInfo{});
+    current_backend_frame_.reset();
+
+    // Acquire the backend frame BEFORE returning so renderers calling
+    // submit() against the returned FrameInfo's views are working with
+    // the same per-eye poses xrEndFrame will submit later. Skip the
+    // acquire when state isn't kRunning (kStopping/kLost paths submit
+    // empty xrEndFrames internally) or when the backend itself returns
+    // nullopt (XR shouldRender=0, swapchain skip).
+    if (state_ == SessionState::kRunning && backend_)
+    {
+        current_backend_frame_ = backend_->begin_frame(/*ignored=*/0);
+    }
+
+    if (current_backend_frame_.has_value())
+    {
+        current_frame_info_.should_render = true;
+        current_frame_info_.predicted_display_time = current_backend_frame_->predicted_display_time_ns;
+        current_frame_info_.views = current_backend_frame_->views;
+        if (current_frame_info_.views.empty())
+        {
+            current_frame_info_.views.assign(1, ViewInfo{});
+        }
+    }
+    else
+    {
+        current_frame_info_.should_render = false;
+        current_frame_info_.predicted_display_time = 0;
+        current_frame_info_.views.assign(1, ViewInfo{});
+    }
+
+    // Notify layers a new frame has begun. Lets ProjectionLayer-style
+    // layers clear per-frame freshness flags so a stale mailbox slot
+    // doesn't get composited under a new pose.
+    for (const auto& layer : layers_)
+    {
+        if (layer != nullptr)
+        {
+            layer->on_frame_begin();
+        }
+    }
 
     frame_in_progress_ = true;
     return current_frame_info_;
@@ -227,31 +263,27 @@ void VizSession::end_frame()
     {
         throw std::logic_error("VizSession: end_frame called without a matching begin_frame");
     }
-    if (state_ != SessionState::kRunning)
-    {
-        frame_in_progress_ = false;
-        return;
-    }
 
     struct ClearGuard
     {
         bool* flag;
+        std::optional<DisplayBackend::Frame>* frame_slot;
         ~ClearGuard()
         {
             *flag = false;
+            frame_slot->reset();
         }
-    } guard{ &frame_in_progress_ };
-
-    std::vector<LayerBase*> raw_layers;
-    raw_layers.reserve(layers_.size());
-    for (const auto& l : layers_)
-    {
-        raw_layers.push_back(l.get());
-    }
+    } guard{ &frame_in_progress_, &current_backend_frame_ };
 
-    if (current_frame_info_.should_render)
+    if (current_backend_frame_.has_value())
     {
-        compositor_->render(raw_layers);
+        std::vector<LayerBase*> raw_layers;
+        raw_layers.reserve(layers_.size());
+        for (const auto& l : layers_)
+        {
+            raw_layers.push_back(l.get());
+        }
+        compositor_->render(*current_backend_frame_, raw_layers);
     }
 
     update_timing_stats(current_frame_info_.delta_time);
@@ -278,11 +310,34 @@ void VizSession::update_timing_stats(float frame_time_seconds)
         (timing_stats_.avg_frame_time_ms > 0.0f) ? 1000.0f / timing_stats_.avg_frame_time_ms : 0.0f;
 }
 
+void VizSession::validate_layer_against_backend_(LayerBase* layer) const
+{
+    // No backend yet (layers may be built standalone in tests) → nothing to check.
+    if (layer == nullptr || backend_ == nullptr)
+    {
+        return;
+    }
+    // Context affinity: a foreign context's images/semaphores would be used on
+    // this session's queue — invalid cross-device usage.
+    if (layer->vk_context() != nullptr && layer->vk_context() != &ctx())
+    {
+        throw std::invalid_argument(
+            "VizSession: layer was created from a different VkContext than the session's; "
+            "build layers with VizSession::get_vk_context().");
+    }
+    const uint32_t backend_view_count = backend_->is_xr() ? 2u : 1u;
+    layer->validate_backend_compatibility(
+        backend_->recommended_view_resolution(), backend_view_count, backend_->image_count());
+}
+
 Resolution VizSession::get_recommended_resolution() const noexcept
 {
-    if (compositor_)
+    // Per-view size a direct ProjectionLayer should render at (kXr: per-eye,
+    // so its copy to the swapchain is 1:1). Window/offscreen report the
+    // single-view target extent.
+    if (backend_)
     {
-        return compositor_->resolution();
+        return backend_->recommended_view_resolution();
     }
     return Resolution{ config_.window_width, config_.window_height };
 }
diff --git a/src/viz/session/cpp/window_backend.cpp b/src/viz/session/cpp/window_backend.cpp
index 10b62af2f..e6fe6ab5a 100644
--- a/src/viz/session/cpp/window_backend.cpp
+++ b/src/viz/session/cpp/window_backend.cpp
@@ -205,6 +205,68 @@ void WindowBackend::record_post_render_pass(VkCommandBuffer cmd, const Frame& fr
                      VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT);
 }
 
+void WindowBackend::record_direct(VkCommandBuffer cmd, const Frame& frame, const std::vector<DirectPresentView>& views)
+{
+    if (swapchain_ == nullptr || render_target_ == nullptr)
+    {
+        return;
+    }
+    // Fill the intermediate RT (R8G8B8A8_SRGB) with the layer's color, then
+    // reuse the normal RT->swapchain blit. That blit decodes the sRGB RT to
+    // linear and re-encodes to the swapchain format — which both round-trips
+    // the gsplat sRGB bytes correctly AND reorders channels for a BGRA surface
+    // AND scales. A direct copy to the swapchain can't reorder channels (wrong
+    // RGB on BGRA surfaces) and a direct blit double-encodes sRGB.
+    const VkImage rt = render_target_->color_image();
+    const Resolution rt_extent = render_target_->resolution();
+    const VkImage src = views.empty() ? VK_NULL_HANDLE : views[0].color;
+
+    transition_image(cmd, rt, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0,
+                     VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+    if (src != VK_NULL_HANDLE)
+    {
+        // Raw copy: source UNORM bytes (already sRGB-encoded) land verbatim in
+        // the SRGB RT. Sizes match (layer renders at the RT/window size).
+        transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                         VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
+                         VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+        const Resolution se = views[0].extent;
+        const uint32_t cw = std::min(se.width, rt_extent.width);
+        const uint32_t ch = std::min(se.height, rt_extent.height);
+        VkImageCopy region{};
+        region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        region.srcSubresource.layerCount = 1;
+        region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        region.dstSubresource.layerCount = 1;
+        region.extent = { cw, ch, 1 };
+        vkCmdCopyImage(
+            cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
+
+        transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
+                         VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
+    }
+    else
+    {
+        VkClearColorValue clear{};
+        VkImageSubresourceRange range{};
+        range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+        range.levelCount = 1;
+        range.layerCount = 1;
+        vkCmdClearColorImage(cmd, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range);
+    }
+
+    // Leave the RT in TRANSFER_SRC (the layout record_post_render_pass expects
+    // after the render pass), then run the standard RT->swapchain blit.
+    transition_image(cmd, rt, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                     VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_TRANSFER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                     VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+    record_post_render_pass(cmd, frame);
+}
+
 void WindowBackend::end_frame(const Frame& frame)
 {
     if (swapchain_ == nullptr)
diff --git a/src/viz/session/cpp/xr_backend.cpp b/src/viz/session/cpp/xr_backend.cpp
index 42dd68c7c..9f8efe70b 100644
--- a/src/viz/session/cpp/xr_backend.cpp
+++ b/src/viz/session/cpp/xr_backend.cpp
@@ -28,6 +28,28 @@ void check_xr(XrResult r, const char* what)
     }
 }
 
+void check_vk(VkResult r, const char* what)
+{
+    if (r != VK_SUCCESS)
+    {
+        throw std::runtime_error(std::string("XrBackend: ") + what + " failed: VkResult=" + std::to_string(r));
+    }
+}
+
+uint32_t find_memory_type(VkPhysicalDevice physical_device, uint32_t type_bits, VkMemoryPropertyFlags properties)
+{
+    VkPhysicalDeviceMemoryProperties mem_props;
+    vkGetPhysicalDeviceMemoryProperties(physical_device, &mem_props);
+    for (uint32_t i = 0; i < mem_props.memoryTypeCount; ++i)
+    {
+        if ((type_bits & (1u << i)) != 0 && (mem_props.memoryTypes[i].propertyFlags & properties) == properties)
+        {
+            return i;
+        }
+    }
+    throw std::runtime_error("XrBackend: no memory type matches depth staging requirements");
+}
+
 void transition_image(VkCommandBuffer cmd,
                       VkImage image,
                       VkImageLayout old_layout,
@@ -104,6 +126,7 @@ void XrBackend::init(const VkContext& ctx, Resolution /*preferred_size*/)
         if (depth_layer_enabled_)
         {
             create_depth_swapchains();
+            create_depth_staging();
         }
         create_intermediate();
     }
@@ -123,6 +146,7 @@ void XrBackend::destroy()
     // Order: rendering resources → session. Runtime owns swapchain
     // images, so xrDestroySwapchain is enough (no vkDestroyImage).
     render_target_.reset();
+    destroy_depth_staging();
     destroy_swapchains();
     session_.reset();
     ctx_ = nullptr;
@@ -282,6 +306,58 @@ void XrBackend::create_depth_swapchains()
     }
 }
 
+void XrBackend::create_depth_staging()
+{
+    const VkDevice device = ctx_->device();
+    depth_staging_.assign(depth_swapchains_.size(), DepthStaging{});
+    for (size_t i = 0; i < depth_swapchains_.size(); ++i)
+    {
+        const VkDeviceSize size = static_cast<VkDeviceSize>(depth_swapchains_[i].width) * depth_swapchains_[i].height * 4;
+
+        VkBufferCreateInfo bi{};
+        bi.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
+        bi.size = size;
+        bi.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;
+        bi.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
+        check_vk(vkCreateBuffer(device, &bi, nullptr, &depth_staging_[i].buffer), "vkCreateBuffer(depth staging)");
+
+        VkMemoryRequirements reqs;
+        vkGetBufferMemoryRequirements(device, depth_staging_[i].buffer, &reqs);
+
+        VkMemoryAllocateInfo ai{};
+        ai.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
+        ai.allocationSize = reqs.size;
+        ai.memoryTypeIndex =
+            find_memory_type(ctx_->physical_device(), reqs.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
+        check_vk(vkAllocateMemory(device, &ai, nullptr, &depth_staging_[i].memory), "vkAllocateMemory(depth staging)");
+        check_vk(vkBindBufferMemory(device, depth_staging_[i].buffer, depth_staging_[i].memory, 0),
+                 "vkBindBufferMemory(depth staging)");
+        depth_staging_[i].size = size;
+    }
+}
+
+void XrBackend::destroy_depth_staging() noexcept
+{
+    if (ctx_ == nullptr)
+    {
+        return;
+    }
+    const VkDevice device = ctx_->device();
+    for (auto& s : depth_staging_)
+    {
+        if (s.buffer != VK_NULL_HANDLE)
+        {
+            vkDestroyBuffer(device, s.buffer, nullptr);
+        }
+        if (s.memory != VK_NULL_HANDLE)
+        {
+            vkFreeMemory(device, s.memory, nullptr);
+        }
+        s = DepthStaging{};
+    }
+    depth_staging_.clear();
+}
+
 void XrBackend::create_swapchains()
 {
     const auto& views = session_->view_configuration_views();
@@ -492,6 +568,10 @@ std::optional<DisplayBackend::Frame> XrBackend::begin_frame(int64_t /*ignored*/)
     // invariant holds if image_count ever grows past 1.
     const uint32_t slots = image_count();
     f.backend_token = (slots == 0) ? 0u : (frame_index_++ % slots);
+    // Predicted display time forwarded to FrameInfo so renderers can
+    // use it for time-aware content (e.g. animation timestamps that
+    // line up with the runtime's prediction).
+    f.predicted_display_time_ns = static_cast<int64_t>(last_frame_state_.predictedDisplayTime);
     // Hand protocol-balance off to the compositor's FrameGuard.
     in_flight_guard.dismissed = true;
     return f;
@@ -606,14 +686,173 @@ void XrBackend::record_post_render_pass(VkCommandBuffer cmd, const Frame& frame)
             vkCmdCopyImage(cmd, depth_src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst,
                            VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
 
-            // DEPTH_STENCIL_READ_ONLY is the conventional "runtime
-            // samples this" layout for the depth-info subImage.
+            // CloudXR follows the same handoff convention as Holohub's
+            // XrSwapchainCuda: depth swapchain images are returned in
+            // DEPTH_STENCIL_ATTACHMENT_OPTIMAL before xrEndFrame. Leaving
+            // them in READ_ONLY can make the runtime consume stale/invalid
+            // depth even though the XrCompositionLayerDepthInfoKHR metadata
+            // is otherwise correct.
             transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
-                                    VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT,
-                                    VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
-                                    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
+                                    VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT, 0,
+                                    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT);
+        }
+    }
+}
+
+void XrBackend::record_direct(VkCommandBuffer cmd, const Frame& /*frame*/, const std::vector<DirectPresentView>& views)
+{
+    if (!frame_renderable_)
+    {
+        return;
+    }
+
+    // Direct-present: copy a ProjectionLayer's per-eye (color, depth)
+    // straight into the per-eye swapchains — no wide intermediate, no
+    // blit. recommended_view_resolution() sized the layer's images to the
+    // per-eye swapchain, so each copy is 1:1 and the renderer's depth
+    // reaches CloudXR byte-for-byte (matching Holohub's xr_gsplat path).
+    for (size_t i = 0; i < view_swapchains_.size(); ++i)
+    {
+        const auto& sw = view_swapchains_[i];
+        const VkImage dst = sw.images[sw.current_image_index];
+        const VkImage src = (i < views.size()) ? views[i].color : VK_NULL_HANDLE;
+
+        transition_image(cmd, dst, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0,
+                         VK_ACCESS_TRANSFER_WRITE_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+        if (src != VK_NULL_HANDLE)
+        {
+            // Source DeviceImage rests in SHADER_READ_ONLY_OPTIMAL; move it
+            // to TRANSFER_SRC for the copy, then restore so the next frame's
+            // copy sees the same resting layout.
+            transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                             VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT,
+                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+            // Verbatim copy (size-compatible UNORM ↔ SRGB) — no color-space
+            // conversion, unlike a blit. The runtime treats the swapchain
+            // format's encoding, so the raw bytes pass through unchanged.
+            VkImageCopy region{};
+            region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            region.srcSubresource.layerCount = 1;
+            region.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            region.dstSubresource.layerCount = 1;
+            // 1:1 copy — add_layer guarantees source extent == per-eye swapchain size.
+            region.extent = { sw.width, sw.height, 1 };
+            vkCmdCopyImage(
+                cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
+
+            transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
+                             VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
+        }
+        else
+        {
+            // No image submitted this frame: clear so the runtime never
+            // composites stale swapchain contents.
+            VkClearColorValue clear{};
+            VkImageSubresourceRange range{};
+            range.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            range.levelCount = 1;
+            range.layerCount = 1;
+            vkCmdClearColorImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range);
+        }
+
+        // OpenXR expects COLOR_ATTACHMENT_OPTIMAL at xrEndFrame.
+        transition_image(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
+                         VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_COLOR_ATTACHMENT_READ_BIT,
+                         VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT);
+    }
+
+    if (!depth_layer_enabled_)
+    {
+        return;
+    }
+
+    // Per-eye depth into XR_KHR_composition_layer_depth. The layer's depth
+    // image is R32_SFLOAT (color) — CUDA can't interop a depth-format image —
+    // so we bridge through a staging buffer: image(R32_SFLOAT,color) -> buffer
+    // -> image(D32_SFLOAT,depth). The 32-bit float bits copy verbatim.
+    for (size_t i = 0; i < depth_swapchains_.size(); ++i)
+    {
+        const auto& sw = depth_swapchains_[i];
+        const VkImage dst = sw.images[sw.current_image_index];
+        const VkImage src = (i < views.size()) ? views[i].depth : VK_NULL_HANDLE;
+
+        transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_UNDEFINED,
+                                VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, VK_ACCESS_TRANSFER_WRITE_BIT,
+                                VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+        if (src != VK_NULL_HANDLE && i < depth_staging_.size())
+        {
+            const VkBuffer staging = depth_staging_[i].buffer;
+
+            // R32_SFLOAT source rests in SHADER_READ_ONLY (COLOR aspect).
+            transition_image(cmd, src, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
+                             VK_ACCESS_SHADER_READ_BIT, VK_ACCESS_TRANSFER_READ_BIT,
+                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);
+
+            // 1:1 copy — add_layer guarantees source extent == per-eye swapchain size.
+            VkBufferImageCopy to_buf{};
+            to_buf.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
+            to_buf.imageSubresource.layerCount = 1;
+            to_buf.imageExtent = { sw.width, sw.height, 1 };
+            vkCmdCopyImageToBuffer(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, staging, 1, &to_buf);
+
+            // Order the buffer write before the buffer read.
+            VkBufferMemoryBarrier bb{};
+            bb.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
+            bb.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
+            bb.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
+            bb.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
+            bb.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
+            bb.buffer = staging;
+            bb.offset = 0;
+            bb.size = VK_WHOLE_SIZE;
+            vkCmdPipelineBarrier(
+                cmd, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0, 0, nullptr, 1, &bb, 0, nullptr);
+
+            VkBufferImageCopy to_img{};
+            to_img.imageSubresource.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
+            to_img.imageSubresource.layerCount = 1;
+            to_img.imageExtent = { sw.width, sw.height, 1 };
+            vkCmdCopyBufferToImage(cmd, staging, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &to_img);
+
+            transition_image(cmd, src, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
+                             VK_ACCESS_TRANSFER_READ_BIT, VK_ACCESS_SHADER_READ_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
+                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
+        }
+        else
+        {
+            // Clear to the far plane (1.0) so reprojection treats the empty
+            // frame as "nothing in front".
+            VkClearDepthStencilValue clear{ 1.0f, 0 };
+            VkImageSubresourceRange range{};
+            range.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
+            range.levelCount = 1;
+            range.layerCount = 1;
+            vkCmdClearDepthStencilImage(cmd, dst, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, &clear, 1, &range);
+        }
+
+        // Match record_post_render_pass: depth swapchains are returned in
+        // DEPTH_STENCIL_ATTACHMENT_OPTIMAL before xrEndFrame.
+        transition_image_aspect(cmd, dst, VK_IMAGE_ASPECT_DEPTH_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
+                                VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT, 0,
+                                VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT);
+    }
+}
+
+Resolution XrBackend::recommended_view_resolution() const
+{
+    if (session_)
+    {
+        const auto& views = session_->view_configuration_views();
+        if (!views.empty())
+        {
+            return Resolution{ views[0].recommendedImageRectWidth, views[0].recommendedImageRectHeight };
         }
     }
+    return current_extent();
 }
 
 void XrBackend::end_frame(const Frame& /*frame*/)
diff --git a/src/viz/session_tests/cpp/test_viz_session.cpp b/src/viz/session_tests/cpp/test_viz_session.cpp
index 037262ba4..adf3b3288 100644
--- a/src/viz/session_tests/cpp/test_viz_session.cpp
+++ b/src/viz/session_tests/cpp/test_viz_session.cpp
@@ -4,15 +4,24 @@
 // VizSession config + lifecycle tests.
 
 #include <catch2/catch_test_macros.hpp>
+#include <catch2/matchers/catch_matchers_string.hpp>
+#include <viz/core/vk_context.hpp>
+#include <viz/layers/projection_layer.hpp>
+#include <viz/layers/quad_layer.hpp>
 #include <viz/session/viz_session.hpp>
 #include <viz/test_support/test_helpers.hpp>
 
 #include <chrono>
 #include <stdexcept>
 
+using Catch::Matchers::ContainsSubstring;
 using viz::DisplayMode;
+using viz::ProjectionLayer;
+using viz::QuadLayer;
+using viz::Resolution;
 using viz::SessionState;
 using viz::VizSession;
+using viz::VkContext;
 using viz::testing::is_gpu_available;
 
 TEST_CASE("VizSession::create rejects zero window dimensions", "[unit][viz_session]")
@@ -75,3 +84,90 @@ TEST_CASE("VizSession::create kXr fails on hosts without an OpenXR runtime", "[u
     cfg_xr.mode = DisplayMode::kXr;
     CHECK_THROWS_AS(VizSession::create(cfg_xr), std::runtime_error);
 }
+
+// ── add_layer invariant / affinity rejection (failure-path) ──────────
+
+namespace
+{
+VizSession::Config offscreen_cfg(uint32_t side = 64)
+{
+    VizSession::Config cfg{};
+    cfg.mode = DisplayMode::kOffscreen;
+    cfg.window_width = side;
+    cfg.window_height = side;
+    return cfg;
+}
+} // namespace
+
+TEST_CASE("VizSession rejects a ProjectionLayer sized off the recommended resolution", "[gpu][viz_session]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    auto session = VizSession::create(offscreen_cfg());
+    REQUIRE(session != nullptr);
+
+    const Resolution rec = session->get_recommended_resolution();
+    ProjectionLayer::Config pcfg;
+    pcfg.view_resolution = { rec.width + 16, rec.height }; // deliberately wrong
+    // Built from the session's own context, so only the resolution check fires.
+    CHECK_THROWS_WITH(
+        session->add_layer<ProjectionLayer>(*session->get_vk_context(), pcfg), ContainsSubstring("view_resolution"));
+}
+
+TEST_CASE("VizSession enforces single ProjectionLayer XOR QuadLayers", "[gpu][viz_session]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    const Resolution rec{ 64, 64 }; // offscreen recommended == window extent
+
+    SECTION("ProjectionLayer rejected when a QuadLayer is already present")
+    {
+        auto session = VizSession::create(offscreen_cfg());
+        QuadLayer::Config qcfg;
+        qcfg.name = "quad";
+        qcfg.resolution = { 64, 64 };
+        REQUIRE(session->add_layer<QuadLayer>(*session->get_vk_context(), session->get_render_pass(), qcfg) != nullptr);
+
+        ProjectionLayer::Config pcfg;
+        pcfg.view_resolution = rec;
+        CHECK_THROWS_WITH(
+            session->add_layer<ProjectionLayer>(*session->get_vk_context(), pcfg), ContainsSubstring("only layer"));
+    }
+
+    SECTION("QuadLayer rejected when a ProjectionLayer is already present")
+    {
+        auto session = VizSession::create(offscreen_cfg());
+        ProjectionLayer::Config pcfg;
+        pcfg.view_resolution = rec;
+        REQUIRE(session->add_layer<ProjectionLayer>(*session->get_vk_context(), pcfg) != nullptr);
+
+        QuadLayer::Config qcfg;
+        qcfg.name = "quad";
+        qcfg.resolution = { 64, 64 };
+        CHECK_THROWS_WITH(session->add_layer<QuadLayer>(*session->get_vk_context(), session->get_render_pass(), qcfg),
+                          ContainsSubstring("alongside a ProjectionLayer"));
+    }
+}
+
+TEST_CASE("VizSession rejects a layer built from a foreign VkContext", "[gpu][viz_session]")
+{
+    if (!is_gpu_available())
+    {
+        SKIP("No Vulkan-capable GPU available");
+    }
+    auto session = VizSession::create(offscreen_cfg());
+    REQUIRE(session != nullptr);
+
+    // A second, independent context/device — its images + semaphores would be
+    // used on the session's queue, which is invalid cross-device usage.
+    VkContext foreign;
+    foreign.init({});
+
+    ProjectionLayer::Config pcfg;
+    pcfg.view_resolution = session->get_recommended_resolution(); // correct size, wrong context
+    CHECK_THROWS_WITH(session->add_layer<ProjectionLayer>(foreign, pcfg), ContainsSubstring("different VkContext"));
+}
diff --git a/src/viz/shaders/cpp/CMakeLists.txt b/src/viz/shaders/cpp/CMakeLists.txt
index 0f934aaf7..0d641b3f9 100644
--- a/src/viz/shaders/cpp/CMakeLists.txt
+++ b/src/viz/shaders/cpp/CMakeLists.txt
@@ -45,10 +45,12 @@ function(compile_shader GLSL_PATH VAR_NAME)
 endfunction()
 
 # Shader programs:
-#   textured_quad  — fullscreen quad sampling a combined image sampler.
-#                    Used by QuadLayer to display a CUDA-fed texture.
-compile_shader(textured_quad.vert  kTexturedQuadVertSpv)
-compile_shader(textured_quad.frag  kTexturedQuadFragSpv)
+#   textured_quad   — fullscreen / placed quad sampling a combined image
+#                     sampler. Used by QuadLayer.
+# (ProjectionLayer is direct-present-only — it copies its color/depth
+# images straight to the swapchains, so it owns no shaders.)
+compile_shader(textured_quad.vert            kTexturedQuadVertSpv)
+compile_shader(textured_quad.frag            kTexturedQuadFragSpv)
 
 # INTERFACE library exposing the generated headers + a phony custom
 # target that ensures the headers exist before any consumer compiles.