docs/source/references/egocentric_hand_reconstruction.rst
@@ -7,11 +7,26 @@ Egocentric Hand Reconstruction
Automated pipeline for 4D hand and camera pose reconstruction from egocentric
videos. Integrates ViPE and Dyn-HaMR in containerized environments.

.. list-table::
   :widths: 50 50

   * - .. image:: ../_static/egocentric-input.gif
          :alt: Source egocentric video
          :width: 100%
          :class: no-image-zoom
     - .. image:: ../_static/egocentric-reconstruction.gif
          :alt: Smooth fit grid reconstruction
          :width: 100%
          :class: no-image-zoom
   * - .. centered:: Source egocentric video
     - .. centered:: Reconstructed 4D hand and camera poses


Video Capture
---------------------------

To capture egocentric video with an OAK camera, see the
`OAK camera plugin <https://nvidia.github.io/IsaacTeleop/main/device/oak.html>`_ documentation.

Setup
-----
@@ -20,9 +35,42 @@ System Requirement
^^^^^^^^^^^^^^^^^^

- OS: Ubuntu 24.04
- GPU: NVIDIA RTX 6000 Ada, L40, H100, GeForce RTX 3090, GeForce RTX 4090
- System RAM: 100GB for a reference 30 s video; longer videos need more
- GPU memory (VRAM): 12GB for a reference 30 s video; longer videos need more
- Free disk space: 100GB
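
The RAM and disk figures above can be checked before starting a run. The
following is a minimal, Linux-only sketch: the function names are
hypothetical and not part of the repository, the default thresholds simply
mirror the reference numbers above, sizes are computed in GiB, and GPU
memory is not checked:

.. code-block:: bash

   # Hypothetical pre-flight helpers (not shipped with the repository).
   # Linux-only: reads /proc/meminfo and uses POSIX `df -Pk`.
   # Sizes are reported in GiB (1024^3 bytes), rounded down.

   mem_total_gb() {
     # MemTotal in /proc/meminfo is given in kB.
     awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
   }

   free_disk_gb() {
     # Column 4 of `df -Pk` is the available space in 1024-byte blocks.
     df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
   }

   # Usage: preflight [min_ram_gb] [min_disk_gb]   (defaults: 100 100)
   # Prints one message per unmet requirement and returns non-zero if any.
   preflight() {
     local need_ram="${1:-100}" need_disk="${2:-100}" status=0
     local ram disk
     ram="$(mem_total_gb)"
     disk="$(free_disk_gb .)"
     if [ "$ram" -lt "$need_ram" ]; then
       echo "RAM: ${ram} GiB < ${need_ram} GiB" >&2
       status=1
     fi
     if [ "$disk" -lt "$need_disk" ]; then
       echo "Free disk: ${disk} GiB < ${need_disk} GiB" >&2
       status=1
     fi
     return "$status"
   }

Run ``preflight`` from the directory that will hold ``outputs/`` so that the
disk check measures the right filesystem.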

Prerequisites
^^^^^^^^^^^^^

Ensure the following are installed and configured before starting:

**Docker ≥ 20.10** (BuildKit support required):

.. code-block:: bash

   docker --version  # should print 20.10 or newer
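
The version requirement can also be checked from a script instead of by eye.
A minimal sketch, assuming GNU coreutils ``sort -V`` for version ordering;
``version_ge`` is a hypothetical helper, not part of the repository:

.. code-block:: bash

   # Hypothetical helper (not shipped with the repository).
   # Succeeds when version $1 >= version $2, using GNU sort's version order.
   version_ge() {
     [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
   }

   # Compare the Docker daemon version against the 20.10 minimum.
   # Skipped entirely when the docker CLI is not installed.
   if command -v docker > /dev/null 2>&1; then
     current="$(docker version --format '{{.Server.Version}}' 2> /dev/null)" || current=""
     if version_ge "$current" 20.10; then
       echo "Docker ${current} meets the 20.10 minimum"
     else
       echo "Could not confirm Docker >= 20.10 (found: ${current:-none})" >&2
     fi
   fi

The check reads the daemon (server) version, so it also fails when the
Docker daemon is not running.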

**NVIDIA Container Toolkit** — required for GPU access inside containers:

- Install guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

**Python tooling** — required only for downloading videos from S3/Swift URLs:

.. code-block:: bash

   pip install boto3


Check out the code
^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   git clone https://github.com/NVIDIA/IsaacTeleop.git
   cd IsaacTeleop/src/postprocessing/egocentric_hand_reconstruction

The ``./docker`` and ``./scripts`` directories referenced in this guide are located under this directory.

Prepare data files
^^^^^^^^^^^^^^^^^^
@@ -32,19 +80,18 @@ Place required files in the ``outputs/`` directory.
.. code-block:: text

   ...
   ├── docker/
   ├── scripts/
   ├── osmo/
   └── outputs/
       ├── MANO_RIGHT.pkl
       └── BMC/
           └── *.npy
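
Once both assets are in place, the layout above can be verified before a run.
A minimal sketch; ``check_assets`` is a hypothetical helper, not part of the
repository, and it only checks that the files exist, not that their contents
are valid:

.. code-block:: bash

   # Hypothetical helper (not shipped with the repository): check that the
   # asset layout shown above exists. Only presence is checked.
   # Usage: check_assets [outputs_dir]   (default: outputs)
   check_assets() {
     local out="${1:-outputs}" status=0 found=0 f
     if [ ! -f "$out/MANO_RIGHT.pkl" ]; then
       echo "missing: $out/MANO_RIGHT.pkl" >&2
       status=1
     fi
     # An unmatched glob stays literal, so confirm each match really exists.
     for f in "$out"/BMC/*.npy; do
       if [ -e "$f" ]; then
         found=1
         break
       fi
     done
     if [ "$found" -eq 0 ]; then
       echo "missing: $out/BMC/*.npy" >&2
       status=1
     fi
     return "$status"
   }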

**MANO model** (required):

- Create an academic account at https://mano.is.tue.mpg.de/ and accept the license.
- The download is a ZIP archive — extract it and place ``MANO_RIGHT.pkl`` in ``outputs/``.

**BMC data** (required):

@@ -106,10 +153,11 @@ Run complete reconstruction (ViPE + Dyn-HaMR) with a single command:
   # Using a remote video file
   ./scripts/run_reconstruction.sh s3://path/to/your_video.mp4

The script accepts either a **local file path** or a remote **URL**
pointing to a video on cloud storage. Both ``s3://`` URLs (S3-compatible
cloud storage) and ``swift://`` URLs (OpenStack Object Storage) are
supported. When a URL is provided, the video is automatically downloaded
to the ``outputs/`` directory before processing begins.
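
The local-versus-remote decision described above can be pictured with a short
sketch. It is not the code of ``run_reconstruction.sh``: ``classify_input``
and ``local_target`` are hypothetical helpers, and naming the downloaded file
after the last path component of the URL is an assumption:

.. code-block:: bash

   # Hypothetical sketch, NOT the implementation of run_reconstruction.sh.
   # s3:// and swift:// inputs are treated as remote; anything else is a
   # local file path, matching the documented behavior.
   classify_input() {
     case "$1" in
       s3://*|swift://*) echo "remote" ;;
       *)                echo "local" ;;
     esac
   }

   # ASSUMPTION: a remote video is saved under outputs/ using the last
   # path component of its URL as the file name.
   local_target() {
     echo "outputs/$(basename "$1")"
   }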

To use a remote video, set the following environment variables for
credentials:
@@ -148,6 +196,53 @@ The pipeline will:
3. Run Dyn-HaMR for hand reconstruction.
4. Save all results to ``outputs/logs/``.

Batch Reconstruction with OSMO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For large-scale batch processing, the pipeline can be submitted as an
`OSMO <https://github.com/NVIDIA/OSMO>`_ workflow defined in ``osmo/hand_reconstruction.yaml``.
This runs ViPE and Dyn-HaMR as two chained tasks on a GPU pool.

**Prerequisites:**

- A working OSMO cluster deployment (see the `OSMO deployment guide <https://nvidia.github.io/OSMO/main/deployment_guide/getting_started/infrastructure_setup.html>`_)
- OSMO CLI installed and authenticated (``osmo login …``)
- Bucket and image registry credentials stored in OSMO
- Container images built and pushed to your registry (see `Build Docker images`_)
- MANO and BMC assets available at an S3 URL

See ``osmo/README.md`` for full setup details including credential registration and container image push steps.

**Submit a workflow:**

.. code-block:: bash

   osmo workflow submit osmo/hand_reconstruction.yaml \
       --pool POOL_NAME \
       --set-string \
           experiment_id=EXPERIMENT_ID \
           source_url=s3://INPUT_S3_PATH \
           dest_url=s3://OUTPUT_S3_PATH \
           assets_url=s3://ASSETS_S3_PATH \
           vipe_image=CONTAINER_REGISTRY/ego_vipe:TAG \
           dynhamr_image=CONTAINER_REGISTRY/ego_dynhamr:TAG
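
When submitting many videos, it can help to fail early if a template
parameter is missing. A minimal sketch of a wrapper around the command above;
``submit_hand_reconstruction`` and the ``DRY_RUN`` variable are hypothetical,
not part of the repository or the OSMO CLI, and the wrapper needs bash 4 or
newer for associative arrays:

.. code-block:: bash

   # Hypothetical wrapper (not shipped with the repository) around the
   # documented submit command. Refuses to submit when any of the six
   # template parameters is missing; DRY_RUN=1 prints instead of running.
   # Usage: submit_hand_reconstruction POOL key=value ...
   submit_hand_reconstruction() {
     local pool="$1"; shift
     local -a required=(experiment_id source_url dest_url assets_url vipe_image dynhamr_image)
     local -A params=()
     local kv key
     # Split each key=value argument at the first "=".
     for kv in "$@"; do
       params["${kv%%=*}"]="${kv#*=}"
     done
     for key in "${required[@]}"; do
       if [ -z "${params[$key]:-}" ]; then
         echo "missing template parameter: $key" >&2
         return 1
       fi
     done
     local -a cmd=(osmo workflow submit osmo/hand_reconstruction.yaml
                   --pool "$pool" --set-string)
     for key in "${required[@]}"; do
       cmd+=("$key=${params[$key]}")
     done
     if [ "${DRY_RUN:-0}" = 1 ]; then
       printf '%s\n' "${cmd[*]}"
     else
       "${cmd[@]}"
     fi
   }

With ``DRY_RUN=1`` set, the function only prints the assembled
``osmo workflow submit`` command, which is a cheap way to review a batch of
submissions before sending them.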

**Monitor progress:**

.. code-block:: bash

   osmo workflow logs WORKFLOW_ID -n 100

Estimated Runtime
^^^^^^^^^^^^^^^^^

For a reference 30-second video, expect approximately:

- **ViPE**: ~7 minutes
- **Dyn-HaMR**: ~30 minutes

Actual runtime may vary depending on system hardware and video length.
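
For a rough planning number, the two stages add up to about 37 minutes for the
reference video. The sketch below extrapolates to other lengths under an
assumption the documentation does not make, namely that runtime grows
linearly with video length, so treat its output as an order-of-magnitude
guess only; ``estimate_minutes`` is hypothetical:

.. code-block:: bash

   # Back-of-the-envelope estimate only (hypothetical helper).
   # ASSUMPTION: runtime scales linearly with video length from the
   # 30 s reference (~7 min ViPE + ~30 min Dyn-HaMR). Not documented.
   estimate_minutes() {
     local video_seconds="$1"
     local vipe_ref=7 dynhamr_ref=30 ref_seconds=30
     echo $(( (vipe_ref + dynhamr_ref) * video_seconds / ref_seconds ))
   }

For example, ``estimate_minutes 120`` prints ``148``.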

View results
^^^^^^^^^^^^

@@ -158,3 +253,8 @@ View results

   # View visualization
   vlc outputs/logs/video-custom/<DATE>/<VIDEO_NAME>*/*_grid.mp4
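
When several runs have accumulated under ``outputs/logs/``, a small helper can
pick the most recent visualization. A minimal sketch that follows the
directory pattern above; ``latest_grid_video`` is hypothetical and not part of
the repository:

.. code-block:: bash

   # Hypothetical helper (not shipped with the repository): print the most
   # recently modified *_grid.mp4 under
   # <base>/logs/video-custom/<DATE>/<VIDEO_NAME>*/.
   # Usage: latest_grid_video [base_dir]   (default: outputs)
   latest_grid_video() {
     local base="${1:-outputs}" newest="" f
     for f in "$base"/logs/video-custom/*/*/*_grid.mp4; do
       [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
       if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
         newest="$f"
       fi
     done
     [ -n "$newest" ] && printf '%s\n' "$newest"
   }

For example: ``vlc "$(latest_grid_video)"``.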

Limitations
-----------

The quality of the reconstructed result is directly related to the capture quality of the egocentric video.
@@ -23,11 +23,11 @@ The reconstruction pipeline requires two sets of external data files, stored in
- **MANO_RIGHT.pkl**
- **BMC/**

See the [Isaac Teleop documentation](https://nvidia.github.io/IsaacTeleop/main/references/egocentric_hand_reconstruction.html) for detailed setup instructions.

### Container images

The workflow requires two container images (`vipe_image` and `dynhamr_image`). Build them locally following the instructions in the [Isaac Teleop documentation](https://nvidia.github.io/IsaacTeleop/main/references/egocentric_hand_reconstruction.html):

```bash
./docker/vipe.sh build
@@ -107,7 +107,6 @@ osmo credential set REGISTRY_CREDENTIAL \

See the [OSMO credentials documentation](https://nvidia.github.io/OSMO/main/user_guide/getting_started/credentials.html) for details.

## Template Parameters

The workflow uses `{{placeholder}}` template variables that are filled at submission time via `--set-string`: