diff --git a/docs/source/_static/egocentric-input.gif b/docs/source/_static/egocentric-input.gif
new file mode 100644
index 000000000..8e32fde92
--- /dev/null
+++ b/docs/source/_static/egocentric-input.gif
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6b6518c2c9cc650fcf52409f6970aef95de379ece383a8ddc2a3a298d280f1d0
+size 1464915
diff --git a/docs/source/_static/egocentric-reconstruction.gif b/docs/source/_static/egocentric-reconstruction.gif
new file mode 100644
index 000000000..3122b6298
--- /dev/null
+++ b/docs/source/_static/egocentric-reconstruction.gif
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:efca9b174082840be7b0aca9302b77730f4870c19b77a12b60db6ac0b3726663
+size 454670
diff --git a/docs/source/references/egocentric_hand_reconstruction.rst b/docs/source/references/egocentric_hand_reconstruction.rst
index b7e5387e1..2577f3d4a 100644
--- a/docs/source/references/egocentric_hand_reconstruction.rst
+++ b/docs/source/references/egocentric_hand_reconstruction.rst
@@ -7,11 +7,26 @@ Egocentric Hand Reconstruction
 
 Automated pipeline for 4D hand and camera pose reconstruction from egocentric videos. Integrates ViPE and Dyn-HaMR in containerized environments.
 
+.. list-table::
+   :widths: 50 50
+
+   * - .. image:: ../_static/egocentric-input.gif
+          :alt: Source egocentric video
+          :width: 100%
+          :class: no-image-zoom
+     - .. image:: ../_static/egocentric-reconstruction.gif
+          :alt: Smooth fit grid reconstruction
+          :width: 100%
+          :class: no-image-zoom
+   * - .. centered:: Source egocentric video
+     - .. centered:: Reconstructed 4D hand and camera poses
+
+
 Video Capture
 ---------------------------
 
 To capture egocentric video with an OAK camera, see the
-:doc:`/device/oak` documentation.
+`OAK camera plugin `_ documentation.
 
 Setup
 -----
@@ -20,9 +35,42 @@ System Requirement
 ^^^^^^^^^^^^^^^^^^
 
 - OS: Ubuntu 24.04
-- GPU: NVIDIA RTX 6000 Ada or L40
-- Memory: 100GB (for a reference 30s video, more for longer)
-- Storage: 100GB
+- GPU: NVIDIA RTX 6000 Ada, L40, H100, GeForce RTX 3090, GeForce RTX 4090
+- System RAM: 100GB (for a reference 30s video, more for longer)
+- System VRAM: 12GB (for a reference 30s video, more for longer)
+- Free Disk: 100GB
+
+Prerequisites
+^^^^^^^^^^^^^
+
+Ensure the following are installed and configured before starting:
+
+**Docker ≥ 20.10** (BuildKit support required):
+
+.. code-block:: bash
+
+   docker --version   # should print 20.10 or newer
+
+**NVIDIA Container Toolkit** — required for GPU access inside containers:
+
+- Install guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
+
+**Python tooling** — required only for downloading videos from S3/Swift URLs:
+
+.. code-block:: bash
+
+   pip install boto3
+
+
+Checkout the code
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: bash
+
+   git clone https://github.com/NVIDIA/IsaacTeleop.git
+   cd IsaacTeleop/src/postprocessing/egocentric_hand_reconstruction
+
+The ``./docker`` and ``./scripts`` directories referenced in this guide are located under this directory.
 
 Prepare data files
 ^^^^^^^^^^^^^^^^^^
@@ -32,10 +80,9 @@ Place required files in the ``outputs/`` directory.
 .. code-block:: text
 
    ...
-   ├── doc/
    ├── docker/
    ├── scripts/
-   ├── ...
+   ├── osmo/
    └── outputs/
        ├── MANO_RIGHT.pkl
        └── BMC/
@@ -43,8 +90,8 @@ Place required files in the ``outputs/`` directory.
 
 **MANO model** (required):
 
-- Download from: https://mano.is.tue.mpg.de/
-- Place: ``outputs/MANO_RIGHT.pkl``
+- Create an academic account at https://mano.is.tue.mpg.de/ and accept the license.
+- The download is a ZIP archive — extract it and place ``MANO_RIGHT.pkl`` in ``outputs/``.
 
 **BMC data** (required):
 
@@ -106,10 +153,11 @@ Run complete reconstruction (ViPE + Dyn-HaMR) with a single command:
    # Using a remote video file
    ./scripts/run_reconstruction.sh s3://path/to/your_video.mp4
 
-The script accepts either a **local file path** or a ``s3://`` **URL**
-pointing to a video on a S3-compatible cloud storage. When a URL is provided,
-the video is automatically downloaded to the ``outputs/`` directory before
-processing begins.
+The script accepts either a **local file path** or a remote **URL**
+pointing to a video on cloud storage. Both ``s3://`` URLs (S3-compatible
+cloud storage) and ``swift://`` URLs (OpenStack Object Storage) are
+supported. When a URL is provided, the video is automatically downloaded
+to the ``outputs/`` directory before processing begins.
 
 To use a remote video, set the following environment variables for credentials:
 
@@ -148,6 +196,53 @@ The pipeline will:
 3. Run Dyn-HaMR for hand reconstruction.
 4. Save all results to ``outputs/logs/``.
 
+Batch Reconstruction with OSMO
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For large-scale batch processing, the pipeline can be submitted as an
+`OSMO `_ workflow using ``hand_reconstruction.yaml``.
+This runs ViPE and Dyn-HaMR as two chained tasks on a GPU pool.
+
+**Prerequisites:**
+
+- A working OSMO cluster deployment (see the `OSMO deployment guide `_)
+- OSMO CLI installed and authenticated (``osmo login …``)
+- Bucket and image registry credentials stored in OSMO
+- Container images built and pushed to your registry (see `Build Docker images`_)
+- MANO and BMC assets available at an S3 URL
+
+See ``osmo/README.md`` for full setup details including credential registration and container image push steps.
+
+**Submit a workflow:**
+
+.. code-block:: bash
+
+   osmo workflow submit osmo/hand_reconstruction.yaml \
+     --pool POOL_NAME \
+     --set-string \
+       experiment_id=EXPERIMENT_ID \
+       source_url=s3://INPUT_S3_PATH \
+       dest_url=s3://OUTPUT_S3_PATH \
+       assets_url=s3://ASSETS_S3_PATH \
+       vipe_image=CONTAINER_REGISTRY/ego_vipe:TAG \
+       dynhamr_image=CONTAINER_REGISTRY/ego_dynhamr:TAG
+
+**Monitor progress:**
+
+.. code-block:: bash
+
+   osmo workflow logs WORKFLOW_ID -n 100
+
+Estimated Runtime
+^^^^^^^^^^^^^^^^^
+
+For a reference 30-second video, expect approximately:
+
+- **ViPE**: ~7 minutes
+- **Dyn-HaMR**: ~30 minutes
+
+Actual runtime may vary depending on system hardware and video length.
+
 View results
 ^^^^^^^^^^^^
 
@@ -158,3 +253,8 @@ View results
 
    # View visualization
    vlc outputs/logs/video-custom//*/*_grid.mp4
+
+Limitations
+-----------
+
+The quality of the reconstructed result is directly related to the capture quality of the egocentric video.
diff --git a/src/postprocessing/egocentric_hand_reconstruction/osmo/README.md b/src/postprocessing/egocentric_hand_reconstruction/osmo/README.md
index deeaf9c8b..087f7e30e 100644
--- a/src/postprocessing/egocentric_hand_reconstruction/osmo/README.md
+++ b/src/postprocessing/egocentric_hand_reconstruction/osmo/README.md
@@ -23,11 +23,11 @@ The reconstruction pipeline requires two sets of external data files, stored in
 - **MANO_RIGHT.pkl**
 - **BMC/**
 
-See [`doc/quickstart.md`](../doc/quickstart.md) for detailed setup instructions.
+See [`Isaac Teleop Documentation`](https://nvidia.github.io/IsaacTeleop/main/references/egocentric_hand_reconstruction.html) for detailed setup instructions.
 
 ### Container images
 
-The workflow requires two container images (`vipe_image` and `dynhamr_image`). Build them locally following the instructions in [`doc/quickstart.md`](../doc/quickstart.md):
+The workflow requires two container images (`vipe_image` and `dynhamr_image`). Build them locally following the instructions in [`Isaac Teleop Documentation`](https://nvidia.github.io/IsaacTeleop/main/references/egocentric_hand_reconstruction.html):
 
 ```bash
 ./docker/vipe.sh build
@@ -107,7 +107,6 @@ osmo credential set REGISTRY_CREDENTIAL \
 
 See the [OSMO credentials documentation](https://nvidia.github.io/OSMO/main/user_guide/getting_started/credentials.html) for details.
 
-
 ## Template Parameters
 
 The workflow uses `{{placeholder}}` template variables that are filled at submission time via `--set-string`:
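
To make the `{{placeholder}}` mechanism mentioned in the README hunk above concrete, here is a minimal sketch of what filling such variables could look like. It is an illustration only: OSMO performs its own substitution at submission time, and the `render_template` helper, the regular expression, and the two-line template fragment below are hypothetical and are not taken from `hand_reconstruction.yaml`. Only the parameter names `vipe_image` and `source_url` come from the submit command shown in this change.

```python
import re

# Hypothetical pattern: matches {{name}} with optional inner whitespace.
# This mimics the idea only; it is not OSMO's actual implementation.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(text: str, values: dict) -> str:
    """Replace each {{name}} in text with values[name].

    Raises KeyError if a placeholder has no supplied value, so a missing
    parameter fails loudly instead of leaving '{{name}}' in the output.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return values[name]

    return PLACEHOLDER.sub(substitute, text)


# Hypothetical template fragment, for illustration only.
template = "image: {{vipe_image}}\nargs: [--source, {{source_url}}]"

# Values as they might be passed via --set-string key=value pairs.
values = {
    "vipe_image": "CONTAINER_REGISTRY/ego_vipe:TAG",
    "source_url": "s3://INPUT_S3_PATH",
}

rendered = render_template(template, values)
print(rendered)
```

Failing on an unknown placeholder is a design choice of this sketch; whether OSMO rejects or ignores unset variables is not stated in this change.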