This guide explains how to use the Gr00tPolicy class to load and run inference with your trained model. After training, you’ll use this API to integrate your model with evaluation environments.
Loading the policy
Initialize a policy by providing the embodiment tag, model checkpoint path, and device.
Parameters
- model_path: Path to your trained model checkpoint directory
- embodiment_tag: The embodiment tag you used during training (e.g., EmbodimentTag.NEW_EMBODIMENT)
- device: Device to run inference on ("cuda:0", "cpu", or an integer device index)
- strict: Whether to validate inputs/outputs (recommended during development; can be disabled in production)
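A minimal loading sketch built from the four parameters above. The module paths `gr00t.policy.gr00t_policy` and `gr00t.data.embodiment_tags` are assumptions, and the checkpoint path is a placeholder. Verify both against your installed version.

```python
def load_policy(model_path, embodiment_tag, device="cuda:0", strict=True):
    """Build a Gr00tPolicy from the parameters described above."""
    # ASSUMPTION: module path of Gr00tPolicy. The import is done lazily so
    # this helper can be defined on machines without the package installed.
    from gr00t.policy.gr00t_policy import Gr00tPolicy

    return Gr00tPolicy(
        model_path=model_path,
        embodiment_tag=embodiment_tag,
        device=device,
        strict=strict,  # keep validation on while developing
    )


# Usage (placeholder path; the EmbodimentTag import path is also an assumption):
#   from gr00t.data.embodiment_tags import EmbodimentTag
#   policy = load_policy("/path/to/checkpoint", EmbodimentTag.NEW_EMBODIMENT)
```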
Understanding the observation format
The policy expects observations as a nested dictionary with three modalities: video, state, and language.
Dimensions
- B: Batch size (number of parallel environments)
- T: Temporal horizon (number of historical observations)
- H, W: Image height and width
- D: State dimension
- C: Number of channels (must be 3 for RGB)
Data type requirements
- Videos must be np.uint8 arrays with RGB pixel values in range [0, 255]
- States must be np.float32 arrays
- Language instructions are lists of lists of strings
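The requirements above can be checked with a dummy observation before connecting a real environment. This is a sketch only: the top-level keys (`video`, `state`, `language`), the inner keys (`front_cam`, `joint_pos`, `task`), and the axis orders `(B, T, H, W, C)` and `(B, T, D)` are assumptions, so use the names and horizons your own model reports.

```python
import numpy as np

# ASSUMPTION: illustrative sizes; T and D come from your training config.
B, T, H, W, C, D = 2, 1, 224, 224, 3, 7


def make_dummy_observation():
    """Build an all-zeros observation with the documented dtypes."""
    return {
        "video": {"front_cam": np.zeros((B, T, H, W, C), dtype=np.uint8)},
        "state": {"joint_pos": np.zeros((B, T, D), dtype=np.float32)},
        "language": {"task": [["pick up the cube"] for _ in range(B)]},
    }


def check_observation(obs):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for name, frames in obs["video"].items():
        if frames.dtype != np.uint8:
            problems.append(f"video.{name}: dtype {frames.dtype}, expected uint8")
        if frames.ndim != 5 or frames.shape[-1] != 3:
            problems.append(f"video.{name}: shape {frames.shape}, expected 5 axes with C=3")
    for name, values in obs["state"].items():
        if values.dtype != np.float32:
            problems.append(f"state.{name}: dtype {values.dtype}, expected float32")
    for name, texts in obs["language"].items():
        is_nested_str = isinstance(texts, list) and all(
            isinstance(row, list) and all(isinstance(s, str) for s in row)
            for row in texts
        )
        if not is_nested_str:
            problems.append(f"language.{name}: expected a list of lists of strings")
    return problems


print(check_observation(make_dummy_observation()))  # prints []
```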
The temporal horizon T is determined by your model’s training configuration. Different modalities may have different temporal horizons (query via get_modality_config()).
Understanding the action format
The policy returns actions in a similar nested structure.
Dimensions
- B: Batch size (matches input batch size)
- T: Action horizon (number of future action steps to predict)
- D: Action dimension (e.g., 7 for arm joints, 1 for gripper)
Running inference
Use the get_action() method to compute actions from observations. It returns two values:
- action: Dictionary of action arrays
- info: Dictionary of additional information (currently empty, reserved for future use)
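The calling pattern can be sketched as follows. `StubPolicy` is a hypothetical stand-in so the snippet runs without a checkpoint. The action keys (`arm`, `gripper`), the horizon of 16, the `(B, T, D)` layout, and the `float32` dtype are all assumptions.

```python
import numpy as np


class StubPolicy:
    """Hypothetical stand-in that mimics the (action, info) return value."""

    def get_action(self, observation):
        batch = len(observation["language"]["task"])
        action = {
            "arm": np.zeros((batch, 16, 7), dtype=np.float32),
            "gripper": np.zeros((batch, 16, 1), dtype=np.float32),
        }
        return action, {}  # info is currently empty


def action_shapes(policy, observation):
    """Call the policy once and report the shape of every action array."""
    action, _info = policy.get_action(observation)
    return {name: values.shape for name, values in action.items()}


obs = {"language": {"task": [["open the drawer"], ["open the drawer"]]}}
print(action_shapes(StubPolicy(), obs))
# prints {'arm': (2, 16, 7), 'gripper': (2, 16, 1)}
```

With a real policy, pass the loaded `Gr00tPolicy` instance in place of `StubPolicy()` and a full observation in place of `obs`.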
Querying modality configurations
To understand what observations your policy expects and what actions it produces, query the modality configuration with get_modality_config(). This is useful when:
- You’re unsure what observations your trained model expects
- You need to verify the temporal horizons for each modality
- You’re debugging observation/action format mismatches
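One way to make the configuration readable is to summarize it per modality. This sketch assumes `get_modality_config()` returns a mapping whose values expose `modality_keys` and `delta_indices` attributes. Those attribute names are assumptions, and the example config below is hypothetical, so adapt both to the real return type.

```python
from types import SimpleNamespace


def summarize_modality_config(config):
    """Map each modality to its keys and temporal horizon.

    ASSUMPTION: each value has `modality_keys` and `delta_indices`;
    the horizon is taken to be the number of delta indices.
    """
    summary = {}
    for modality, cfg in config.items():
        keys = list(getattr(cfg, "modality_keys", []))
        horizon = len(getattr(cfg, "delta_indices", []))
        summary[modality] = {"keys": keys, "horizon": horizon}
    return summary


# Hypothetical config; with a real policy use policy.get_modality_config().
example_config = {
    "video": SimpleNamespace(modality_keys=["front_cam"], delta_indices=[0]),
    "state": SimpleNamespace(modality_keys=["joint_pos"], delta_indices=[0]),
    "action": SimpleNamespace(
        modality_keys=["arm", "gripper"], delta_indices=list(range(16))
    ),
}

summary = summarize_modality_config(example_config)
for modality, entry in summary.items():
    print(modality, entry)
```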
Resetting the policy
Reset the policy between episodes. Currently, the policy is stateless, but calling reset() is good practice for future compatibility.
Adapting the policy to your environment
Most environments use different observation/action formats than the Policy API expects. You’ll typically need to write a policy wrapper that:
- Transforms observations: Convert your environment’s observation format to the Policy API format
- Calls the policy: Use policy.get_action() to compute actions
- Transforms actions: Convert the policy’s actions back to your environment’s format
Example workflow
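The three wrapper steps, plus a per-episode reset, might look like the sketch below. `EnvPolicyWrapper`, `StubPolicy`, the environment keys (`image`, `joints`), and the policy-side key names are all hypothetical. The reference wrapper `Gr00tSimPolicyWrapper` is the better template for real use.

```python
import numpy as np


class EnvPolicyWrapper:
    """Hypothetical adapter between one unbatched environment and the policy."""

    def __init__(self, policy, instruction):
        self.policy = policy
        self.instruction = instruction

    def reset(self):
        self.policy.reset()  # call once at the start of every episode

    def act(self, env_obs):
        # 1. Transform observations: add batch and time axes (B=1, T=1).
        #    Assumes the environment already provides uint8 RGB images.
        observation = {
            "video": {"front_cam": env_obs["image"][None, None].astype(np.uint8)},
            "state": {"joint_pos": env_obs["joints"][None, None].astype(np.float32)},
            "language": {"task": [[self.instruction]]},
        }
        # 2. Call the policy.
        action, _info = self.policy.get_action(observation)
        # 3. Transform actions: take environment 0, first predicted step,
        #    and concatenate into one flat command vector.
        return np.concatenate([action["arm"][0, 0], action["gripper"][0, 0]])


class StubPolicy:
    """Hypothetical stand-in so the sketch runs without a checkpoint."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1

    def get_action(self, observation):
        arm = np.ones((1, 16, 7), dtype=np.float32)
        gripper = np.zeros((1, 16, 1), dtype=np.float32)
        return {"arm": arm, "gripper": gripper}, {}


wrapper = EnvPolicyWrapper(StubPolicy(), "pick up the cube")
wrapper.reset()
env_obs = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),
    "joints": np.zeros(7, dtype=np.float32),
}
command = wrapper.act(env_obs)
print(command.shape)  # prints (8,)
```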
Server-client architecture for remote inference
For many use cases, especially when working with real robots or distributed systems, you may want to run the policy on a separate machine (e.g., a GPU server) and send observations/actions over the network.
Why use server-client architecture?
- Separate compute resources: Run policy inference on a GPU server while controlling the robot from a different machine
- Dependency isolation: Avoid dependency conflicts between the policy's environment and the client's
Starting the policy server
Parameters
- --embodiment-tag: The embodiment tag for your robot (e.g., NEW_EMBODIMENT)
- --model-path: Path to your trained model checkpoint directory
- --device: Device to run inference on (cuda:0, cuda:1, cpu, etc.)
- --host: Host address (127.0.0.1 for local only, 0.0.0.0 to accept external connections)
- --port: Port number (default: 5555)
- --strict: Enable input/output validation (default: True)
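The flags above can be assembled into a launch command from Python. The script path `gr00t/eval/run_gr00t_server.py` is an assumption about where the server entry point lives, and the checkpoint path is a placeholder. Only the flag names come from the list above.

```python
import shlex
import sys


def server_command(
    model_path,
    embodiment_tag="NEW_EMBODIMENT",
    device="cuda:0",
    host="127.0.0.1",
    port=5555,
    script="gr00t/eval/run_gr00t_server.py",  # ASSUMPTION: entry-point path
):
    """Build the argument list for starting the policy server."""
    return [
        sys.executable, script,
        "--embodiment-tag", embodiment_tag,
        "--model-path", model_path,
        "--device", device,
        "--host", host,
        "--port", str(port),
    ]


# Bind to 0.0.0.0 so clients on other machines can connect.
cmd = server_command("/path/to/checkpoint", host="0.0.0.0")
print(shlex.join(cmd))
# To actually launch: subprocess.run(cmd, check=True)
```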
Using the policy client
On the client side, use PolicyClient to connect to the server.
PolicyClient implements the same BasePolicy interface, so it’s a drop-in replacement for Gr00tPolicy.
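Because both classes share one interface, code written against `get_action()` does not need to know which one it received. In this sketch the import path `gr00t.policy.server_client` and the `host`/`port` keyword names are assumptions. Check the constructor in your installed version.

```python
def connect_policy(host="127.0.0.1", port=5555):
    """Return a client for a running policy server."""
    # ASSUMPTION: module path and keyword names of PolicyClient.
    # The import is lazy so this helper can be defined without the package.
    from gr00t.policy.server_client import PolicyClient

    return PolicyClient(host=host, port=port)


def step(policy, observation):
    """Compute one action; works for a local policy or a remote client."""
    action, _info = policy.get_action(observation)
    return action


# Usage (requires a running server):
#   policy = connect_policy(host="192.168.1.10", port=5555)
#   action = step(policy, observation)
```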
Common patterns
Batched inference
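The policy supports batched inference for efficiency: stack the observations from B parallel environments along the batch axis and call get_action() once.
Single environment inference
For single environments, use a batch size of 1.
Action chunking
When the action horizon T > 1, you can use action chunking: execute several of the predicted steps before querying the policy again.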
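For a single environment (batch size 1), action chunking can be sketched as a generator that re-queries the policy only after a fixed number of predicted steps has been consumed. `StubPolicy`, the `arm` key, the horizon of 16, and the `(B, T, D)` layout are assumptions for illustration.

```python
import numpy as np


def chunked_actions(policy, get_observation, total_steps, execute_steps):
    """Yield one per-step action dict at a time.

    The policy is queried again only after `execute_steps` steps of the
    current chunk (or the whole chunk, if shorter) have been yielded.
    """
    if execute_steps < 1:
        raise ValueError("execute_steps must be at least 1")
    produced = 0
    while produced < total_steps:
        action, _info = policy.get_action(get_observation())
        horizon = next(iter(action.values())).shape[1]  # axis 1 is T
        usable = min(execute_steps, horizon, total_steps - produced)
        for t in range(usable):
            # Index environment 0 and step t of every action array.
            yield {name: values[0, t] for name, values in action.items()}
            produced += 1


class StubPolicy:
    """Hypothetical stand-in that counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def get_action(self, observation):
        self.calls += 1
        return {"arm": np.zeros((1, 16, 7), dtype=np.float32)}, {}


policy = StubPolicy()
steps = list(chunked_actions(policy, lambda: {}, total_steps=20, execute_steps=8))
print(len(steps), policy.calls)  # prints 20 3
```

Executing fewer steps per chunk makes the controller react to new observations sooner, at the cost of more inference calls.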
Troubleshooting
- Enable strict mode during development: strict=True
- Print modality configs to understand expected formats
- Check shapes of your observations before calling get_action()
- Use the reference wrapper (Gr00tSimPolicyWrapper) as a template
- Validate incrementally: Test with dummy observations first before connecting to real environments