Continuing from the previous post:
Installing NVIDIA Isaac Gym, NVIDIA's GPU-based robot simulation environment — a simulation environment for reinforcement learning training
This post gives example commands for running the PyTorch PPO examples that ship with NVIDIA Isaac Gym.
Below are several examples of training Isaac Gym environments with the reinforcement learning code under the rlgpu directory.
The file used in these examples: /home/devil/isaacgym/python/rlgpu/train.py
(train.py under rlgpu)
Use the help flag to view the command-line arguments of the reinforcement learning scripts NVIDIA provides:
python train.py -h
RL Policy

optional arguments:
  -h, --help            show this help message and exit
  --sim_device SIM_DEVICE
                        Physics Device in PyTorch-like syntax
  --pipeline PIPELINE   Tensor API pipeline (cpu/gpu)
  --graphics_device_id GRAPHICS_DEVICE_ID
                        Graphics Device ID
  --flex                Use FleX for physics
  --physx               Use PhysX for physics
  --num_threads NUM_THREADS
                        Number of cores used by PhysX
  --subscenes SUBSCENES
                        Number of PhysX subscenes to simulate in parallel
  --slices SLICES       Number of client threads that process env slices
  --test                Run trained policy, no training
  --play                Run trained policy, the same as test, can be used only
                        by rl_games RL library
  --resume RESUME       Resume training or start testing from a checkpoint
  --checkpoint CHECKPOINT
                        Path to the saved weights, only for rl_games RL
                        library
  --headless            Force display off at all times
  --horovod             Use horovod for multi-gpu training, have effect only
                        with rl_games RL library
  --task TASK           Can be BallBalance, Cartpole, CartpoleYUp, Ant,
                        Humanoid, Anymal, FrankaCabinet, Quadcopter,
                        ShadowHand, Ingenuity
  --task_type TASK_TYPE
                        Choose Python or C++
  --rl_device RL_DEVICE
                        Choose CPU or GPU device for inferencing policy
                        network
  --logdir LOGDIR
  --experiment EXPERIMENT
                        Experiment name. If used with --metadata flag an
                        additional information about physics engine, sim
                        device, pipeline and domain randomization will be
                        added to the name
  --metadata            Requires --experiment flag, adds physics engine, sim
                        device, pipeline info and if domain randomization is
                        used to the experiment name provided by user
  --cfg_train CFG_TRAIN
  --cfg_env CFG_ENV
  --num_envs NUM_ENVS   Number of environments to create - override config
                        file
  --episode_length EPISODE_LENGTH
                        Episode length, by default is read from yaml config
  --seed SEED           Random seed
  --max_iterations MAX_ITERATIONS
                        Set a maximum number of training iterations
  --steps_num STEPS_NUM
                        Set number of simulation steps per 1 PPO iteration.
                        Supported only by rl_games. If not -1 overrides the
                        config settings.
  --minibatch_size MINIBATCH_SIZE
                        Set batch size for PPO optimization step. Supported
                        only by rl_games. If not -1 overrides the config
                        settings.
  --randomize           Apply physics domain randomization
  --torch_deterministic
                        Apply additional PyTorch settings for more
                        deterministic behaviour
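To make the device-related flags above concrete, here is a minimal argparse sketch that mirrors a few of them. This is an illustrative reconstruction written for this post, not Isaac Gym's actual parser; the defaults chosen here are assumptions.

```python
import argparse

# Hypothetical reconstruction of a few flags from `python train.py -h`.
# Flag names and help strings follow the help output above; defaults are assumed.
def build_parser():
    parser = argparse.ArgumentParser(description="RL Policy")
    parser.add_argument("--sim_device", default="cuda:0",
                        help="Physics Device in PyTorch-like syntax")
    parser.add_argument("--rl_device", default="cuda:0",
                        help="Choose CPU or GPU device for inferencing policy network")
    parser.add_argument("--pipeline", default="gpu",
                        help="Tensor API pipeline (cpu/gpu)")
    parser.add_argument("--physx", action="store_true",
                        help="Use PhysX for physics")
    parser.add_argument("--num_threads", type=int, default=4,
                        help="Number of cores used by PhysX")
    parser.add_argument("--headless", action="store_true",
                        help="Force display off at all times")
    parser.add_argument("--task", default="Cartpole",
                        help="Task name, e.g. ShadowHand")
    return parser

# Parse the same arguments as the "simulate on CPU, train on GPU" example below.
args = build_parser().parse_args(
    "--task ShadowHand --headless --sim_device cpu "
    "--rl_device cuda:0 --physx --num_threads 24".split()
)
print(args.sim_device, args.rl_device, args.num_threads)  # cpu cuda:0 24
```

The key point the sketch illustrates: `--sim_device` and `--rl_device` are independent device strings in PyTorch syntax (`cpu`, `cuda:0`, `cuda:1`, ...), which is what makes the four simulate/train device combinations below possible.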
Example commands:
1. Simulate on CPU, train on CPU
Run the simulation environment on the CPU while the PPO deep reinforcement learning algorithm also trains on the CPU:
python train.py --task=ShadowHand --headless --sim_device=cpu --rl_device=cpu --physx --num_threads=24
2. Simulate on CPU, train on GPU
python train.py --task=ShadowHand --headless --sim_device=cpu --rl_device=cuda:0 --physx --num_threads=24
3. Simulate on GPU, train on CPU
python train.py --task=ShadowHand --headless --sim_device=cuda:0 --rl_device=cpu --physx --num_threads=24
4. Simulate on GPU, train on GPU
Simulate on GPU 0, train on GPU 1:
python train.py --task=ShadowHand --headless --sim_device=cuda:0 --rl_device=cuda:1 --physx --num_threads=24
Simulate on GPU 1, train on GPU 0:
python train.py --task=ShadowHand --headless --sim_device=cuda:1 --rl_device=cuda:0 --physx --num_threads=24
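The four commands above differ only in their `--sim_device`/`--rl_device` pair. If you want to launch several combinations in a script, they can be built programmatically; the helper below is a hypothetical sketch (the function name `make_cmd` is mine, not part of Isaac Gym) and only constructs the argument lists without executing them.

```python
# Hypothetical helper that builds the train.py command lines shown above
# as argument lists suitable for subprocess.run (not executed here).
def make_cmd(sim_device, rl_device, task="ShadowHand", num_threads=24):
    return [
        "python", "train.py",
        f"--task={task}",
        "--headless",
        f"--sim_device={sim_device}",
        f"--rl_device={rl_device}",
        "--physx",
        f"--num_threads={num_threads}",
    ]

# The four simulate/train device combinations from the examples above.
combos = [("cpu", "cpu"), ("cpu", "cuda:0"), ("cuda:0", "cpu"), ("cuda:0", "cuda:1")]
for sim, rl in combos:
    print(" ".join(make_cmd(sim, rl)))
```

To actually launch one of these, you could pass the list to `subprocess.run(make_cmd(...), cwd="/home/devil/isaacgym/python/rlgpu")`, assuming the rlgpu environment is set up as described in the previous post.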