LeRobot SO-ARM100 Train Guide

HanFangchong

1. Registration and Login

If you don't yet have a Hugging Face access token (HUGGINGFACE_TOKEN), register or log in on the Hugging Face website:
https://huggingface.co/login?next=%2Fsettings%2Ftokens

After registering and logging in:

Create a new token for your datasets.

Check all available permission options.

Click the button to create the token.

Save your token securely for future use.
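
One way to avoid pasting the token into every command is to keep it in an environment variable and read it from there. The sketch below assumes the variable is named HUGGINGFACE_TOKEN (the placeholder name used in this guide) and that tokens start with "hf_", as the example token in this guide does. The function name read_hf_token is hypothetical.

```python
import os


def read_hf_token(env=os.environ, var="HUGGINGFACE_TOKEN"):
    """Return the token stored in `var`, or raise if it is missing or malformed."""
    token = env.get(var, "").strip()
    if not token:
        raise KeyError(f"{var} is not set")
    # Assumption: Hugging Face tokens start with "hf_", like the example in this guide.
    if not token.startswith("hf_"):
        raise ValueError(f"{var} does not look like a Hugging Face token")
    return token
```

Passing a plain dictionary as env makes the function easy to try without touching the real environment.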

2. Data Collection

Before collecting data, log in from the command line. Replace ${HUGGINGFACE_TOKEN} below with your own token:

huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential

For example, if your token is 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:

huggingface-cli login --token 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 --add-to-git-credential

Possible error: FileNotFoundError: [Errno 2] No such file or directory: 'git'

Solution: install git.

sudo apt install git

After installation, re-run the previous huggingface-cli command. Red warnings in the output of a successful run are normal.
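
Both git and ffmpeg are external programs that this guide relies on, so it can help to check for them before starting. The following is a minimal sketch using only the Python standard library. The function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("git and ffmpeg are both available")
```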

The command may hang at the login step. This is caused by network issues (solution: use a VPN).

Store your Hugging Face user name in a variable so that the commands below can use it:

HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER
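
If you drive the workflow from Python instead of the shell, the same lookup can be sketched as below. It assumes, as the shell command above does, that the user name is on the first line of the whoami output. The helper names first_line and hf_user are hypothetical.

```python
import subprocess


def first_line(text):
    """Return the first non-empty line of `text`, stripped of whitespace."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def hf_user():
    """Run `huggingface-cli whoami` and return the first line of its output."""
    result = subprocess.run(
        ["huggingface-cli", "whoami"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command fails, e.g. when not logged in
    )
    return first_line(result.stdout)
```

Unlike head -n 1, first_line skips leading blank lines, which is slightly more forgiving if the command prints an empty line first.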

3. Data Collection Process

The preparation phase begins with a prompt tone. The first recording starts after the tone; press the right arrow key "→" to end the recording early. The rest phase then begins (the camera images freeze): restore the objects to their starting positions, then press "→" to skip the remainder of the rest phase. Data compression and saving begin automatically. When saving completes, the second recording starts automatically.
Record 2 episodes and upload your dataset to the hub:

python lerobot/scripts/control_robot.py \
  --robot.type=so100 \
  --control.type=record \
  --control.fps=30 \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.repo_id=${HF_USER}/so100_test \
  --control.tags='["so100","tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=2 \
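  --control.push_to_hub=true

To record without uploading, set --control.push_to_hub=false. The example below uses the repository ID hgmtest/so100_test:

python lerobot/scripts/control_robot.py \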
  --robot.type=so100 \
  --control.type=record \
  --control.fps=30 \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.repo_id=hgmtest/so100_test \
  --control.tags='["so100","tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=2 \
  --control.push_to_hub=false

What each parameter does:

python lerobot/scripts/control_robot.py: run the LeRobot control script
--robot.type=so100: robot type (SO-100)
--control.type=record: operation mode (record)
--control.fps=30: recording frame rate (30 fps)
--control.single_task="Grasp a lego block and put it in the bin.": task description
--control.repo_id=${HF_USER}/so100_test: Hugging Face repository ID for dataset storage; for local-only storage, a local path such as /home/vsu/so100_test is used instead, together with --control.push_to_hub=false
--control.tags='["so100","tutorial"]': dataset tags
--control.warmup_time_s=5: preparation time before recording (5 s)
--control.episode_time_s=30: duration of each episode (30 s)
--control.reset_time_s=30: time for resetting the robot and scene (30 s)
--control.num_episodes=2: number of episodes to record (2)
--control.push_to_hub=true: upload to the Hugging Face Hub automatically after recording; set to false to skip the upload
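
The timing parameters above determine how long a session takes and how many frames each episode holds. The arithmetic can be sketched as follows. It assumes one warm-up at the start and a full reset phase after every episode, so the result is an upper bound that ignores encoding and upload time; ending a phase early with "→" shortens it. The function name session_budget is hypothetical.

```python
def session_budget(fps=30, warmup_s=5, episode_s=30, reset_s=30, num_episodes=2):
    """Return (frames per episode per camera, upper bound on session seconds)."""
    frames_per_episode = fps * episode_s
    # Assumption: one warm-up, then each episode is followed by a full reset phase.
    max_seconds = warmup_s + num_episodes * (episode_s + reset_s)
    return frames_per_episode, max_seconds


frames, seconds = session_budget()
print(frames, seconds)  # 900 125
```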

Possible error:

subprocess.CalledProcessError: Command '['ffmpeg', '-f', 'image2', '-r', '30', '-i', '/home/skl/.cache/huggingface/lerobot/pdd46465/so100_test/images/observation.images.laptop/episode_000000/frame_%06d.png', '-vcodec', 'libsvtav1', '-pix_fmt', 'yuv420p', '-g', '2', '-crf', '30', '-loglevel', 'error', '-y', '/home/skl/.cache/huggingface/lerobot/pdd46465/so100_test/videos/chunk-000/observation.images.laptop/episode_000000.mp4']' returned non-zero exit status 1.

Solution:
If you use miniconda and ffmpeg is not installed in your environment, install it:

conda install ffmpeg

Then check which codecs the ffmpeg in your conda environment supports:

ffmpeg -codecs | grep 264
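
The failing command in the error message passes '-vcodec', 'libsvtav1', so it can also be useful to check whether your ffmpeg build lists that encoder. The sketch below parses the output of ffmpeg -encoders. The helper names has_encoder and ffmpeg_encoders are hypothetical, and the parsing assumes each encoder row has the form "flags name description".

```python
import subprocess


def has_encoder(listing, name):
    """Return True if `name` appears as an encoder name in `ffmpeg -encoders` output."""
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: encoder rows look like "<flags> <name> <description...>".
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_encoders():
    """Return the text printed by `ffmpeg -hide_banner -encoders`."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With ffmpeg installed, has_encoder(ffmpeg_encoders(), "libsvtav1") tells you whether the encoder named in the error is available.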

Locate the file and modify as shown:

nvidia-smi result

Re-running the data recording may fail with this error:

FileExistsError: [Errno 17] File exists: '/home/skl/.cache/huggingface/lerobot/pdd46465/so100_test'

Solution: locate the directory named in the error message and delete it, then re-run the recording.
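
Deleting the directory can also be scripted. The sketch below assumes the cache layout seen in the error message, ~/.cache/huggingface/lerobot/<user>/<dataset>, and refuses to delete anything outside that cache root. The function name remove_local_dataset is hypothetical. It permanently removes the recorded episodes, so use it only when you intend to start over.

```python
import shutil
from pathlib import Path


def remove_local_dataset(repo_id, cache_root=None):
    """Delete <cache_root>/<repo_id> if it exists; return True if something was removed."""
    if cache_root is None:
        # Assumption: default location taken from the error message in this guide.
        cache_root = Path.home() / ".cache" / "huggingface" / "lerobot"
    root = Path(cache_root).resolve()
    target = (root / repo_id).resolve()
    # Refuse to delete the cache root itself or anything outside it.
    if target == root or root not in target.parents:
        raise ValueError(f"refusing to delete {target}")
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True
```

For the error shown above, the call would be remove_local_dataset("pdd46465/so100_test").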

Once the dataset has been uploaded successfully, you can add more data to the existing dataset with the following command:

python lerobot/scripts/control_robot.py \
  --robot.type=so100 \
  --control.type=record \
  --control.fps=30 \
  --control.resume=true \
  --control.single_task="Grasp a lego block and put it in the bin." \
  --control.repo_id=${HF_USER}/so100_test \
  --control.tags='["so100","tutorial"]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=30 \
  --control.num_episodes=2 \
  --control.push_to_hub=true

Here --control.resume=true continues recording into the existing dataset; the other parameters are the same as in the first recording command.

To avoid uploading to the Hugging Face Hub, change --control.push_to_hub=true to --control.push_to_hub=false.
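
Because the recording commands differ only in a few flags (repo_id, resume, push_to_hub), they can be generated instead of retyped. The sketch below builds the argument list from the values used in this guide. The function name build_record_command is hypothetical, and the resulting list can be passed to subprocess.run from the LeRobot repository root.

```python
import shlex


def build_record_command(repo_id, resume=False, push_to_hub=True, num_episodes=2):
    """Return the control_robot.py record command as a list of arguments."""
    options = {
        "robot.type": "so100",
        "control.type": "record",
        "control.fps": 30,
        "control.single_task": "Grasp a lego block and put it in the bin.",
        "control.repo_id": repo_id,
        "control.tags": '["so100","tutorial"]',
        "control.warmup_time_s": 5,
        "control.episode_time_s": 30,
        "control.reset_time_s": 30,
        "control.num_episodes": num_episodes,
        "control.push_to_hub": str(push_to_hub).lower(),
    }
    if resume:
        # Only added when appending to an existing dataset.
        options["control.resume"] = "true"
    args = ["python", "lerobot/scripts/control_robot.py"]
    args += [f"--{key}={value}" for key, value in options.items()]
    return args


cmd = build_record_command("hgmtest/so100_test", resume=True, push_to_hub=False)
print(shlex.join(cmd))  # shell-quoted form, for copy and paste
```

Keeping the arguments in a list avoids shell quoting problems with the task description and the tags value.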

Begin local dataset training with the following command:

python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/so100_test \
  --policy.type=act \
  --output_dir=outputs/train/act_so100_test \
  --job_name=act_so100_test \
  --policy.device=cuda \
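  --wandb.enable=false

The same command with the example repository hgmtest/so100_test. Note that this variant passes --device=cuda rather than --policy.device=cuda; use the form that your LeRobot version accepts:

python lerobot/scripts/train.py \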
  --dataset.repo_id=hgmtest/so100_test \
  --policy.type=act \
  --output_dir=outputs/train/act_so100_test \
  --job_name=act_so100_test \
  --device=cuda \
  --wandb.enable=false

What each parameter does:

python lerobot/scripts/train.py: run the LeRobot training script
--dataset.repo_id=${HF_USER}/so100_test: training dataset (Hugging Face repository ID)
--policy.type=act: use the ACT (Action Chunking with Transformers) policy
--output_dir=outputs/train/act_so100_test: training output directory
--job_name=act_so100_test: training job name
--policy.device=cuda: train on CUDA (NVIDIA GPU)
--wandb.enable=true or --wandb.enable=false: enable or disable Weights & Biases logging

The following images indicate training has started:

Training Start 1 Training Start 2

If you encounter a CUDA out of memory error, the GPU does not have enough memory for the current settings.

Solution: modify the train.py configuration file under lerobot_main/lerobot/config/, around line 54, and lower these two parameters:

num_workers: number of data-loading workers; affects data preloading parallelism
batch_size: number of samples per training iteration (preferably an even number, typically a power of two such as 2, 4, 16, or 32)

Reducing these values lowers memory use and can resolve the out-of-memory error.
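
A common way to pick a workable batch_size is to halve it until one training step fits. The sketch below illustrates that search with a simulated training step. It is not part of LeRobot; it assumes the out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory", and the names largest_fitting_batch_size and fake_step are hypothetical.

```python
def largest_fitting_batch_size(try_step, start=32, minimum=1):
    """Halve the batch size until `try_step(batch_size)` stops raising an out-of-memory error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: do not hide it
            batch_size //= 2
    raise RuntimeError("even the minimum batch size does not fit in memory")


def fake_step(batch_size, capacity=8):
    """Stand-in for one training step: pretends only `capacity` samples fit."""
    if batch_size > capacity:
        raise RuntimeError("CUDA out of memory (simulated)")


print(largest_fitting_batch_size(fake_step))  # 8
```

In practice you would replace fake_step with a function that runs a single real training step at the given batch size, then write the resulting value into the configuration.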
