Improve slurm droid

Remi Cadene
2025-03-20 14:12:46 +00:00
parent 5d184a7811
commit 65738f0a80
5 changed files with 95 additions and 60 deletions

@@ -39,14 +39,15 @@ python examples/port_datasets/droid_rlds/port.py \
## Port over SLURM
-### 1. Port one shard per job
-First, install slurm utilities from Hugging Face:
+Install slurm utilities from Hugging Face:
```bash
pip install datatrove
```
-Then run this script to start porting shards of the dataset:
+### 1. Port one shard per job
+Run this script to start porting shards of the dataset:
```bash
python examples/port_datasets/droid_rlds/slurm_port_shards.py \
--raw-dir /your/data/droid/1.0.1 \
@@ -83,7 +84,7 @@ Check if your jobs are running:
squeue -u $USER
```
-You should see a list with job indices like `15125385_155` where `15125385` is the job index and `155` is the worker index. The output/print of this worker is written in real time in `/your/logs/job_name/slurm_jobs/15125385_155.out`. For instance, you can inspect the content of this file by running `less /your/logs/job_name/slurm_jobs/15125385_155.out`.
+You should see a list with job indices like `15125385_155` where `15125385` is the index of the run and `155` is the worker index. The output/print of this worker is written in real time in `/your/logs/job_name/slurm_jobs/15125385_155.out`. For instance, you can inspect the content of this file by running `less /your/logs/job_name/slurm_jobs/15125385_155.out`.
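Reviewer note: the mapping from a `squeue` entry to its log file described above can be sketched as follows. This is only an illustration, not code from this commit; the helper name `worker_log_path` and its parameters are hypothetical, and the `<logs_dir>/<job_name>/slurm_jobs/<run>_<worker>.out` layout is assumed from the example path in the README text.

```python
from pathlib import PurePosixPath


def worker_log_path(job_id: str, logs_dir: str, job_name: str) -> PurePosixPath:
    """Hypothetical helper: map an ID such as '15125385_155' to its .out log file."""
    # Split "<run index>_<worker index>" at the first underscore.
    run_index, sep, worker_index = job_id.partition("_")
    if not sep or not run_index.isdigit() or not worker_index.isdigit():
        raise ValueError(f"expected '<run>_<worker>', got {job_id!r}")
    # Assumed layout, taken from the example path in the README text.
    return PurePosixPath(logs_dir) / job_name / "slurm_jobs" / f"{run_index}_{worker_index}.out"


print(worker_log_path("15125385_155", "/your/logs", "job_name"))
```

The returned path can then be passed to `less` or `tail -f` to follow a given worker.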
Check the progression of your jobs by running:
```bash