update readme.md and tuxiv.conf.md
The readme.md and tuxiv.conf.md have been modified according to version 4.0. Added Manage your environment section to readme.md. Added environment management command descriptions to tuxiv.conf.md.
decsun committed Mar 11, 2022
1 parent ac65aeb commit 2b27629
Showing 2 changed files with 87 additions and 35 deletions.
54 changes: 31 additions & 23 deletions README.md
@@ -49,29 +49,29 @@ After tcloud is configured correctly, you can try to submit your first job.

1. Go to the example folder in your terminal.
2. Run `tcloud submit` command.
```
~/Dow/quickstart-master/example/helloworld ❯ tcloud submit
Start parsing tuxiv.conf...
building file list ...
8 files to consider
helloworld/
helloworld/run.sh
151 100% 0.00kB/s 0:00:00 (xfer#1, to-check=5/8)
helloworld/configurations/
helloworld/configurations/citynet.sh
12 100% 11.72kB/s 0:00:00 (xfer#2, to-check=2/8)
helloworld/configurations/conda.yaml
107 100% 104.49kB/s 0:00:00 (xfer#3, to-check=1/8)
helloworld/configurations/run.slurm
278 100% 271.48kB/s 0:00:00 (xfer#4, to-check=0/8)
sent 429 bytes received 144 bytes 382.00 bytes/sec
total size is 1071 speedup is 1.87
Submitted batch job 2000
Job helloworld submitted.
```

### Retrieving Your Job Status and Output
```
~/Dow/quickstart-master/example/helloworld ❯ tcloud submit
Start parsing tuxiv.conf...
building file list ...
8 files to consider
helloworld/
helloworld/run.sh
151 100% 0.00kB/s 0:00:00 (xfer#1, to-check=5/8)
helloworld/configurations/
helloworld/configurations/citynet.sh
12 100% 11.72kB/s 0:00:00 (xfer#2, to-check=2/8)
helloworld/configurations/conda.yaml
107 100% 104.49kB/s 0:00:00 (xfer#3, to-check=1/8)
helloworld/configurations/run.slurm
278 100% 271.48kB/s 0:00:00 (xfer#4, to-check=0/8)
sent 429 bytes received 144 bytes 382.00 bytes/sec
total size is 1071 speedup is 1.87
Submitted batch job 2000
Job helloworld submitted.
```
### Retrieve Your Job Status and Output
In this section, we provide two methods to monitor the job log.
After training, you can use `tcloud ls [filepath]` to find the output files
@@ -94,6 +94,14 @@ After training, you can use `tcloud ls [filepath]` to find the output files
tcloud download slurm_log/slurm-jobid.out
```
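The job ID printed by `tcloud submit` (the `Submitted batch job 2000` line in the transcript above) can be turned into the log path that `tcloud download` expects. The following is a minimal shell sketch, not part of tcloud: `slurm_log_path` is a hypothetical helper, and it assumes the log file is named `slurm-<jobid>.out` under `slurm_log/`, as the `slurm-jobid.out` placeholder suggests.

```shell
# Hypothetical helper (not part of tcloud): read `tcloud submit` output on
# stdin and print the matching log path.
# Assumption: logs are stored as slurm_log/slurm-<jobid>.out.
slurm_log_path() {
  # -n suppresses default output; the trailing p prints only the line
  # that matches "Submitted batch job <digits>".
  sed -n 's|^Submitted batch job \([0-9][0-9]*\).*|slurm_log/slurm-\1.out|p'
}

# Sample lines copied from the submit transcript.
printf '%s\n' \
  'total size is 1071 speedup is 1.87' \
  'Submitted batch job 2000' \
  'Job helloworld submitted.' | slurm_log_path
```

If your submit output matches the transcript, the printed path could then be passed to `tcloud download`.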
### Manage your environment
+ Reuse environment
We offer two ways to manage your environment.
1. If the environment block in `tuxiv.conf` has no `name` entry, tcloud will create a new environment for your new project.
2. You can add the environment name in `tuxiv.conf` to reuse an existing environment. [Details about tuxiv.conf](tuxiv.conf.md)
## Demo video
The following video will help you use the tcloud CLI to begin your TACC journey: [demo video](https://hkustconnect-my.sharepoint.com/:v:/g/personal/dsunak_connect_ust_hk/EUYW3f8IRwVLhBtCYP_ufs4BpQ7CaxrCUBiUexY7-nLX7w?e=O2gR2G).
68 changes: 56 additions & 12 deletions tuxiv.conf.md
@@ -20,25 +20,69 @@ There are four parts in `tuxiv.conf` that configure different parts of job submi

In this section, you can specify your software requirements, including the environment name, dependencies, source channels and so on. The tcloud CLI will help build your environment with *miniconda*.

~~~yaml
environment:
    name: torch-env
    dependencies:
        - pytorch=1.6.0
        - torchvision=0.7.0
    channels: pytorch
~~~
Note: The environment name is *optional*, which gives you the following two options.
1. Environment name unset.

In this case, tcloud will create a new environment whenever you change any of your dependency configuration.
~~~yaml
environment:
    dependencies:
        - pytorch=1.6.0
        - torchvision=0.7.0
    channels: pytorch
~~~
2. Environment name set.

In this case, the environment is persistent, and tcloud will update it when you change any of your dependency configuration (instead of creating a new environment).
The tcloud environment configuration is managed by conda, so you can follow conda conventions to manage your environment.
~~~yaml
environment:
    name: torch-env
    dependencies:
        - pytorch=1.6.0
        - torchvision=0.7.0
    channels: pytorch
~~~
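To see which of the two cases a given configuration falls into, you can check whether its `environment` block carries a `name` entry. The following is a minimal shell sketch under stated assumptions: `env_name_mode` is a hypothetical helper (not a tcloud command), and it assumes that top-level keys start in the first column while nested keys are indented.

```shell
# Hypothetical helper (not part of tcloud): print "named" when the
# environment block of a tuxiv.conf-style file sets a name, else "unnamed".
# Assumption: top-level keys are unindented, nested keys are indented.
env_name_mode() {
  awk '
    # Any unindented key starts a new block; track whether it is environment.
    /^[^ \t#]/ { in_env = ($0 ~ /^environment:/) }
    # An indented, non-empty name entry inside the environment block.
    in_env && /^[ \t]+name:[ \t]*[^ \t]/ { found = 1 }
    END { print (found ? "named" : "unnamed") }
  ' "$1"
}

# Demo on a temporary file modelled on the examples above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
environment:
    dependencies:
        - pytorch=1.6.0
        - torchvision=0.7.0
    channels: pytorch
job:
    name: test
EOF
env_name_mode "$conf"
rm -f "$conf"
```

The `name` under `job` is ignored on purpose, since only the environment name decides whether the environment is persistent.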
+ Check environment

Check the existing environments with the `tcloud env ls` command.
```
~ ❯ tcloud env ls
# conda environments:
#
base * /mnt/home/username/.Miniconda3
pytorch /mnt/home/username/.Miniconda3/envs/pytorch
```
Check the dependencies installed in a specific environment with the `tcloud env ls -n [ENV_NAME]` command.
```
~ ❯ tcloud env ls -n base
# packages in environment at /mnt/home/username/.Miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
brotlipy 0.7.0 py38h27cfd23_1003
ca-certificates 2020.10.14 0
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.14.3 py38h261ae71_2
chardet 3.0.4 py38h06a4308_1003
conda 4.9.2 py38h06a4308_0
conda-package-handling 1.7.2 py38h03888b9_0
...
```
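Before setting `name` in `tuxiv.conf` to reuse an environment, it can help to confirm that the environment exists. The following is a minimal shell sketch, assuming the output of `tcloud env ls` keeps the conda-style listing shown above (comment lines starting with `#`, then one environment per line). `list_env_names` and `env_exists` are hypothetical helpers, not tcloud commands.

```shell
# Hypothetical helpers (not part of tcloud) that read a conda-style
# environment listing on stdin.

# Print the first column of every non-comment line that has a path.
list_env_names() {
  awk '!/^#/ && NF >= 2 { print $1 }'
}

# Succeed if the given name appears as an exact, whole-line match.
env_exists() {
  list_env_names | grep -qxF -- "$1"
}

# Sample listing copied from the transcript above.
sample='# conda environments:
#
base * /mnt/home/username/.Miniconda3
pytorch /mnt/home/username/.Miniconda3/envs/pytorch'

if printf '%s\n' "$sample" | env_exists pytorch; then
  echo "pytorch exists"
fi
```

With a live cluster, the sample could be replaced by piping `tcloud env ls` into `env_exists`.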

+ Job

In this section, you can specify your configuration for cluster resources, including the number of nodes, CPUs, GPUs, the output file, and so on. All cluster configuration should be set in the `general` part.

~~~yaml
job:
name: test
general:
- nodes=2
- output=${TACC_SLURM_USERLOG}/output.log
name: test
general:
- nodes=2 # the number of nodes
- ntasks-per-node=1 # the number of tasks per node
- cpus-per-task=10 # the number of CPUs per task
- gres=gpu:2 # the number of GPUs per node
~~~
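The `general` entries use the same names as Slurm `sbatch` long options (`nodes`, `ntasks-per-node`, `cpus-per-task`, `gres`). How tcloud translates them is not documented here. The following shell sketch only illustrates one plausible mapping, assuming each `key=value` entry becomes a `#SBATCH --key=value` directive. `to_sbatch` is a hypothetical helper.

```shell
# Hypothetical helper (not part of tcloud): print one #SBATCH directive
# for each key=value entry given as an argument.
# Assumption: entries map one-to-one onto sbatch long options.
to_sbatch() {
  for entry in "$@"; do
    printf '#SBATCH --%s\n' "$entry"
  done
}

# Entries copied from the job example above.
to_sbatch nodes=2 ntasks-per-node=1 cpus-per-task=10 gres=gpu:2
```

With these values, the job asks for 2 x 2 = 4 GPUs and 2 x 1 x 10 = 20 CPU cores in total, since `gres` is counted per node and `cpus-per-task` per task.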

**Note:** You can modify the output log path in the Job section. For debugging purposes, we recommend setting the `output` value to a path under the `${TACC_USERDIR}` directory and checking it with `tcloud ls` and `tcloud download`.