Skip to content

simple ansible playbook to take clean ubuntu 18.04 to CUDA 10, PyTorch 1.0, fastai, miniconda heaven

Notifications You must be signed in to change notification settings

mbs0221-mlsys/ansible-ubu-to-pytorch

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ansible script to convert clean Ubuntu 18.04 to CUDA 10, PyTorch 1.0, fastai, miniconda3 deep learning machine.

With this simple Ansible script, you’ll be able to convert a clean Ubuntu 18.04 image (as supplied by Google Compute Engine or PaperSpace) into a CUDA 10, PyTorch 1.0, fastai 1.0.x, miniconda3 powerhouse, ready to live the (mixed-precision!) deep learning dream.

After running the script, you’ll be able to ssh or mosh in, type conda activate pt, and be on your way!

[email protected] built this to scratch his own itch (mixed-precision neural networks on NVIDIA’s new TensorCores), but would be happy if you too find it useful. Issues and PRs might get looked at, or not. There’s also an accompanying blog post with screencast.

This script has now updated to using the official conda package of PyTorch 1.0.

Step by step instructions

Install ansible

pip3 install --user --upgrade ansible
# double-check that this gives you the Python 3 version
which ansible-playbook

Configure this script’s vars.yml and inventory.cfg

These instructions are reproduced from the comments at the start of deploy.yml:

  1. Register on the NVIDIA dev site and download the two CUDNN 7.4 or later DEBs.
    • These two debs should go in the downloads/ subdir of this repo.
    • Ensure that the names in vars.yml match your downloads.
    • Everything in downloads/ will be copied over to ~/Downloads. Use this to your advantage!
  2. The destination machine should have Ubuntu 18.04 installed. Importantly, ssh your_user@the_machine should let you in without password, and your_user should be able to sudo without entering a password. Test this!
  3. Edit vars.yml: Change user to your login and sudo user on the destination machine
  4. Edit inventory.cfg: Set the destination machine IP number / hostname under [app]

Run the script

Start the whole business:

ansible-playbook -i inventory.cfg deploy.yml

When I do this with a V100-equipped paperspace machine with 8 cores and 30GB of RAM, it takes about 13 minutes from start to finish.

After this, you will have to reboot the machine.

Use the environment

ssh in to the machine. Activate the environment with conda activate pt. Do what you normally do!

Updates

2018-12-16

Updated to official PyTorch 1.0 conda packages!

2018-12-06

Updated to latest 2018-12-06 build of PyTorch 1.0 preview. See below for update instructions.

2018-11-24

Updated to latest 2018-11-24 build of PyTorch 1.0 preview with the new magma 2.4.0 packages.

To update an existing install with the new PyTorch 1.0 build, you can either just re-run the whole deploy.yml playbook, or you can run just the miniconda3-related tasks like this:

ansible-playbook -i inventory.cfg deploy.yml --tags "miniconda3"

About

simple ansible playbook to take clean ubuntu 18.04 to CUDA 10, PyTorch 1.0, fastai, miniconda heaven

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published