first pass of gpu smoke test #281

Open
wants to merge 6 commits into
base: master
Changes from 1 commit
81 changes: 81 additions & 0 deletions scripts/test-config-cuda.sh
@@ -0,0 +1,81 @@
#!/bin/bash
set -e

# set log location from command invocation
LOG_LOC=$1
TEST_FAIL=false


# driver
PROC_DRIVER_FILE=/proc/driver/nvidia/version
if [ ! -f "$PROC_DRIVER_FILE" ]
then
echo "$PROC_DRIVER_FILE doesn't exist" | tee -a $LOG_LOC
echo "WARNING: CUDA driver may not be correctly installed." | tee -a $LOG_LOC
TEST_FAIL=true
else
# 2 possible command line options
# 1) we could parse /proc/driver/nvidia/version, but output isn't easy to parse:
# NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.74 Mon Sep 13 23:09:15 UTC 2021
# GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
# 2) nvidia-smi --query-gpu=driver_version --format=csv
# output is easy to parse:
# driver_version
# 470.74
# but nvidia-smi may require communication with the card that we won't have.
# testing will be needed
while IFS= read -r line; do
IFS=' ' read -ra tmp_array <<< "$line"
# quote the expansions so empty lines do not break the test
if [ "${tmp_array[0]}" = "NVRM" ] && [ "${tmp_array[1]}" = "version:" ]
then
VERSION_DRIVER=${tmp_array[7]}
fi
done < "$PROC_DRIVER_FILE"
fi

echo $VERSION_DRIVER

# toolkit
if ! TOOLKIT_CHECK_OUTPUT=$(nvcc -V 2>&1);
then
echo "Failed to run 'nvcc -V' with error message: $TOOLKIT_CHECK_OUTPUT" | tee -a $LOG_LOC
echo "WARNING: CUDA toolkit may not be correctly installed." | tee -a $LOG_LOC
TEST_FAIL=true
else
# parse output to get version number
while IFS= read -r line
do
IFS=' ' read -ra tmp_array <<< "$line"
if [ "${tmp_array[3]}" = "release" ]
then
VERSION_TOOLKIT=${tmp_array[5]}
fi
done <<< "$TOOLKIT_CHECK_OUTPUT"
fi

echo $VERSION_TOOLKIT

# tensorflow
if ! VERSION_TF_OUTPUT=`python -c 'import tensorflow as tf; print(tf.__version__)' 2>&1`;
Member

$(...) may be preferable to `...`. https://github.com/koalaman/shellcheck/wiki/SC2006
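
A minimal sketch of why `$(...)` tends to be easier to work with: it nests without escaping, and the assignment still carries the command's exit status, so the `if ! VAR=$(cmd)` pattern used elsewhere in this script keeps working. The `get_version` helper and the version string it prints are hypothetical stand-ins for the `python -c ...` call, and the path is only used as an input to `dirname`/`basename`.

```shell
#!/bin/bash
# Hypothetical stand-in for:
#   python -c 'import tensorflow as tf; print(tf.__version__)'
# The version string is made up for illustration.
get_version() {
    echo "2.7.0"
}

# The exit status of the assignment is the exit status of the command,
# so it can be tested directly in the if condition.
if ! VERSION_TF_OUTPUT=$(get_version 2>&1); then
    echo "Error: could not get version: $VERSION_TF_OUTPUT"
fi

# $(...) nests without escaping; the backtick form needs \` for the inner call.
NESTED_NEW=$(basename "$(dirname /opt/venv/reticulate/bin)")
NESTED_OLD=`basename \`dirname /opt/venv/reticulate/bin\``

echo "$VERSION_TF_OUTPUT $NESTED_NEW $NESTED_OLD"
```

Both nested forms produce the same value here; the difference is only in how much escaping the inner command needs.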

Contributor Author

Thanks @eitsupi! I made the change you suggested in my latest commit. I didn't realize backticks were discouraged. I'll use $(...) going forward.

then
echo "Error: trying to get tensorflow version: $VERSION_TF_OUTPUT"
else
while IFS= read -r line
do
VERSION_TF=$line
done <<< "$VERSION_TF_OUTPUT"
fi

echo $VERSION_TF

if [ "$TEST_FAIL" = true ]
then
echo "WARNING: at least one of the GPU functionality tests has failed." | tee -a $LOG_LOC
echo "Please run rocker-versioned2/tests/gpu/test-gpu.sh script for more detailed information." | tee -a $LOG_LOC
fi
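
As an alternative to indexing into a split array, the driver version could be pulled out with a pattern match. This is only a sketch: it assumes the `NVRM version:` line has the format quoted in the script comments above, and `parse_driver_version` is a hypothetical helper name, not part of this PR. Other driver builds may word that line differently, in which case neither this pattern nor a fixed array index would match.

```shell
#!/bin/bash
# Sample line copied from the comment in scripts/test-config-cuda.sh.
SAMPLE='NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.74 Mon Sep 13 23:09:15 UTC 2021'

# Hypothetical helper: reads text on stdin and prints the dotted number that
# follows "Kernel Module" on the NVRM line (empty output if nothing matches).
parse_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

VERSION_DRIVER=$(printf '%s\n' "$SAMPLE" | parse_driver_version)
echo "$VERSION_DRIVER"

# A line without the expected prefix yields an empty result rather than an error.
VERSION_NONE=$(printf '%s\n' 'GCC version: gcc version 9.3.0' | parse_driver_version)
```

In the real script the input would come from `/proc/driver/nvidia/version` instead of `SAMPLE`.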
36 changes: 36 additions & 0 deletions tests/gpu/misc/examples_tf.R
@@ -0,0 +1,36 @@

## Tensorflow:
install.packages('keras', repos='http://cran.us.r-project.org')
Member

Is there any reason to use a US CRAN mirror?

CRAN=${CRAN:-https://cran.r-project.org}
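
The `CRAN=${CRAN:-...}` line quoted above is shell parameter expansion with a default. A short sketch of how it behaves, where the override URL is a hypothetical value used only for illustration:

```shell
#!/bin/bash
# ${VAR:-default} expands to the default when VAR is unset or empty.
unset CRAN
CRAN_DEFAULTED=${CRAN:-https://cran.r-project.org}

# Hypothetical mirror URL, for illustration only.
CRAN="https://example.org/cran"
CRAN_OVERRIDDEN=${CRAN:-https://cran.r-project.org}

echo "$CRAN_DEFAULTED"
echo "$CRAN_OVERRIDDEN"
```

With this pattern a caller can choose a mirror by exporting `CRAN` before running the script, and the default applies otherwise.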

Member

@RobJY is this to trigger source installation?

Contributor Author

Do we need this line at all, since tensorflow is installed in the containers by scripts/install_tensorflow.sh?
When I run examples_tf.R without it, though, I get the following error:

Error in library(keras) : there is no package called ‘keras’
Execution halted

Running library(tensorflow) in R gives a similar error message.

Running scripts/test-config-cuda.sh reports the correct tensorflow version, but it's checking the version from Python with python -c 'import tensorflow as tf; print(tf.__version__)'.

Do I need to add the path where tensorflow is installed by scripts/install_tensorflow.sh somewhere so R sees it?

Member

What I wanted to point out is that the repos argument may simply be unnecessary.

Contributor Author (RobJY, Jan 7, 2022)

Yes @eitsupi, you're right. It worked fine when I removed repos. Thanks! I've committed that change.

It still seems like it shouldn't need to install that though. I tried adding the path /opt/venv/reticulate with the following code, but I got the same error:

old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(old_path, "/opt/venv/reticulate", sep = ":"))

Is it fine to install keras here or does the fact that I need to install it indicate that there's an issue with the tensorflow install?

library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')

model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- model %>% fit(
x_train, y_train,
epochs = 30, batch_size = 128,
validation_split = 0.2
)
model %>% evaluate(x_test, y_test)
1 change: 1 addition & 0 deletions tests/gpu/misc/nvblas.R
11 changes: 11 additions & 0 deletions tests/ml/nvblas.R
@@ -0,0 +1,11 @@
install.packages("callr")


callr::r(function(){
system.time({
N <- 2^14
M <- matrix(rnorm(N*N), nrow=N, ncol=N)
M %*% M
})
}, env = c(LD_PRELOAD="libnvblas.so")
)