Download the sample code from GitHub:
https://github.com/NVIDIA/DeepLearningExamples
The directory is /media/usb0/DeepLearningExamples.
Also download the ImageNet-1K dataset to /media/usb0/imagenet/imagenet1k.
Build the container according to
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/RN50v1.5
soh@ubuntu:/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5$ pwd
/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5
soh@ubuntu:/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5$ ./scripts/docker/interactive.sh
// which contains: docker run -it --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v $PWD:/workspace/rn50v15_tf/ -v /media/usb0/imagenet/imagenet1k:/data rn50v15_tf bash
root@c400807f62ca:/workspace/rn50v15_tf# ./scripts/RN50_FP16_1GPU.sh . /data .
// The first argument is the path to main.py, the second is the path to the data, and the third is the path for results.
// But an error appears:
... RuntimeError: CUDA runtime API error cudaErrorInsufficientDriver (35): CUDA driver version is insufficient for CUDA runtime version
root@c400807f62ca:/workspace/rn50v15_tf# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun 4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
root@c400807f62ca:/workspace/rn50v15_tf# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
// Find the driver list that fits the RTX 2070 Super at:
https://www.nvidia.com/Download/Find.aspx
440.26, 430.50, 435.21, 435.17, 430.40, 430.34
// 430.26 is not in the list
// Install 430.50 with the .run file from the NVIDIA site.
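The version check above boils down to a numeric comparison of driver version strings; a minimal sketch, where using 430.50 as the minimum is an assumption for illustration (the real requirement comes from the CUDA runtime inside the container):

```python
# Hedged sketch: compare the installed NVIDIA driver version against an
# assumed required minimum. Version strings like "430.26" are split on "."
# and compared numerically, component by component.

def parse_version(v: str) -> tuple:
    """Turn '430.26' into (430, 26) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def driver_sufficient(installed: str, minimum: str) -> bool:
    """True when the installed driver meets the assumed minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(driver_sufficient("430.26", "430.50"))  # → False (the failing setup)
print(driver_sufficient("430.50", "430.50"))  # → True (after reinstalling)
```

Note that a plain string comparison would get this wrong for versions like "430.9" vs "430.26", which is why the components are compared as integers.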
// But neither the pip installation nor the Docker image shows any reduction in GPU memory usage or training time.
// Install TensorFlow 1.15 from source.
// Then, cnn42.py with "opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)"
// shows GPU memory usage less than half of that without it.
// This is the first case where TF-AMP shows some effect.
// GPU utilization is about 30%, less than the 42% without AMP.
// CPU utilization is about 95%, less than the 140% without AMP.
// But the training time is not reduced; it is almost the same.
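As a quick sanity check on those figures, the relative changes from the no-AMP run to the AMP run work out as follows (using only the numbers quoted above):

```python
# Arithmetic check of the utilization figures reported above.
def relative_change_pct(with_amp: float, without_amp: float) -> float:
    """Percentage change going from the no-AMP run to the AMP run."""
    return (with_amp - without_amp) / without_amp * 100.0

print(round(relative_change_pct(30, 42), 1))   # GPU utilization → -28.6
print(round(relative_change_pct(95, 140), 1))  # CPU utilization → -32.1
```

So both GPU and CPU utilization drop by roughly 30% under AMP, even though wall-clock training time stays the same, which suggests the bottleneck lies elsewhere (e.g. the input pipeline).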
// Install TensorFlow 2.0 from source.