Download the sample code from GitHub:
https://github.com/NVIDIA/DeepLearningExamples
The directory is /media/usb0/DeepLearningExamples.
Also download the ImageNet-1K dataset to /media/usb0/imagenet/imagenet1k.
Build the container according to
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/RN50v1.5
soh@ubuntu:/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5$ pwd
/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5
soh@ubuntu:/media/usb0/DeepLearningExamples/TensorFlow/Classification/RN50v1.5$ ./scripts/docker/interactive.sh
// which contains: docker run -it --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v $PWD:/workspace/rn50v15_tf/ -v /media/usb0/imagenet/imagenet1k:/data rn50v15_tf bash
root@c400807f62ca:/workspace/rn50v15_tf# ./scripts/RN50_FP16_1GPU.sh . /data .
// The first argument is the path to main.py, the second is the path to the data, and the third is the path for results.
// But an error appears:
... RuntimeError: CUDA runtime API error cudaErrorInsufficientDriver (35): CUDA driver version is insufficient for CUDA runtime version
root@c400807f62ca:/workspace/rn50v15_tf# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun 4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
root@c400807f62ca:/workspace/rn50v15_tf# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
// Find the driver list that fits the RTX 2070 Super at:
https://www.nvidia.com/Download/Find.aspx
440.26, 430.50, 435.21, 435.17, 430.40, 430.34
// 430.26 is not in the list
// Install 430.50 with the .run file from the NVIDIA site.
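The version check above boils down to a numeric comparison of driver version strings; a minimal sketch, where using 430.50 as the minimum is an assumption for illustration (the real requirement comes from the CUDA runtime inside the container):

```python
# Hedged sketch: compare the installed NVIDIA driver version against an
# assumed required minimum. Version strings like "430.26" are split on "."
# and compared numerically, component by component.

def parse_version(v: str) -> tuple:
    """Turn '430.26' into (430, 26) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def driver_sufficient(installed: str, minimum: str) -> bool:
    """True when the installed driver meets the assumed minimum."""
    return parse_version(installed) >= parse_version(minimum)

print(driver_sufficient("430.26", "430.50"))  # → False (the failing setup)
print(driver_sufficient("430.50", "430.50"))  # → True (after reinstalling)
```

Note that a plain string comparison would get this wrong for versions like "430.9" vs "430.26", which is why the components are compared as integers.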
// But neither the pip installation nor the Docker image shows any reduction in GPU memory usage or training time.
// Install TensorFlow 1.15 from source.
// Then, cnn42.py with "opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)"
// shows GPU memory usage less than half of that without it.
// This is the first case where TF-AMP shows some effect.
// GPU utilization is about 30%, less than the 42% without AMP.
// CPU utilization is about 95%, less than the 140% without AMP.
// But the training time is not reduced; it is almost the same.
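As a quick sanity check on those figures, the relative changes from the no-AMP run to the AMP run work out as follows (using only the numbers quoted above):

```python
# Arithmetic check of the utilization figures reported above.
def relative_change_pct(with_amp: float, without_amp: float) -> float:
    """Percentage change going from the no-AMP run to the AMP run."""
    return (with_amp - without_amp) / without_amp * 100.0

print(round(relative_change_pct(30, 42), 1))   # GPU utilization → -28.6
print(round(relative_change_pct(95, 140), 1))  # CPU utilization → -32.1
```

So both GPU and CPU utilization drop by roughly 30% under AMP, even though wall-clock training time stays the same, which suggests the bottleneck lies elsewhere (e.g. the input pipeline).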
// Install TensorFlow 2.0 from source.