I am using an Amazon EC2 instance with 16 GPUs for computation. After I had set up everything I needed and tested it in Python, something strange happened.
Here is what I tried:
import tensorflow as tf
import time
a=time.time()
hello=tf.constant('hello')
sess=tf.Session()
After running the above, I got a very long message:
2018-01-31 07:10:27.922290: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922347: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922360: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922371: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:10:27.922381: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-31 07:11:05.263488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.265392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:0f.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-31 07:11:05.487461: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x56312fdf3970 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-31 07:11:05.488072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.489826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:10.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2018-01-31 07:11:05.707955: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x56312fdf7e80 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-01-31 07:11:05.708452: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-31 07:11:05.709916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:11.0
Total memory: 11.17GiB
Free memory: 11.10GiB
And so on, over and over...
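The timestamps in the log excerpt above give a rough idea of where the time goes. The sketch below only does the arithmetic on values copied from that excerpt; the helper name `gap_seconds` is made up for illustration and is not part of TensorFlow.

```python
from datetime import datetime

# Timestamp layout used at the start of each TensorFlow log line above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Timestamps copied verbatim from the log excerpt.
last_cpu_warning = "2018-01-31 07:10:27.922381"  # last cpu_feature_guard warning
found_device_0 = "2018-01-31 07:11:05.265392"    # "Found device 0"
found_device_1 = "2018-01-31 07:11:05.489826"    # "Found device 1"
found_device_2 = "2018-01-31 07:11:05.709916"    # "Found device 2"

# Time between the last CPU warning and the first GPU being reported.
startup_gap = gap_seconds(last_cpu_warning, found_device_0)

# Time between consecutive "Found device" lines.
per_device_gaps = [
    gap_seconds(found_device_0, found_device_1),
    gap_seconds(found_device_1, found_device_2),
]

print(f"before first GPU: {startup_gap:.3f} s")
print("between GPUs:", [f"{g:.3f} s" for g in per_device_gaps])
```

In this excerpt, roughly 37 seconds pass before device 0 is reported, and each of the next two devices follows about 0.22 seconds after the previous one. The excerpt covers only the first three devices, so it says nothing about where the later hang occurs.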
It looks like TensorFlow is scanning the GPU devices, but it is very slow. I waited 5 minutes to see the output above, and then it hung until I was automatically disconnected from the Amazon instance. Earlier, when I did the same thing on my lab server with 4 Tesla K40s, everything went fine.
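One way to check whether the delay grows with the number of GPUs is to limit the devices CUDA exposes with the `CUDA_VISIBLE_DEVICES` environment variable and time `tf.Session()` for each setting. This is only a diagnostic sketch, assuming the same TensorFlow 1.x API as in the snippet above; the function names `visible_devices_value` and `time_session_startup` are hypothetical helpers, and the variable has to be set before TensorFlow initializes CUDA, so each measurement should run in a fresh Python process.

```python
import os
import time


def visible_devices_value(num_gpus: int) -> str:
    """Build a CUDA_VISIBLE_DEVICES value exposing GPUs 0..num_gpus-1."""
    return ",".join(str(i) for i in range(num_gpus))


def time_session_startup(num_gpus: int) -> float:
    """Time tf.Session() creation with only `num_gpus` GPUs visible.

    Call this once per fresh Python process: the environment variable
    is only read when CUDA is first initialized.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_value(num_gpus)

    # Imported here so the environment variable is already set.
    import tensorflow as tf

    start = time.time()
    sess = tf.Session()  # device discovery happens here (TF 1.x API)
    elapsed = time.time() - start
    sess.close()
    return elapsed
```

For example, running `time_session_startup(4)` and then, in a separate process, `time_session_startup(16)` would show whether the 16-GPU instance behaves like the 4-GPU lab server when only four devices are visible.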
Does anyone know why this happens?