Problem with GPU training speed in Colab Pro

I am trying to train a deep neural network model in Google Colab Pro, but training is very slow even though the GPU is active. While training, the following messages are displayed.
Please help; this is my first time working in Colab.
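
For reference, the quick check I ran to confirm the GPU is visible to TensorFlow looks roughly like this (a minimal sketch assuming the default TF 2.x runtime in Colab):

    import tensorflow as tf

    # Confirm the Colab runtime actually exposes a GPU to TensorFlow
    print(tf.config.list_physical_devices('GPU'))   # expect one PhysicalDevice entry
    print(tf.test.gpu_device_name())                # expect something like '/device:GPU:0'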

2021-09-22 18:55:52.006 | INFO     | __main__:train_model:29 - Using single gpu trainner ...
WARNING:tensorflow:From /content/drive/My Drive/Maybeshewill_LaneNet-TF2/lanenet_data_feed_pipline.py:257: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
W0922 18:55:52.009715 1379 deprecation.py:345] From /content/drive/My Drive/Maybeshewill_LaneNet-TF2/lanenet_data_feed_pipline.py:257: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
2021-09-22 18:55:55.566020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:55.573078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:55.573628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.182075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.182651: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.183203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.183665: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-09-22 18:55:56.183712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13818 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
2021-09-22 18:55:56.504566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.505190: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.505671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
W0922 18:55:56.943125 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/keras/legacy_tf_layers/normalization.py:424: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
  '`tf.layers.batch_normalization` is deprecated and '

W0922 18:55:56.946539 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer_v1.py:1676: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  warnings.warn('`layer.apply` is deprecated and '

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/keras/layers/normalization/batch_normalization.py:520: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0922 18:55:56.962891 1379 deprecation.py:345] From /usr/local/lib/python3.7/dist-packages/keras/layers/normalization/batch_normalization.py:520: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:457: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0922 18:56:01.731532 1379 deprecation.py:345] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:457: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0922 18:56:04.113139 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_grad/Reshape_1:0", shape=(131072,), dtype=int32), values=Tensor("gradients/GatherV2_grad/Reshape:0", shape=(131072, 4), dtype=float32), dense_shape=Tensor("gradients/GatherV2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)

2021-09-22 18:56:07.281 | INFO     | tusimple_lanenet_single_gpu_trainner:__init__:252 - Initialize tusimple lanenet trainner complete
2021-09-22 18:56:07.531483: W tensorflow/core/common_runtime/colocation_graph.cc:1145] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IteratorGetNext: CPU GPU 
IteratorToStringHandle: CPU GPU 
TensorSliceDataset: CPU 
FlatMapDataset: CPU 
ShuffleDataset: CPU 
PrefetchDataset: CPU GPU 
ParallelMapDatasetV2: CPU 
RepeatDataset: CPU 
OneShotIterator: CPU 
BatchDatasetV2: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  graph_input_node/input_tensor/TensorSliceDataset (TensorSliceDataset) /device:GPU:0
  graph_input_node/input_tensor/FlatMapDataset (FlatMapDataset) /device:GPU:0
  graph_input_node/input_tensor/ParallelMapDatasetV2 (ParallelMapDatasetV2) /device:GPU:0
  graph_input_node/input_tensor/ParallelMapDatasetV2_1 (ParallelMapDatasetV2) /device:GPU:0
  graph_input_node/input_tensor/ParallelMapDatasetV2_2 (ParallelMapDatasetV2) /device:GPU:0
  graph_input_node/input_tensor/ShuffleDataset (ShuffleDataset) /device:GPU:0
  graph_input_node/input_tensor/RepeatDataset (RepeatDataset) /device:GPU:0
  graph_input_node/input_tensor/BatchDatasetV2 (BatchDatasetV2) /device:GPU:0
  graph_input_node/input_tensor/PrefetchDataset (PrefetchDataset) /device:GPU:0
  graph_input_node/input_tensor/OneShotIterator (OneShotIterator) /device:GPU:0
  graph_input_node/input_tensor/IteratorToStringHandle (IteratorToStringHandle) /device:GPU:0
  graph_input_node/IteratorGetNext (IteratorGetNext) /device:GPU:0

2021-09-22 18:56:09.192 | INFO     | tusimple_lanenet_single_gpu_trainner:train:290 - => Starts to train LaneNet from scratch ...
  0% 0/385 [00:00<?, ?it/s]2021-09-22 18:56:11.592184: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'Const' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/Const}}
	 (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_STRING, value=Tensor<type: string shape: [] values: ./data/training/tfrecords/tusimple_train.tfrecords>, _device="/device:GPU:0"
	.  Registered:  device='XLA_CPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
  device='XLA_GPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='XLA_CPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_VARIANT]
  device='DEFAULT'; dtype in [DT_BOOL]
  device='DEFAULT'; dtype in [DT_QUINT16]
  device='DEFAULT'; dtype in [DT_QINT16]
  device='DEFAULT'; dtype in [DT_QINT32]
  device='DEFAULT'; dtype in [DT_QUINT8]
  device='DEFAULT'; dtype in [DT_QINT8]
  device='DEFAULT'; dtype in [DT_COMPLEX128]
  device='DEFAULT'; dtype in [DT_COMPLEX64]
  device='DEFAULT'; dtype in [DT_INT8]
  device='DEFAULT'; dtype in [DT_UINT8]
  device='DEFAULT'; dtype in [DT_INT16]
  device='DEFAULT'; dtype in [DT_UINT16]
  device='DEFAULT'; dtype in [DT_UINT32]
  device='DEFAULT'; dtype in [DT_INT64]
  device='DEFAULT'; dtype in [DT_UINT64]
  device='DEFAULT'; dtype in [DT_DOUBLE]
  device='DEFAULT'; dtype in [DT_FLOAT]
  device='DEFAULT'; dtype in [DT_BFLOAT16]
  device='DEFAULT'; dtype in [DT_HALF]
  device='DEFAULT'; dtype in [DT_INT32]
  device='CPU'
  device='TPU_SYSTEM'
  device='GPU'; dtype in [DT_VARIANT]
  device='GPU'; dtype in [DT_BOOL]
  device='GPU'; dtype in [DT_COMPLEX128]
  device='GPU'; dtype in [DT_COMPLEX64]
  device='GPU'; dtype in [DT_UINT64]
  device='GPU'; dtype in [DT_INT64]
  device='GPU'; dtype in [DT_QINT32]
  device='GPU'; dtype in [DT_UINT32]
  device='GPU'; dtype in [DT_QUINT16]
  device='GPU'; dtype in [DT_QINT16]
  device='GPU'; dtype in [DT_INT16]
  device='GPU'; dtype in [DT_UINT16]
  device='GPU'; dtype in [DT_QINT8]
  device='GPU'; dtype in [DT_INT8]
  device='GPU'; dtype in [DT_UINT8]
  device='GPU'; dtype in [DT_DOUBLE]
  device='GPU'; dtype in [DT_FLOAT]
  device='GPU'; dtype in [DT_BFLOAT16]
  device='GPU'; dtype in [DT_HALF]

2021-09-22 18:56:11.593116: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'Reshape' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/flat_filenames}}
	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tshape=DT_INT32, _device="/device:GPU:0"
	.  Registered:  device='XLA_CPU_JIT'; Tshape in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='XLA_GPU_JIT'; Tshape in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='DEFAULT'; T in [DT_INT32]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_INT32]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_BOOL]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_BOOL]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX128]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_COMPLEX64]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_INT8]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_INT8]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT8]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT8]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_INT16]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_INT16]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT16]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT16]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT32]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT32]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_INT64]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_INT64]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_UINT64]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_UINT64]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_DOUBLE]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_DOUBLE]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_FLOAT]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_FLOAT]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_BFLOAT16]; Tshape in [DT_INT32]
  device='DEFAULT'; T in [DT_HALF]; Tshape in [DT_INT64]
  device='DEFAULT'; T in [DT_HALF]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_INT32]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_INT32]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_BOOL]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_BOOL]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_COMPLEX128]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_COMPLEX128]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_COMPLEX64]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_COMPLEX64]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_INT8]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_INT8]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_UINT8]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_UINT8]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_INT16]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_INT16]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_UINT16]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_UINT16]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_UINT32]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_UINT32]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_INT64]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_INT64]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_UINT64]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_UINT64]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_DOUBLE]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_DOUBLE]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_FLOAT]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_FLOAT]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_BFLOAT16]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_BFLOAT16]; Tshape in [DT_INT32]
  device='GPU'; T in [DT_HALF]; Tshape in [DT_INT64]
  device='GPU'; T in [DT_HALF]; Tshape in [DT_INT32]
  device='CPU'; Tshape in [DT_INT64]
  device='CPU'; Tshape in [DT_INT32]

2021-09-22 18:56:11.593488: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'TensorSliceDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/TensorSliceDataset}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.593800: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'FlatMapDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/FlatMapDataset}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.594089: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.594423: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2_1}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.594709: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2_2}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.595020: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ShuffleDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ShuffleDataset}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.595300: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'RepeatDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/RepeatDataset}}
	.  Registered:  device='CPU'

2021-09-22 18:56:11.595574: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'BatchDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/BatchDatasetV2}}
	.  Registered:  device='CPU'

2021-09-22 18:56:12.190431: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-09-22 18:56:16.157707: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
train loss: 3.68222, b_loss: 0.53008, i_loss: 1.34292, miou: 0.47267: 100% 385/385 [01:39<00:00,  3.88it/s]
2021-09-22 18:57:48.407 | INFO     | tusimple_lanenet_single_gpu_trainner:train:369 - => Epoch: 1 Time: 2021-09-22 18:57:48 Train loss: 5.92847 Train miou: 0.41158 ...
train loss: 3.82123, b_loss: 0.43539, i_loss: 1.58457, miou: 0.49910:  76% 291/385 [01:06<00:21,  4.35it/s]

The code repo I am using was originally built with TensorFlow 1.15.0, while Colab Pro now ships with TensorFlow 2.x by default. The moment I installed TensorFlow 1.x in Colab, GPU speed dropped (validated with a standard benchmark script), so I decided to port the code to TF 2.x via tf.compat.v1. Now, with no TF 1.x installed, the benchmark script shows good GPU speed compared to the CPU, but training my model is still very slow per epoch. The warnings shown above appear because the code was migrated from TF1 to TF2 using tf.compat.v1. The author of the repo asks that it be run only under TF1, but since TF1 does not give me GPU speed in Colab I changed it. I don't understand what the actual problem is. How can I use TF1 with a GPU in Colab?
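
For context, the tf.compat.v1 change I made is roughly the following (a minimal sketch of the pattern applied at the top of the repo's modules, not the repo's exact code):

    import tensorflow.compat.v1 as tf

    # Keep the original TF1-style graph/session code working under the TF 2.x runtime
    tf.disable_v2_behavior()

    # ...the rest of the repo's TF1 code (tf.Session, placeholders, one-shot
    # iterators, etc.) is left unchanged below this point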
