I am trying to train a deep neural network in Google Colab Pro, but training is very slow even though the GPU runtime is enabled (an epoch of 385 steps takes about 1 min 40 s, roughly 4 it/s). I am working in Colab for the first time, so any help is appreciated. While training, the following messages are displayed:

```
2021-09-22 18:55:52.006 | INFO | __main__:train_model:29 - Using single gpu trainner ...
WARNING:tensorflow:From /content/drive/My Drive/Maybeshewill_LaneNet-TF2/lanenet_data_feed_pipline.py:257: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0922 18:55:52.009715 1379 deprecation.py:345] From /content/drive/My Drive/Maybeshewill_LaneNet-TF2/lanenet_data_feed_pipline.py:257: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2021-09-22 18:55:55.566020: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:55.573078: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:55.573628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.182075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.182651: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.183203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.183665: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-09-22 18:55:56.183712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13818 MB memory: -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
2021-09-22 18:55:56.504566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.505190: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-22 18:55:56.505671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
W0922 18:55:56.943125 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/keras/legacy_tf_layers/normalization.py:424: UserWarning: `tf.layers.batch_normalization` is deprecated and will be removed in a future version. Please use `tf.keras.layers.BatchNormalization` instead. In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.BatchNormalization` documentation).
'`tf.layers.batch_normalization` is deprecated and '
W0922 18:55:56.946539 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer_v1.py:1676: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
warnings.warn('`layer.apply` is deprecated and '
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/keras/layers/normalization/batch_normalization.py:520: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0922 18:55:56.962891 1379 deprecation.py:345] From /usr/local/lib/python3.7/dist-packages/keras/layers/normalization/batch_normalization.py:520: _colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:457: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0922 18:56:01.731532 1379 deprecation.py:345] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/moving_averages.py:457: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0922 18:56:04.113139 1379 warnings.py:110] /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_grad/Reshape_1:0", shape=(131072,), dtype=int32), values=Tensor("gradients/GatherV2_grad/Reshape:0", shape=(131072, 4), dtype=float32), dense_shape=Tensor("gradients/GatherV2_grad/Cast:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
2021-09-22 18:56:07.281 | INFO | tusimple_lanenet_single_gpu_trainner:__init__:252 - Initialize tusimple lanenet trainner complete
2021-09-22 18:56:07.531483: W tensorflow/core/common_runtime/colocation_graph.cc:1145] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IteratorGetNext: CPU GPU
IteratorToStringHandle: CPU GPU
TensorSliceDataset: CPU
FlatMapDataset: CPU
ShuffleDataset: CPU
PrefetchDataset: CPU GPU
ParallelMapDatasetV2: CPU
RepeatDataset: CPU
OneShotIterator: CPU
BatchDatasetV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
graph_input_node/input_tensor/TensorSliceDataset (TensorSliceDataset) /device:GPU:0
graph_input_node/input_tensor/FlatMapDataset (FlatMapDataset) /device:GPU:0
graph_input_node/input_tensor/ParallelMapDatasetV2 (ParallelMapDatasetV2) /device:GPU:0
graph_input_node/input_tensor/ParallelMapDatasetV2_1 (ParallelMapDatasetV2) /device:GPU:0
graph_input_node/input_tensor/ParallelMapDatasetV2_2 (ParallelMapDatasetV2) /device:GPU:0
graph_input_node/input_tensor/ShuffleDataset (ShuffleDataset) /device:GPU:0
graph_input_node/input_tensor/RepeatDataset (RepeatDataset) /device:GPU:0
graph_input_node/input_tensor/BatchDatasetV2 (BatchDatasetV2) /device:GPU:0
graph_input_node/input_tensor/PrefetchDataset (PrefetchDataset) /device:GPU:0
graph_input_node/input_tensor/OneShotIterator (OneShotIterator) /device:GPU:0
graph_input_node/input_tensor/IteratorToStringHandle (IteratorToStringHandle) /device:GPU:0
graph_input_node/IteratorGetNext (IteratorGetNext) /device:GPU:0
2021-09-22 18:56:09.192 | INFO | tusimple_lanenet_single_gpu_trainner:train:290 - => Starts to train LaneNet from scratch ...
0% 0/385 [00:00<?, ?it/s]2021-09-22 18:56:11.592184: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'Const' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/Const}}
(OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_STRING, value=Tensor<type: string shape: [] values: ./data/training/tfrecords/tusimple_train.tfrecords>, _device="/device:GPU:0"
. Registered: device='XLA_CPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
device='XLA_GPU_JIT'; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_STRING]
device='XLA_GPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
device='XLA_CPU'; dtype in [DT_UINT8, DT_QUINT8, DT_UINT16, DT_INT8, DT_QINT8, DT_INT16, DT_INT32, DT_QINT32, DT_INT64, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128, DT_BOOL, DT_BFLOAT16]
device='DEFAULT'; dtype in [DT_VARIANT]
device='DEFAULT'; dtype in [DT_BOOL]
device='DEFAULT'; dtype in [DT_QUINT16]
device='DEFAULT'; dtype in [DT_QINT16]
device='DEFAULT'; dtype in [DT_QINT32]
device='DEFAULT'; dtype in [DT_QUINT8]
device='DEFAULT'; dtype in [DT_QINT8]
device='DEFAULT'; dtype in [DT_COMPLEX128]
device='DEFAULT'; dtype in [DT_COMPLEX64]
device='DEFAULT'; dtype in [DT_INT8]
device='DEFAULT'; dtype in [DT_UINT8]
device='DEFAULT'; dtype in [DT_INT16]
device='DEFAULT'; dtype in [DT_UINT16]
device='DEFAULT'; dtype in [DT_UINT32]
device='DEFAULT'; dtype in [DT_INT64]
device='DEFAULT'; dtype in [DT_UINT64]
device='DEFAULT'; dtype in [DT_DOUBLE]
device='DEFAULT'; dtype in [DT_FLOAT]
device='DEFAULT'; dtype in [DT_BFLOAT16]
device='DEFAULT'; dtype in [DT_HALF]
device='DEFAULT'; dtype in [DT_INT32]
device='CPU'
device='TPU_SYSTEM'
device='GPU'; dtype in [DT_VARIANT]
device='GPU'; dtype in [DT_BOOL]
device='GPU'; dtype in [DT_COMPLEX128]
device='GPU'; dtype in [DT_COMPLEX64]
device='GPU'; dtype in [DT_UINT64]
device='GPU'; dtype in [DT_INT64]
device='GPU'; dtype in [DT_QINT32]
device='GPU'; dtype in [DT_UINT32]
device='GPU'; dtype in [DT_QUINT16]
device='GPU'; dtype in [DT_QINT16]
device='GPU'; dtype in [DT_INT16]
device='GPU'; dtype in [DT_UINT16]
device='GPU'; dtype in [DT_QINT8]
device='GPU'; dtype in [DT_INT8]
device='GPU'; dtype in [DT_UINT8]
device='GPU'; dtype in [DT_DOUBLE]
device='GPU'; dtype in [DT_FLOAT]
device='GPU'; dtype in [DT_BFLOAT16]
device='GPU'; dtype in [DT_HALF]
2021-09-22 18:56:11.593116: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'Reshape' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/flat_filenames}}
(OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, Tshape=DT_INT32, _device="/device:GPU:0"
. Registered: device='XLA_CPU_JIT'; Tshape in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
device='XLA_GPU_JIT'; Tshape in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
device='DEFAULT'; T in [DT_INT32]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_INT32]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_BOOL]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_BOOL]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX128]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX128]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_COMPLEX64]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_COMPLEX64]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_INT8]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_INT8]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_UINT8]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_UINT8]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_INT16]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_INT16]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_UINT16]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_UINT16]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_UINT32]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_UINT32]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_INT64]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_INT64]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_UINT64]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_UINT64]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_DOUBLE]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_DOUBLE]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_FLOAT]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_FLOAT]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_BFLOAT16]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_BFLOAT16]; Tshape in [DT_INT32]
device='DEFAULT'; T in [DT_HALF]; Tshape in [DT_INT64]
device='DEFAULT'; T in [DT_HALF]; Tshape in [DT_INT32]
device='GPU'; T in [DT_INT32]; Tshape in [DT_INT64]
device='GPU'; T in [DT_INT32]; Tshape in [DT_INT32]
device='GPU'; T in [DT_BOOL]; Tshape in [DT_INT64]
device='GPU'; T in [DT_BOOL]; Tshape in [DT_INT32]
device='GPU'; T in [DT_COMPLEX128]; Tshape in [DT_INT64]
device='GPU'; T in [DT_COMPLEX128]; Tshape in [DT_INT32]
device='GPU'; T in [DT_COMPLEX64]; Tshape in [DT_INT64]
device='GPU'; T in [DT_COMPLEX64]; Tshape in [DT_INT32]
device='GPU'; T in [DT_INT8]; Tshape in [DT_INT64]
device='GPU'; T in [DT_INT8]; Tshape in [DT_INT32]
device='GPU'; T in [DT_UINT8]; Tshape in [DT_INT64]
device='GPU'; T in [DT_UINT8]; Tshape in [DT_INT32]
device='GPU'; T in [DT_INT16]; Tshape in [DT_INT64]
device='GPU'; T in [DT_INT16]; Tshape in [DT_INT32]
device='GPU'; T in [DT_UINT16]; Tshape in [DT_INT64]
device='GPU'; T in [DT_UINT16]; Tshape in [DT_INT32]
device='GPU'; T in [DT_UINT32]; Tshape in [DT_INT64]
device='GPU'; T in [DT_UINT32]; Tshape in [DT_INT32]
device='GPU'; T in [DT_INT64]; Tshape in [DT_INT64]
device='GPU'; T in [DT_INT64]; Tshape in [DT_INT32]
device='GPU'; T in [DT_UINT64]; Tshape in [DT_INT64]
device='GPU'; T in [DT_UINT64]; Tshape in [DT_INT32]
device='GPU'; T in [DT_DOUBLE]; Tshape in [DT_INT64]
device='GPU'; T in [DT_DOUBLE]; Tshape in [DT_INT32]
device='GPU'; T in [DT_FLOAT]; Tshape in [DT_INT64]
device='GPU'; T in [DT_FLOAT]; Tshape in [DT_INT32]
device='GPU'; T in [DT_BFLOAT16]; Tshape in [DT_INT64]
device='GPU'; T in [DT_BFLOAT16]; Tshape in [DT_INT32]
device='GPU'; T in [DT_HALF]; Tshape in [DT_INT64]
device='GPU'; T in [DT_HALF]; Tshape in [DT_INT32]
device='CPU'; Tshape in [DT_INT64]
device='CPU'; Tshape in [DT_INT32]
2021-09-22 18:56:11.593488: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'TensorSliceDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/TensorSliceDataset}}
. Registered: device='CPU'
2021-09-22 18:56:11.593800: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'FlatMapDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/FlatMapDataset}}
. Registered: device='CPU'
2021-09-22 18:56:11.594089: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2}}
. Registered: device='CPU'
2021-09-22 18:56:11.594423: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2_1}}
. Registered: device='CPU'
2021-09-22 18:56:11.594709: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ParallelMapDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ParallelMapDatasetV2_2}}
. Registered: device='CPU'
2021-09-22 18:56:11.595020: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'ShuffleDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/ShuffleDataset}}
. Registered: device='CPU'
2021-09-22 18:56:11.595300: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'RepeatDataset' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/RepeatDataset}}
. Registered: device='CPU'
2021-09-22 18:56:11.595574: W tensorflow/core/grappler/utils/graph_view.cc:836] No registered 'BatchDatasetV2' OpKernel for GPU devices compatible with node {{node graph_input_node/input_tensor/BatchDatasetV2}}
. Registered: device='CPU'
2021-09-22 18:56:12.190431: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-09-22 18:56:16.157707: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
train loss: 3.68222, b_loss: 0.53008, i_loss: 1.34292, miou: 0.47267: 100% 385/385 [01:39<00:00, 3.88it/s]
2021-09-22 18:57:48.407 | INFO | tusimple_lanenet_single_gpu_trainner:train:369 - => Epoch: 1 Time: 2021-09-22 18:57:48 Train loss: 5.92847 Train miou: 0.41158 ...
train loss: 3.82123, b_loss: 0.43539, i_loss: 1.58457, miou: 0.49910: 76% 291/385 [01:06<00:21, 4.35it/s]
```
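The first warning in the log recommends replacing the deprecated `tf_record_iterator` with `tf.data.TFRecordDataset(path)`. The placement warnings also show that the dataset ops (`TensorSliceDataset`, `ShuffleDataset`, `BatchDatasetV2` and the others) have only CPU kernels, even though they were requested on `/device:GPU:0`. Below is a minimal sketch of such an input pipeline. It assumes TF 2.x in eager mode. LaneNet itself builds a TF1-style graph, so this is not a drop-in patch. The sketch builds the pipeline on the CPU and prefetches batches so input preparation can overlap with GPU compute. The demo file and the `value` feature key are hypothetical stand-ins for LaneNet's real TFRecord schema.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical feature key; LaneNet's real TFRecord schema differs.
FEATURE_KEY = "value"


def write_demo_tfrecord(path, n=8):
    """Write n tiny records so the sketch runs without the real dataset."""
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(n):
            example = tf.train.Example(features=tf.train.Features(feature={
                FEATURE_KEY: tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[i])),
            }))
            writer.write(example.SerializeToString())


def parse_record(serialized):
    """Decode one serialized tf.train.Example into a scalar int64."""
    features = tf.io.parse_single_example(
        serialized, {FEATURE_KEY: tf.io.FixedLenFeature([], tf.int64)})
    return features[FEATURE_KEY]


def make_dataset(path, batch_size=4):
    # Build the pipeline explicitly on the CPU, where the dataset kernels live.
    with tf.device("/CPU:0"):
        ds = tf.data.TFRecordDataset(path)  # replaces tf_record_iterator
        ds = ds.map(parse_record, num_parallel_calls=tf.data.AUTOTUNE)
        ds = ds.batch(batch_size)
        # Prefetch so the next batch is prepared while the GPU trains.
        ds = ds.prefetch(tf.data.AUTOTUNE)
    return ds


# Confirm that TensorFlow actually sees the Colab GPU.
print(tf.config.list_physical_devices("GPU"))

path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
write_demo_tfrecord(path)
batches = [b.numpy().tolist() for b in make_dataset(path)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Without a `shuffle` step and with the default deterministic `map`, the records come out in write order. That is why the two batches are `[0..3]` and `[4..7]`.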