Hi,
I am referring to this notebook (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb) for classification and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model, but prediction is still taking a very long time even though I am using a 4-GPU cluster. I suspect my cluster is not fully utilized and is in fact running on CPU only. Is there anything I need to change to ensure the GPU cluster is utilized and prediction runs in a distributed manner?
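For context, my prediction step follows the notebook's pattern, roughly like this (a simplified sketch; getPrediction is the helper from the notebook that wraps estimator.predict, and pred_sentences stands in for my actual list of review texts):

# Simplified sketch of my current prediction call, as in the notebook:
# getPrediction builds InputExamples, converts them to features, and
# calls estimator.predict -- everything happens in the driver process.
pred_sentences = ["This movie was great!", "Terrible plot and acting."]
predictions = getPrediction(pred_sentences)

As far as I can tell, nothing here ever hands work to the worker nodes, which may be part of my problem.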
I also referred to the Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and installed the GPU-enabled TensorFlow wheel it mentions:
%pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl
But even after that, print([tf.__version__, tf.test.is_gpu_available()]) still shows False, and there is no improvement in my cluster utilization.
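For reference, this is the exact check I run in a notebook cell (a minimal sketch using the TF 1.15 API; device_lib is only there to list the devices TensorFlow can see):

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)              # prints 1.15.3 as expected
print(tf.test.is_gpu_available())  # prints False on my cluster
# On my cluster this only lists /device:CPU:0, no GPU devices:
print([d.name for d in device_lib.list_local_devices()])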
Can anyone help with how I can enable full cluster utilization (across the worker nodes) for prediction with my fine-tuned BERT model?
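For example, would loading the fine-tuned model on each worker and scoring with a Spark 3.0 pandas UDF be the right direction? A rough sketch of what I have in mind (load_finetuned_bert and model_predict are hypothetical placeholders for loading my checkpoint and wrapping estimator.predict, and df is a Spark DataFrame with a "review" column):

from typing import Iterator
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("float")
def bert_score(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Hypothetical: load the fine-tuned model once per executor task,
    # then score each arriving batch of review texts on that worker.
    model = load_finetuned_bert()
    for batch in batches:
        yield pd.Series(model_predict(model, batch.tolist()))

scored = df.withColumn("score", bert_score(df["review"]))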
I would really appreciate the help.