Machine Learning w/ Python: Project 2 (Dog, Cat images) - probabilities list question


I am trying to complete project 2 of the Machine Learning with Python certification and am stuck on how to get the “probabilities” for the test data.

In the code below, what is the correct way of calling plotImages() with the probabilities list?

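import matplotlib.pyplot as plt

def plotImages(images_arr, probabilities = False):
    # One subplot per image, stacked vertically
    fig, axes = plt.subplots(len(images_arr), 1, figsize=(5,len(images_arr) * 3))
    if probabilities is False:
        for img, ax in zip( images_arr, axes):
            ax.imshow(img)
            ax.axis('off')
    else:
        for img, probability, ax in zip( images_arr, probabilities, axes):
            ax.imshow(img)
            ax.axis('off')
            # probability is read as the chance that the image is a dog
            if probability > 0.5:
                ax.set_title("%.2f" % (probability*100) + "% dog")
            else:
                ax.set_title("%.2f" % ((1-probability)*100) + "% cat")
    plt.show()

sample_training_images, _ = next(train_data_gen)
plotImages(sample_training_images[:5])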
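
To see what plotImages() expects from the probabilities list, the title logic can be pulled out on its own. The sketch below is only an illustration: `title_for` is a hypothetical helper, not part of the project, the values are made up, and it assumes each entry is a single number between 0 and 1 giving the probability that the image is a dog.

```python
def title_for(probability):
    """Mirror the title logic inside plotImages() for one image."""
    # Assumption: probability is P(dog) for the image, a float in [0, 1].
    if probability > 0.5:
        return "%.2f" % (probability * 100) + "% dog"
    return "%.2f" % ((1 - probability) * 100) + "% cat"


# Made-up values, one per image, in the same order as the images.
example_probabilities = [0.93, 0.20, 0.5]
example_titles = [title_for(p) for p in example_probabilities]
print(example_titles)
```

With the real function the call would take the same shape, for example plotImages(test_images, probabilities=probabilities), where test_images is a placeholder name for whatever batch you pulled from the test generator.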


Link to my colab workbook:


At the moment, your accuracy is 50% on both training and validation. Change your final layer to have two nodes, rather than just one.


You can get the probabilities variable for the final test function like so

predicted_class = model.predict_classes(test_data_gen)
probabilities = predicted_class.tolist()

This is sufficient for you to pass the test function at the end. If you want the actual probability though, you can use something like this

# gives you the probability of cat in one column and dog in the other
result = model.predict(test_data_gen) 
# extract the probability corresponding to the predicted class
actual_probabilities = [np.max(vector) for vector in result]
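
One caveat with taking the maximum: with a two-node output it returns the probability of whichever class won, so it is never below 0.5, and plotImages() would label every image a dog. Also, predict_classes has been removed from newer TensorFlow releases. Here is a minimal sketch of an alternative, written with plain lists so it runs on its own. The rows are made-up stand-ins for the output of model.predict(test_data_gen), and it assumes column 0 is cat and column 1 is dog (flow_from_directory orders classes alphanumerically; train_data_gen.class_indices will confirm the mapping).

```python
# Stand-in for `result = model.predict(test_data_gen)`: one row per image.
# Assumption: column 0 is P(cat) and column 1 is P(dog).
result = [
    [0.10, 0.90],
    [0.75, 0.25],
    [0.40, 0.60],
]

DOG_COLUMN = 1  # hypothetical name; confirm with train_data_gen.class_indices

# Probability of the winning class (what taking the max gives): never below 0.5.
winning_probabilities = [max(row) for row in result]

# Probability of "dog" specifically, which is what plotImages() compares to 0.5.
dog_probabilities = [row[DOG_COLUMN] for row in result]

# Class labels (0 = cat, 1 = dog) without predict_classes.
predicted_classes = [row.index(max(row)) for row in result]

print(winning_probabilities)
print(dog_probabilities)
print(predicted_classes)
```

On the NumPy array that model.predict actually returns, the equivalents would be result[:, 1] for the dog column and np.argmax(result, axis=1) for the class labels.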

Awesome, thank you for going through my post! I was able to use your suggestion to plot and compare the actual vs. expected results.

In addition to the add(Dense(2)), I made a couple of other changes to the model, and was able to get accuracy over 70% (around 74%+).

As you were working through this exercise, how did you develop intuition for the number of filters and the kernel_size (for example in Conv2D())? I found myself trying one combination, then another, until I had a model that was accurate enough. Perhaps I need to read the documentation once more to develop a better understanding.

Link to my updated notebook:

I don’t know if there is a definitive answer (or know much about this topic – I should read up on this too). I just played around with the number of neurons and the number of layers.

You want to have enough neurons to describe your problem, but not so many that the model overspecialises (you want it to capture the general picture rather than memorise the training images). So you probably want the minimum number that describes the complexity of the system with maximal accuracy. Beyond trial and error, I don’t know what the most sensible values should be.

Yes, I agree with what you said. I definitely need to read up some more and practice some more problems to get better.

Thank you for being active on the forums, I have found your responses in other posts in the machine learning section very helpful!


I came across this question a bit late, but after checking your code I would like to point out a few things. You did a great job preprocessing the dataset and managed to pass the test with high accuracy, but some of the modelling choices are misleading:

  • The architecture is a very good arrangement of conv and pooling layers, but you can improve it with an initializer such as ‘he_uniform’, which has been shown to help when training Conv2D layers.

  • Since you ran into trouble early on (when you got 50% accuracy), Dropout regularisation is worth adding to guard against overfitting. It can go right after some or all of the Conv2D layers. I chose to add it after the first Dense layer for simplicity, with the usual rate of 50%: Dropout(0.5).

  • For this particular binary classification task the output should be in the form of probabilities, Pdog and Pcat = 1 - Pdog. For that reason you need the binary cross-entropy loss with from_logits=False, unless the last Dense output layer has no activation, in which case from_logits=True. This layer should contain only one unit with a sigmoid activation, to match the probability interpretation of the plotting function.

  • As a final remark, I would recommend ‘Adam’ as the optimizer, especially for computer vision tasks, since it is to date one of the most reliable methods out there.
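
The third point, about the sigmoid output and logits, can be checked numerically. The sketch below is plain Python rather than Keras, with made-up numbers; `sigmoid`, `bce_from_probability`, and `bce_from_logit` are hypothetical helper names, and label 1 is assumed to mean dog. It shows that applying a sigmoid and then binary cross-entropy to the probability gives the same loss as the logit form applied to the raw output, which is why the loss setting has to match whether the last layer has an activation.

```python
import math


def sigmoid(z):
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p_dog):
    """Binary cross-entropy when the model already outputs P(dog)."""
    return -(y_true * math.log(p_dog) + (1 - y_true) * math.log(1 - p_dog))


def bce_from_logit(y_true, z):
    """Binary cross-entropy computed directly from the raw output z."""
    # Numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


z = 1.2                # made-up raw output of the single final unit
p_dog = sigmoid(z)     # what a sigmoid activation would produce
p_cat = 1.0 - p_dog    # the cat probability follows from P(dog)

loss_a = bce_from_probability(1, p_dog)  # label 1 = dog (assumed)
loss_b = bce_from_logit(1, z)

print(round(p_dog, 4), round(p_cat, 4))
print(abs(loss_a - loss_b) < 1e-9)
```

In Keras terms, this corresponds to either Dense(1, activation='sigmoid') paired with BinaryCrossentropy(from_logits=False), or Dense(1) with no activation paired with BinaryCrossentropy(from_logits=True). Mixing the two pairings would apply the sigmoid twice or not at all.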