I watched the videos on RNNs in the Machine Learning course, and I have a technical question. I downloaded the Natural Language Processing Jupyter notebook.
When I ran the code on Google Colab myself, I noticed something strange. For some reason, when I train the movie review model (done at 6:53 in the video):
in each epoch my notebook seems to process only 625 items instead of 20,000, as in the video. I attached a screenshot. Since I haven't changed anything in the code, I don't understand why it doesn't train on the whole 80% of the training data (which should be 20,000 examples). When I check the shape of the training data after training, it's still 25,000, so there is no reason why it should train on only 625. Similarly, when evaluating the model, it appears to use only 782 instead of 25,000; this is also visible in the screenshot.
I would be grateful if anyone could please tell me what could be the problem here.
Side note: 20,000 / 625 = 32 exactly, and 25,000 / 782 ≈ 32 as well. This may not be a coincidence, since 32 is a power of 2. But in any case, I'm puzzled.
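To sanity-check that side note, here is a small sketch of the arithmetic. It assumes (and this is only my guess, not something confirmed in the video) that the progress bar counts batches rather than individual samples, with a batch size of 32, which I believe is the Keras default for `model.fit`:

```python
import math

# Assumption: the progress bar counts batches, not samples,
# and the batch size defaults to 32.
batch_size = 32
n_train = 20_000   # 80% of 25,000 used for training
n_eval = 25_000    # full set used for evaluation

train_steps = math.ceil(n_train / batch_size)  # batches per training epoch
eval_steps = math.ceil(n_eval / batch_size)    # batches during evaluation

print(train_steps)  # 625
print(eval_steps)   # 782
```

Both numbers match what I see in my screenshot exactly, so maybe the counter is showing batches? I'd still appreciate confirmation from someone who knows how this display works.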