I'm confused about the Neural Network SMS Text Classifier project.
The introduction says: **For this challenge, you will use the SMS Spam Collection dataset. The dataset has already been grouped into train data and test data.**
But there is no test data, only validation and training data. Since test data is important for evaluating a trained model, should we create additional test data ourselves? And if so, how much? I think the explanation is not very accurate on this point.
I'm not very familiar with the context, but if missing test data is the only problem, you can split off a portion of the data yourself. In scikit-learn this is done with `train_test_split`. In Keras there is the `validation_split` argument: you make the split when you fit the model and choose the proportion of the training data to hold out, for example `validation_split=0.15`.
Even with categorical data, you still need a validation set.
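As a rough illustration of the scikit-learn option, here is a minimal sketch. The toy messages and labels are made up for the example and are not the project's data; the 25% hold-out size is likewise just an assumption. In Keras, the comparable route would be passing `validation_split` to `model.fit`, which holds out the last fraction of the training arrays for validation.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for labelled SMS messages.
messages = [f"message {i}" for i in range(20)]
labels = ["spam" if i % 4 == 0 else "ham" for i in range(20)]

# Hold out 25% of the rows. stratify keeps the ham/spam ratio
# roughly the same in both parts; random_state makes it repeatable.
train_msgs, held_out_msgs, train_labels, held_out_labels = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

print(len(train_msgs), len(held_out_msgs))
```

The held-out part is only useful as a test set if the model never sees it during training or tuning.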
Hope this helps
You get two tab-separated value (TSV) files: a training set and a validation set. Use those for training and validating your model. I assume the author intended to say “validation” instead of “test” when writing about the provided data.
In the function definition for the test (test_predictions()) there is an array of ham and spam test messages (test_messages) that will be processed by your trained model and compared against the expected correct results (test_answers).
So, all the data is there and there is no need to create any.
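To make the idea concrete, here is a minimal sketch of how such a check can work. Everything in it is illustrative: the keyword-based `predict_message` is a hypothetical stand-in for your trained model, its `[probability, label]` return shape is an assumption, and the two messages are invented rather than taken from the project's `test_messages`.

```python
def predict_message(text):
    # Hypothetical stand-in for a trained model.
    # Assumed return shape: [spam probability, label].
    spam_words = {"won", "prize", "free"}
    is_spam = any(word in spam_words for word in text.lower().split())
    return [1.0 if is_spam else 0.0, "spam" if is_spam else "ham"]


# Invented examples, not the project's actual test messages.
test_messages = ["see you at lunch tomorrow", "you won a free prize call now"]
test_answers = ["ham", "spam"]


def check_predictions(messages, answers):
    # Compare each predicted label with the expected answer.
    return all(predict_message(m)[1] == a for m, a in zip(messages, answers))


passed = check_predictions(test_messages, test_answers)
print("passed" if passed else "keep trying")
```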