[ML] Health Costs Calculator - Errors at evaluate - test_labels - Splitting the test_data

Tell us what’s happening:
In the final cell I get these errors:

TypeError: Could not build a TypeSpec for       age     sex   bmi  children smoker     region


ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
Your code so far
In my notebook I first split the dataset into train_dataset and test_dataset. Second, I .pop('expenses') only from the test_dataset. Then I transform the train_dataset (which is a pandas DataFrame) into a TensorFlow Dataset with this function:
def dataframe_to_dataset(dataframe, shuffle=False):
  dataframe = dataframe.copy()
  labels = dataframe.pop('expenses')
  tf_ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    tf_ds = tf_ds.shuffle(buffer_size=len(dataframe))

  return tf_ds

train_dataset = dataframe_to_dataset(train_dataset, True)

To transform the data (encoding the categorical features as numbers and normalizing the numerical ones) I used StringLookup/IntegerLookup and Normalization from TensorFlow, respectively:

def encode_categorical(feature, name, dataset, is_string):
  lookup_class = StringLookup if is_string else IntegerLookup

  lookup = lookup_class(output_mode="binary")
  # Keep only this feature's column and give each value a trailing axis.
  feature_ds = dataset.map(lambda x, y: x[name])
  feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

  # The vocabulary for the layer can be supplied on construction or learned
  # via adapt(). During adapt(), the layer analyzes the dataset, determines
  # the frequency of individual tokens, and creates a vocabulary from them.
  lookup.adapt(feature_ds)

  encoded_feature = lookup(feature)

  return encoded_feature

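def numerical_normalizer(feature, name, dataset):
  normalizer = Normalization()

  # Keep only this feature's column and give each value a trailing axis.
  feature_ds = dataset.map(lambda x, y: x[name])
  feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

  # Learn the mean and variance of this feature from the data.
  normalizer.adapt(feature_ds)

  encoded_feature = normalizer(feature)

  return encoded_feature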

I then create Input layers for all the encoded/transformed features, build the model, compile it, and fit it with:

model.fit(train_dataset, epochs=50)

I am also way off the goal of mae < 3500. However, when I run the final cell I get the error messages I mentioned above. It must have something to do with the split above or the .pop(). At first I also tried to transform the test_dataset into a TF Dataset, but then I "lose" the test_labels, since they end up inside the TF Dataset just like the training labels. As you can see, when calling fit I only pass train_dataset as input and no train_labels.

Linear Regression Health Costs Calculator

Link to the challenge:

I used this example as a reference: Structured data classification from scratch (keras.io)

How can I transform the test_dataset so that the final cell works without the errors mentioned above?

This is the problem. Your splitting and popping are fine. I used much the same logic as you, but I treated the training and test datasets identically (including any batching and prefetching) and then used the test dataset for model.evaluate(). These datasets contain the labels, so the training, validation, and evaluation functions should all expect to find those labels there as needed.
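
As a rough sketch of that idea (not the exact notebook code): the example below sends both splits through the same dataframe_to_dataset() helper, batches both, and passes the test dataset straight to model.evaluate(). The tiny DataFrame, the batch_size argument, and the two-feature model are made up for illustration; the real notebook would use the insurance data and the encoded features.

```python
import pandas as pd
import tensorflow as tf

def dataframe_to_dataset(dataframe, shuffle=False, batch_size=32):
    # Same helper as in the question, plus batching (batch_size is an assumption).
    dataframe = dataframe.copy()
    labels = dataframe.pop("expenses")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    return ds.batch(batch_size)

# Made-up numeric rows standing in for the real insurance data.
df = pd.DataFrame({
    "age": [19.0, 33.0, 45.0, 52.0, 23.0, 61.0, 38.0, 29.0],
    "bmi": [27.9, 22.7, 30.1, 25.3, 34.4, 29.0, 26.2, 31.5],
    "expenses": [1600.0, 4400.0, 8200.0, 9800.0, 1900.0, 13000.0, 6100.0, 3900.0],
}).astype("float32")

train_df = df.sample(frac=0.75, random_state=0)
test_df = df.drop(train_df.index)

# Both splits go through the same conversion, so both carry their labels.
train_ds = dataframe_to_dataset(train_df, shuffle=True, batch_size=4)
test_ds = dataframe_to_dataset(test_df, batch_size=4)

# Minimal stand-in model: one Input per feature name, then a single Dense unit.
inputs = {name: tf.keras.Input(shape=(1,), name=name) for name in ("age", "bmi")}
x = tf.keras.layers.Concatenate()(list(inputs.values()))
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

model.fit(train_ds, epochs=2, verbose=0)

# The labels are read from the dataset itself; no separate test_labels argument.
loss, mae = model.evaluate(test_ds, verbose=0)
print("test mae:", mae)
```

Because the labels travel inside the dataset, model.evaluate() is called the same way model.fit() was: with the dataset only.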

You processed your test dataset a little differently (you didn't run it through dataframe_to_dataset()) and you didn't batch it like you did the training data. Whatever data you feed into a model after compilation and training has to have the same format as the training data.
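
One way to check the "same format" requirement is to compare the element_spec of the datasets, and the labels can still be recovered from a batched dataset if a later cell needs them as an array. This is only a sketch with made-up values; unbatched_ds, batched_ds, and the small DataFrame are hypothetical names, not part of the challenge notebook.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Made-up rows standing in for the test split.
test_df = pd.DataFrame({
    "age": [19.0, 33.0, 45.0, 52.0],
    "bmi": [27.9, 22.7, 30.1, 25.3],
    "expenses": [1600.0, 4400.0, 8200.0, 9800.0],
})
features = test_df.drop(columns="expenses")
labels = test_df["expenses"]

unbatched_ds = tf.data.Dataset.from_tensor_slices((dict(features), labels))
batched_ds = unbatched_ds.batch(2)

# element_spec describes what one element looks like. Batching adds a leading
# dimension, so a batched and an unbatched dataset do not have the same format.
print(unbatched_ds.element_spec)
print(batched_ds.element_spec)

# The labels are not lost: iterate over the batches and stitch them back together.
test_labels = np.concatenate([y.numpy() for _, y in batched_ds])
```

If the training dataset was batched, the test dataset should show the same element_spec before it is passed to the model.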
