Tell us what’s happening:
In the final cell I get two errors:

TypeError: Could not build a TypeSpec for age sex bmi children smoker region

and

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

Your code so far

In my notebook I first split the dataset into train_dataset and test_dataset. Second, I .pop('expenses') only from the test_dataset. Then I transform the train_dataset (which is a pandas DataFrame) into a TensorFlow Dataset with this function:
import tensorflow as tf

def dataframe_to_dataset(dataframe, shuffle=False):
    # Work on a copy so the caller's DataFrame keeps its 'expenses' column.
    dataframe = dataframe.copy()
    labels = dataframe.pop('expenses')
    # Each element is a (dict of features, label) pair.
    tf_ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        tf_ds = tf_ds.shuffle(buffer_size=len(dataframe))
    return tf_ds

train_dataset = dataframe_to_dataset(train_dataset, True)
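For reference, here is a minimal sketch of the structure that this kind of conversion produces. The three-row frame and its values are made up for illustration and are not the real insurance data. Each element is a (features dict, label) pair, so the labels travel inside the dataset and are not passed to fit as a separate argument.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical three-row frame standing in for the insurance data.
df = pd.DataFrame({
    "age": [19, 33, 52],
    "sex": ["female", "male", "female"],
    "bmi": [27.9, 22.7, 30.2],
    "expenses": [16884.92, 1725.55, 9625.92],
})

# Same steps as dataframe_to_dataset, written inline.
features = df.copy()
labels = features.pop("expenses")
ds = tf.data.Dataset.from_tensor_slices((dict(features), labels))

# element_spec describes one element: (dict of feature specs, label spec).
feature_spec, label_spec = ds.element_spec
print(sorted(feature_spec))  # feature names carried as dict keys
print(label_spec.dtype)      # dtype of the label tensor

# A batched view, which is the shape of input Keras normally consumes.
for x_batch, y_batch in ds.batch(2).take(1):
    print(x_batch["sex"].numpy(), y_batch.numpy())
```

The dict keys are the column names, so any Keras Input layers are assumed to use those same names.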
To transform the data (encoding the categorical features as numbers and normalizing the numerical ones) I used the StringLookup, IntegerLookup and Normalization layers from TensorFlow:
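# Older TensorFlow releases expose these layers under
# tensorflow.keras.layers.experimental.preprocessing instead.
from tensorflow.keras.layers import IntegerLookup, Normalization, StringLookup

def encode_categorical(feature, name, dataset, is_string):
    lookup_class = StringLookup if is_string else IntegerLookup
    # Newer TensorFlow releases name this output mode "multi_hot".
    lookup = lookup_class(output_mode="binary")
    # Keep only the column `name`; the labels (y) are dropped here.
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # The vocabulary for the layer can be supplied on construction or learned
    # via adapt(). During adapt(), the layer analyzes the data set, determines
    # the frequency of individual tokens, and creates a vocabulary from them.
    lookup.adapt(feature_ds)
    encoded_feature = lookup(feature)
    return encoded_feature

def numerical_normalizer(feature, name, dataset):
    normalizer = Normalization()
    # Keep only the column `name` and add a trailing axis for adapt().
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # adapt() learns the mean and variance of this feature.
    normalizer.adapt(feature_ds)
    encoded_feature = normalizer(feature)
    return encoded_feature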
I then create Input layers for all the encoded/transformed features, build the model, compile it, and fit it with:
I am also way off the goal of mae < 3500. However, when I run the final cell I get the error messages mentioned above. It must have something to do with the split above or with .pop(). At first I also tried to transform the test_dataset into a TF Dataset, but then I "lose" the test_labels, since they end up inside the TF Dataset just like for training. As you can see, when calling fit I only pass train_dataset as input and no train_labels.
Link to the challenge: Linear Regression Health Costs Calculator
I used this example as a reference: Structured data classification from scratch (keras.io)
How can I transform the test_dataset so that the final cell runs without the errors mentioned above?
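One possible direction, sketched under assumptions: the model's Input layers are named after the DataFrame columns, and the final cell passes the raw DataFrame to evaluate. The frame below is hypothetical and only has two feature columns. The idea is to keep test_labels as a separate Series while also building a batched, unshuffled dataset with the same (features dict, label) structure used for training.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical test frame; the real one has more columns and rows.
test_df = pd.DataFrame({
    "age": [19, 33],
    "smoker": ["yes", "no"],
    "expenses": [16884.92, 1725.55],
})

test_features = test_df.copy()
# pop() returns the column, so the labels stay available as a Series.
test_labels = test_features.pop("expenses")

# Same (features dict, label) structure as the training data.
# No shuffling, so predictions stay aligned with test_labels.
test_ds = tf.data.Dataset.from_tensor_slices(
    (dict(test_features), test_labels)
).batch(32)

# The labels can also be read back out of the dataset if needed.
recovered = [y for _, y_batch in test_ds for y in y_batch.numpy().tolist()]
print(recovered)

# With a trained model (not defined in this sketch), evaluation could then
# take the dataset on its own, since it already carries the labels:
# results = model.evaluate(test_ds, verbose=2)
# predictions = model.predict(test_ds).flatten()
```

Whether this removes both errors depends on the rest of the notebook, which is not shown here, so treat it as a starting point and not a confirmed fix.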