Machine Learning and Categorical Data Conversion

I have been struggling with the predicting health costs challenge. I have looked at the TensorFlow tutorial on fcc, as well as Kylie Ying’s tutorial. I’ve tried applying both approaches for preparing the datasets, and I get errors I don’t understand with both. The fcc version uses tf.feature_column, which seems to be deprecated, and Kylie Ying’s method throws a few errors:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/util/structure.py in normalize_element(element, element_signature)
    101         if spec is None:
--> 102           spec = type_spec_from_value(t, use_fallback=False)
    103       except TypeError:

12 frames
TypeError: Could not build a `TypeSpec` for       age     sex   bmi  children smoker     region
637    35  female  38.1         2     no  northeast
1310   42    male  26.3         1     no  northwest
1076   47  female  32.0         1     no  southwest
998    33  female  36.3         3     no  northeast
45     55    male  37.3         0     no  southwest
...   ...     ...   ...       ...    ...        ...
100    41  female  31.6         0     no  southwest
470    27    male  32.7         0     no  southeast
1328   23  female  24.2         2     no  northeast
167    32  female  33.2         3     no  northwest
1054   27  female  21.5         0     no  northwest

[1070 rows x 6 columns] with type DataFrame

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    100       dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101   ctx.ensure_initialized()
--> 102   return ops.EagerTensor(value, ctx.device_name, dtype)
    103 
    104 

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

Can anybody point me in the right direction to find some help here?

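You need to encode the string values (male/female, yes/no, northeast/northwest, etc.) of the categorical features (sex, smoker and region) into numerical values before feeding them into the model. Two-valued features such as sex and smoker can become 0 or 1; region takes four values, so it is usually one-hot encoded into one 0/1 column per value.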

To do this with TensorFlow’s methods, you may consult this tutorial:

Or you may do it in pandas with the map, apply or get_dummies functions.
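
For the pandas route, here is a minimal sketch. The sample rows are made up for illustration (only the column names come from the traceback above), and the 0/1 assignments are an arbitrary choice, not something the challenge prescribes. It maps the two-valued columns with map, one-hot encodes region with get_dummies, and casts everything to float32 so that no string (object) columns are left, which is the likely cause of the "Could not build a `TypeSpec`" and "Failed to convert a NumPy array to a Tensor" errors.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names match the real dataset.
df = pd.DataFrame({
    "age": [35, 42, 47, 33],
    "sex": ["female", "male", "female", "female"],
    "bmi": [38.1, 26.3, 32.0, 36.3],
    "children": [2, 1, 1, 3],
    "smoker": ["no", "no", "yes", "no"],
    "region": ["northeast", "northwest", "southwest", "southeast"],
})

# Two-valued columns: map each string to 0 or 1 (the assignment is arbitrary).
df["sex"] = df["sex"].map({"female": 0, "male": 1})
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

# Multi-valued column: one-hot encode into one indicator column per region.
df = pd.get_dummies(df, columns=["region"])

# Cast every column to float32 and convert to a plain NumPy array.
features = df.astype("float32").to_numpy()

print(df.columns.tolist())
print(features.dtype, features.shape)
```

Do this before splitting off the labels and the train/test sets, so both splits end up with the same columns. The resulting array (or the all-numeric DataFrame) should then be accepted by the model or by tf.data without the conversion errors.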
