Help! Unable to One Hot Encode last Column

I am trying to OneHotEncode the last column of my Excel table which has categorical data with three categories. All the other columns have numeric data except the last. This is my code:

import numpy as np
import pandas as pd
import tensorflow as tf

# Part 1 - Data Preprocessing

# Importing the dataset
dataset = pd.read_csv('ANN_1_APP.csv')
X = dataset.iloc[:, 0:-1].values
y = dataset.iloc[:, -1].values

#One Hot Encoding the "Geography" column
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [-1])], remainder='passthrough')
y = np.array(ct.fit_transform(y))

The last print(y) gives the original values from the table column (e.g. Home, Away, Draw) instead of encoding them into a binary representation.

Kindly help . . .!

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).

You use a lot of transformations that seem unnecessary.
Like, why use .values? Why add the np.array?
For only one column you don’t need a ColumnTransformer.
So the first test would be to check whether the OneHotEncoder works on its own. Only then bundle it into the ColumnTransformer (which would only be useful if you had more than one transformer to begin with).
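A minimal sketch of that first test, using OneHotEncoder on its own. The small DataFrame and its column names are hypothetical stand-ins for the real CSV, which is not shown in this thread. The main point is that the encoder expects 2D input, so the column is selected with double brackets.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real dataset (the CSV itself is not shown here)
dataset = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "result": ["Home", "Away", "Draw", "Home"],
})

# Double brackets keep the selection 2D with shape (n_samples, 1),
# which is what the encoder expects
y_raw = dataset[["result"]]

encoder = OneHotEncoder()
# fit_transform returns a SciPy sparse matrix by default; toarray() makes it dense
y = encoder.fit_transform(y_raw).toarray()

print(encoder.categories_)  # the learned categories, one array per input column
print(y)                    # one row per sample, one column per category
```

If this works, the encoded columns follow the order in `encoder.categories_`.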

In case the problem is still relevant: I think you cannot use the “ColumnTransformer” for OneHotEncoding because the CT expects to return the same number of columns as it gets. However, OHE creates one column per unique entry.

I remember struggling with a FeaturePreprocessingPipeline because of that…
The go-to class should be “FeatureUnion” or “Pipeline” instead of “ColumnTransformer”.

Hi, thank you so much for your comments. However, for some reason I was unable to use OneHotEncoder.

What worked for me is get_dummies, as shown below:

y = pd.get_dummies(dataset.iloc[:, -1])

Thank you so much.
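For anyone reading later, the get_dummies approach can be sketched as below. The data is made up for illustration. Note that positional selection on a DataFrame goes through `.iloc`, since plain `dataset[:, -1]` is not valid DataFrame indexing.

```python
import pandas as pd

# Hypothetical data standing in for the real CSV
dataset = pd.DataFrame({
    "goals": [2, 0, 1, 3],
    "result": ["Home", "Away", "Draw", "Home"],
})

# Select the last column by position, then expand it into indicator columns
y = pd.get_dummies(dataset.iloc[:, -1])

print(y.columns.tolist())  # one indicator column per category
print(y.shape)
```

The result is a DataFrame, so the category names are kept as column labels.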
Can you kindly share some sample code? Maybe I can get my head around that, because I am new to Python and ML.

I just looked at my old pipeline and it turns out I used ColumnTransformer. BUT I used OneHotEncoder(sparse=True), so maybe that’s why it didn’t work?
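As a rough check that ColumnTransformer and OneHotEncoder can be combined, here is a small sketch on a 2D table. The frame and its column names are invented for illustration. `sparse_threshold=0` is passed so the result always comes back as a dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical 2D table: two numeric columns plus one categorical column
frame = pd.DataFrame({
    "goals": [2, 0, 1, 3],
    "shots": [10, 4, 7, 12],
    "result": ["Home", "Away", "Draw", "Home"],
})

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), ["result"])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)
encoded = ct.fit_transform(frame)

print(frame.shape)    # input: 3 columns
print(encoded.shape)  # output: 3 one-hot columns followed by 2 passthrough columns
```

In this sketch the output has more columns than the input, and the encoded columns come first, followed by the passthrough ones. The input here is 2D; passing a single 1D array of labels is a different situation.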

Anyway for that sample code… I can give you my new pipeline - it’s using something called DataFrameMapper, which allows Pandas and Sklearn to work together better, by replacing the ColumnTransformer and FeatureUnion in a way that does return DataFrames and thus keeps column-names.

That said, it’s quite a complex thing (and technically only doing basic feature preprocessing), but if you are interested, here is the Notebook: