OK, I still need to work on it, but this seems to be the solution:
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.show() at the end
from IPython.display import clear_output
# Load dataset.
dftrain = pd.read_csv('link')  # training data
dfeval = pd.read_csv('link')  # testing data
# Separate labels before encoding
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
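# Combine train and eval to ensure same one-hot encoding columns
combined = pd.concat([dftrain, dfeval])
# dtype='float32' keeps the dummy columns numeric; recent pandas versions default to bool,
# which can turn the mixed frame into an object array when Keras converts it
combined_encoded = pd.get_dummies(combined, dtype='float32')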
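# Split back into train and eval
# .copy() makes these independent frames, so assigning columns later does not raise SettingWithCopyWarning
X_train = combined_encoded.iloc[:len(dftrain)].copy()
X_eval = combined_encoded.iloc[len(dftrain):].copy()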
# Ensure both train and eval have the same columns after get_dummies
# Both frames were sliced from the same encoded frame, so their columns should already match;
# this is a safety check in case the encoding is ever done separately
train_cols = set(X_train.columns)
eval_cols = set(X_eval.columns)
missing_in_eval = list(train_cols - eval_cols)
for c in missing_in_eval:
    X_eval[c] = 0
missing_in_train = list(eval_cols - train_cols)
for c in missing_in_train:
    X_train[c] = 0
X_eval = X_eval[X_train.columns]  # Ensure columns are in the same order
# Define the model
# Update the input shape to match the number of columns after one-hot encoding
model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(1, activation='sigmoid')  # Use sigmoid for binary classification
])
# Compile the model
# Use binary_crossentropy for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32) # Reduced epochs for faster execution in example
# Evaluate
loss, accuracy = model.evaluate(X_eval, y_eval)
print(f"Accuracy: {accuracy:.4f}")
# Get raw probabilities from Keras
probs = model.predict(X_eval).flatten()  # shape: (num_samples, 1) → flatten to (num_samples,)
probs = pd.Series(probs)
probs.plot(kind='hist', bins=20, title='Predicted Probabilities')
plt.show()  # Display the plot
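
In case it helps anyone following along, here is a small sketch of why the encoding is done on the combined frame. The toy data and the names `train`, `eval_df`, `train_sep`, `eval_sep` and `eval_reindexed` are made up for illustration and are not part of the real dataset. It also shows one possible alternative, assuming you would rather encode train on its own: `reindex` the eval frame to the train columns.

```python
import pandas as pd

# Hypothetical toy data: category "C" appears only in the eval split.
train = pd.DataFrame({"age": [22.0, 38.0], "deck": ["A", "B"]})
eval_df = pd.DataFrame({"age": [26.0, 35.0], "deck": ["B", "C"]})

# Encoding each split separately produces different column sets.
train_sep = pd.get_dummies(train, dtype=float)
eval_sep = pd.get_dummies(eval_df, dtype=float)

# Encoding the concatenated frame produces one shared column set.
combined = pd.get_dummies(pd.concat([train, eval_df]), dtype=float)
train_enc = combined.iloc[:len(train)]
eval_enc = combined.iloc[len(train):]

# Alternative: align eval to the train columns.
# Columns unseen in train (deck_C) are dropped; missing ones (deck_A) are filled with 0.
eval_reindexed = eval_sep.reindex(columns=train_sep.columns, fill_value=0.0)

print(list(train_sep.columns))
print(list(eval_sep.columns))
print(list(train_enc.columns))
print(list(eval_reindexed.columns))
```

With the concat approach the model sees every category from both splits. With the `reindex` approach it only sees the categories present in train, which is closer to what would happen with genuinely unseen data.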