Error while creating a predictive model

Hello, I’m trying to build a model that predicts the probability that an athlete wins a medal.
I have a dataframe that looks like this:

Here is what I’ve already done:

# Cleaning df
# Replace NaN with the column mean
df['Height'].fillna(value=df['Height'].mean(), inplace=True)
df['Weight'].fillna(value=df['Weight'].mean(), inplace=True)

# Change type to integer
df.Height = df.Height.astype(int)
df.Weight = df.Weight.astype(int)

# Target variable
y = df["Medal"]

# Encode Sex: male = 0, female = 1
df['Sex'] = df['Sex'].apply(lambda x: 1 if str(x) != 'M' else 0)

# Predictor features
feature_names = ["Age", "Sex", "Height", "Weight"]
X = df[feature_names]

# Regressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

regressor = DecisionTreeRegressor(random_state=0)
cross_val_score(regressor, X, y, cv=10)

But when I run the code, I get the following error:

warnings.warn("Estimator fit failed. The score on this train-test"
C:\Users\miss_\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:610: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 1247, in fit
    super().fit(
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 156, in fit
    X, y = self._validate_data(X, y,
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\base.py", line 430, in _validate_data
    X = check_array(X, **check_X_params)
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 663, in check_array
    _assert_all_finite(array,
  File "C:\Users\miss_\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 103, in _assert_all_finite
    raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

  warnings.warn("Estimator fit failed. The score on this train-test"

It also returns an array like this: array[NaN, NaN, NaN…]

My X looks like this:

	Age	Sex	Height	Weight
0	24.0	1	180	80
1	23.0	1	170	60
2	24.0	1	175	70
3	34.0	1	175	70
4	21.0	1	185	82
...	...	...	...	...
271111	29.0	1	179	89
271112	27.0	1	176	59
271113	27.0	1	176	59
271114	30.0	1	185	96
271115	34.0	1	185	96

And my y:

0         0
1         0
2         0
3         1
4         0
         ..
271111    0
271112    0
271113    0
271114    0
271115    0
Name: Medal, Length: 271116, dtype: int64

Thanks in advance

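It looks like one of the feature columns contains NaN or very large values. Your code fills “Height” and “Weight” and re-encodes “Sex”, but it never touches “Age”, and “Age” shows up as a float (24.0) in your X, which is often a sign of missing values in an otherwise integer column. I would check “Age” for NaNs first, then look for outliers and check the datatypes of all four features.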
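
Here is a minimal sketch of that check. It uses a small made-up X in place of the real data: the column names come from the question, but the values are invented for illustration. It counts NaN and infinite values per column, then fills “Age” with the column mean, mirroring what the question already does for “Height” and “Weight”. Whether the mean is the right fill value for your data is an assumption.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real feature matrix; these values are invented.
X = pd.DataFrame({
    "Age": [24.0, np.nan, 24.0, 34.0, np.nan],
    "Sex": [1, 1, 0, 1, 0],
    "Height": [180, 170, 175, 175, 185],
    "Weight": [80, 60, 70, 70, 82],
})

# Count missing values per column to see which feature is affected.
nan_counts = X.isna().sum()
print(nan_counts)

# The error message also mentions infinity, so count infinite values too.
inf_counts = np.isinf(X.to_numpy(dtype=float)).sum(axis=0)
print(dict(zip(X.columns, inf_counts)))

# Fill the gaps in Age with the column mean (assumed fill strategy).
X_clean = X.fillna({"Age": X["Age"].mean()})

# Total number of missing values left after imputation.
print(X_clean.isna().sum().sum())
```

On the real data, the same `X.isna().sum()` call should show which column has missing values. Once it reports zero for every column, `cross_val_score` should stop returning NaN scores for that reason.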
