IndentationError: unexpected indent - Anaconda

The Code Below:

 class_names = set(feat_df.loc[:,'label'])
    # Binarize the labels
    # print(class_names)
#    lb = label_binarize(y = y, classes = list(class_names))
    # classes.remove('unknown')
    # lb.fit(y) #for LabelBinarizer not lable_binerize()
    # lb.classes_ #for LabelBinarizer not lable_binerize

    # Split the training data for cross validation
    (X_train, X_test), (y_train, y_test) = train_test_split(X, y, test_size=0.2, 
                                                        random_state=0)
   
    df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
    
    print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
    ##### Dimensionality Reduction ####
Error message:
File "<ipython-input-50-1c94ab12f530>", line 10
    (X_train, X_test), (y_train, y_test) = train_test_split(X, y, test_size=0.2,
    ^
IndentationError: unexpected indent

Hello fngwira.

I have edited your post for readability. In the future, please use Markdown to format your posts by placing any code between backticks (`).

To answer your question:
Remove the parentheses around your split output variables. train_test_split returns the four arrays as one flat sequence, not as two pairs:
X_train, X_test, y_train, y_test = ...

Hope this helps

@Sky020 the error message is still there:

File “”, line 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
^
IndentationError: unexpected indent

Do you have that line inside of any functions, statements, or class definitions?

The error just means that Python did not expect that line to be indented (preceded by whitespace) as much as it is.

This is my code:

def ML_with_CV_feat(cv_feat_file='../data/cv_feat.csv', n_comp=100, 
                    plotting=False):
            
    # Importing the bottleneck features for each image
    feat_df = pd.read_csv(cv_feat_file, index_col=0, dtype='unicode')
    ##-- Dealing with NaN
    feat_df.fillna(0, inplace=True)  
    feat_df['blob_detected'] = feat_df['blob_detected']*1
    #['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red', 'blob_detected', 'num_of_blobs', 'average_blob_area']
#    feat_df = feat_df.sample(frac=0.01)
    feat_df.drop(columns=['cell_area', 'cell_eccentricity', 'cell_solidity',
                           'average_blue', 'average_green', 'average_red'],
                 inplace=True)
    #Removing features that do not seperate populations of cell class
    
    column_names = feat_names = list(feat_df.columns)
    print(column_names)
    for X in ['label','fn']:
        feat_names.remove(x)
#    feat_df = feat_df.iloc[0:300,:]
    mask = feat_df.loc[:, 'label'].isin(['Infected', 'Uninfected'])
    feat_df = feat_df.loc[mask, :].drop_duplicates()
    
    print('Number of features:', len(feat_names))
    y = feat_df.loc[:,['label']].values
    print(type(y), y.shape)

    print('Number of samples for each label \n', feat_df.groupby('label')['label'].count())
#    print(feat_df.head())
    X = feat_df.loc[:, feat_names].astype(float).values
    print('/nColumn feat names after placing into X',
          list(feat_df.loc[:, feat_names].columns))
class_names = set(feat_df.loc[:,'label'])
    # Binarize the labels
    # print(class_names)
#    lb = label_binarize(y = y, classes = list(class_names))
    # classes.remove('unknown')
    # lb.fit(y) #for LabelBinarizer not lable_binerize()
    # lb.classes_ #for LabelBinarizer not lable_binerize

    # Split the training data for cross validation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, 
                                                        random_state=0)
   
    df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
    
    print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
    ##### Dimensionality Reduction ####

Error message:
File “”, line 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
^
IndentationError: unexpected indent

If you want this line inside the function ML_with_CV_feat():
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

then indent this line by four spaces:
class_names = set(feat_df.loc[:,'label'])
so that it is at the same level as the rest of the code inside the function. Because it currently starts at column 0, Python treats it as the end of the function body, and the indented lines after it no longer belong to any block.

If you do not want the train/test split to happen inside the function, give those lines the same indentation as the class_names line instead.

In Python, indentation determines which block each statement belongs to.

Hope this helps
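
A minimal sketch of the situation described above, using a toy function instead of the original ML_with_CV_feat() (the name f and its variables are made up for illustration). The built-in compile() only parses the source, so neither pandas nor scikit-learn is needed to reproduce the message.

```python
# Toy reproduction: a line at column 0 ends the function body,
# so the indented line after it has no block to belong to.
broken = (
    "def f(data):\n"
    "    total = sum(data)\n"
    "names = set(data)\n"        # column 0: the body of f ends here
    "    doubled = total * 2\n"  # indented, but no block is open
)

# Fix: indent the stray line so it sits with the rest of the body.
fixed = (
    "def f(data):\n"
    "    total = sum(data)\n"
    "    names = set(data)\n"
    "    doubled = total * 2\n"
)


def check(source):
    """Parse the source only and report what the parser says."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:
        return f"{exc.msg} (line {exc.lineno})"


print(check(broken))  # unexpected indent (line 4)
print(check(fixed))   # ok
```

Note that the error is reported on the first indented line after the stray one, not on the stray line itself, which matches the traceback in this thread.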

That did not help much. If possible, can you offer further editing support?

Use this:

def ML_with_CV_feat(cv_feat_file='../data/cv_feat.csv', n_comp=100, plotting=False):
            
    feat_df = pd.read_csv(cv_feat_file, index_col=0, dtype='unicode')
    feat_df.fillna(0, inplace=True)  
    feat_df['blob_detected'] = feat_df['blob_detected']*1
    #['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red', 'blob_detected', 'num_of_blobs', 'average_blob_area']
    #feat_df = feat_df.sample(frac=0.01)
    feat_df.drop(columns=['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red'], inplace=True)
    
    column_names = feat_names = list(feat_df.columns)
    print(column_names)

    for X in ['label','fn']: #! THIS DOES NOT MAKE SENSE
        feat_names.remove(x) #CHOOSE TO USE 'X' OR 'x'...WHAT IS 'x'?

    #feat_df = feat_df.iloc[0:300,:]
    mask = feat_df.loc[:, 'label'].isin(['Infected', 'Uninfected'])
    feat_df = feat_df.loc[mask, :].drop_duplicates()
    
    print('Number of features:', len(feat_names))
    y = feat_df.loc[:,['label']].values
    print(type(y), y.shape)

    print('Number of samples for each label \n', feat_df.groupby('label')['label'].count())

    X = feat_df.loc[:, feat_names].astype(float).values
    print('\nColumn feat names after placing into X', list(feat_df.loc[:, feat_names].columns))
    class_names = set(feat_df.loc[:,'label'])

    # print(class_names)
    #lb = label_binarize(y = y, classes = list(class_names))
    # classes.remove('unknown')
    # lb.fit(y) #for LabelBinarizer not lable_binerize()
    # lb.classes_ #for LabelBinarizer not lable_binerize

    # Split the training data for cross validation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
   
    df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
    
    print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
    ##### Dimensionality Reduction ####

Try that, and look out for the comments I added in CAPITAL LETTERS.
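
Once the indentation is fixed, the loop flagged in CAPITAL LETTERS would most likely fail with a NameError, because the loop variable is X while the body uses x. One possible repair is sketched below on a plain list standing in for feat_df.columns. The column list is assumed from names that appear in this thread, and the loop variable name col is an arbitrary choice.

```python
# Stand-in for list(feat_df.columns); the real columns may differ.
columns = ["label", "fn", "blob_detected", "num_of_blobs", "average_blob_area"]

# Two separate lists. Writing `column_names = feat_names = list(columns)`
# would bind both names to ONE list, so removals would affect both.
column_names = list(columns)
feat_names = list(columns)

# Use one consistently spelled loop variable (Python is case-sensitive).
for col in ["label", "fn"]:
    feat_names.remove(col)

print(feat_names)         # feature columns only
print(len(column_names))  # still the full column count
```

Using a name other than X for the loop variable also avoids confusion with the feature matrix X that is assigned later in the function.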

As the error message indicates, you have an indentation error. This error occurs when a statement is indented unnecessarily or when its indentation does not match that of the preceding statements in the same block. Python not only insists on indentation, it insists on consistent indentation. You are free to choose how many spaces to indent by, but you then need to stick with that choice within a block. If you indent one line by 4 spaces, but then indent the next line in the same block by 2 (or 5, or 10, or …), you’ll get this error.
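
As a rough illustration of the "4 spaces, then 2 or 5" case, the sketch below parses three tiny snippets with the built-in compile(). The helper name parse_error is made up for this example. In CPython 3, the deeper line is typically reported as "unexpected indent", while the shallower one is reported as an unindent that does not match any outer indentation level.

```python
def parse_error(source):
    """Return the parser's message for the source, or None if it parses."""
    try:
        compile(source, "<example>", "exec")
        return None
    except IndentationError as exc:
        return exc.msg


consistent = "if True:\n    a = 1\n    b = 2\n"  # 4 spaces, then 4
deeper = "if True:\n    a = 1\n     b = 2\n"     # 4 spaces, then 5
shallower = "if True:\n    a = 1\n  b = 2\n"     # 4 spaces, then 2

print(parse_error(consistent))
print(parse_error(deeper))
print(parse_error(shallower))
```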

Python 2 still allows mixing tabs and spaces by default, but relying on this “feature” is strongly discouraged. Python 3 rejects indentation that mixes tabs and spaces inconsistently and raises a TabError. The recommended approach (see PEP 8) is to indent with 4 spaces and to replace any tabs accordingly.
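
A small sketch of the tabs-versus-spaces point, assuming Python 3. The helper name parse_result is hypothetical. The first snippet indents one line with a tab and the next with spaces, and the second uses 4 spaces throughout.

```python
def parse_result(source):
    """Return 'ok' if the source parses, else the exception class name."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__


# Line 2 is indented with a tab, line 3 with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

# The same code indented with 4 spaces everywhere.
spaces_only = "if True:\n    x = 1\n    y = 2\n"

print(parse_result(mixed))
print(parse_result(spaces_only))
```

Most editors can be configured to insert spaces when the Tab key is pressed, which avoids this class of error entirely.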

Hi @fillermark!

This post has not been active for over a year.

Please only reply to newer topics.

Thanks!