Requesting help with a faster alternative to pd.groupby

I am new to Python. I have two pandas DataFrames, and they need to be merged conditionally.

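import pandas as pd

# Mean of 'counts' for each (index1, index2) pair in df
group = df.groupby(['index1', 'index2'])['counts'].mean()

# Write each group mean into every matching row of final_df
for (key1, key2), value in group.items():
    final_df.loc[(final_df['index1'] == key1) &
                 (final_df['index2'] == key2), 'result'] = value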

For small datasets the code above works fine, but for a very large dataset the for loop takes many hours to complete. Is there a much faster way to do this?
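For reference, here is a self-contained version of the loop from the question on hypothetical toy data (only the column names come from the post), which gives a baseline to check faster versions against. Each pass of the loop compares every row of `final_df` twice, so the total work grows roughly with (number of groups) × (rows of `final_df`), which is likely why it takes hours on large data.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "index1": ["a", "a", "b", "b", "c"],
    "index2": [1, 1, 2, 3, 3],
    "counts": [10, 20, 30, 40, 50],
})
final_df = pd.DataFrame({
    "index1": ["a", "b", "b", "c", "d"],
    "index2": [1, 2, 3, 3, 4],
})

group = df.groupby(["index1", "index2"])["counts"].mean()

# Original approach: one boolean-mask assignment per group.
# Every iteration scans all rows of final_df, so cost is about groups x rows.
for (key1, key2), value in group.items():
    mask = (final_df["index1"] == key1) & (final_df["index2"] == key2)
    final_df.loc[mask, "result"] = value

print(final_df)
```

The first assignment creates the `result` column, so rows with no matching group (here the `("d", 4)` row) are left as NaN.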

Can you explain what you are doing?

Also try looking at the pandas documentation. I think there is a merge function that does this directly.
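A minimal sketch of the merge approach, on hypothetical toy data using the column names from the question: compute the group means once, then attach them with a single left merge instead of one `.loc` assignment per group. Two assumptions worth checking: rows of `final_df` with no matching pair get NaN (the same as the loop when `result` does not exist yet, but different if `result` already holds values you want to keep), and `merge` returns a new frame with a fresh default index, so reattach the original index if it matters.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "index1": ["a", "a", "b", "b", "c"],
    "index2": [1, 1, 2, 3, 3],
    "counts": [10, 20, 30, 40, 50],
})
final_df = pd.DataFrame({
    "index1": ["a", "b", "b", "c", "d"],
    "index2": [1, 2, 3, 3, 4],
})

# One groupby pass: mean of 'counts' per (index1, index2) pair,
# turned back into ordinary columns so it can be merged.
means = (
    df.groupby(["index1", "index2"])["counts"]
    .mean()
    .rename("result")
    .reset_index()
)

# One left merge replaces the whole loop: every row of final_df is kept
# in its original order, and rows with no matching pair get NaN.
final_df = final_df.merge(means, on=["index1", "index2"], how="left")
print(final_df)
```

Because each (index1, index2) pair appears only once in `means`, the left merge keeps exactly one output row per row of `final_df`.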


@zollen Welcome to the forum!

Look into a library called “Modin”. It may be what you are looking for: it can process DataFrames in parallel across all CPU cores, so the more cores you have, the better.
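With Modin, the usual entry point is changing only the import to `import modin.pandas as pd` and leaving the rest of the pandas code as it is. A minimal sketch, assuming Modin and one of its execution engines (for example Ray or Dask) are installed, with a fallback to plain pandas so it still runs without them. Modin speeds up whole-DataFrame operations; it does not parallelize a Python for loop of `.loc` assignments, so it likely helps most combined with a vectorized approach such as a merge.

```python
# Modin is an optional third-party package; fall back to plain pandas
# so this sketch still runs where Modin is not installed.
try:
    import modin.pandas as pd  # Modin's drop-in replacement for pandas
except ImportError:
    import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "index1": ["a", "a", "b", "b", "c"],
    "index2": [1, 1, 2, 3, 3],
    "counts": [10, 20, 30, 40, 50],
})

# The pandas-style code is unchanged. Modin can spread a whole-frame
# operation like this groupby across CPU cores, but a Python loop
# issuing one assignment per group would still run one step at a time.
means = df.groupby(["index1", "index2"])["counts"].mean()
print(means)
```

On small frames like this one, Modin's setup overhead may make it slower than plain pandas; any benefit should show up only on large data.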

