I am new to Python. I have two pandas DataFrames that need to be merged conditionally:
group = df.groupby(['index1', 'index2'])['counts'].mean()
for index, value in group.items():
    final_df.loc[(final_df['index1'] == index[0]) &
                 (final_df['index2'] == index[1]), 'result'] = value
For small datasets the above code works just fine, but for a very large dataset the for loop takes many hours to complete. Is there a faster alternative for doing the same task?
Look into a library called Modin. It is a near drop-in replacement for pandas (`import modin.pandas as pd`) that parallelizes DataFrame operations across all available CPU cores, so the more cores you have, the better.
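Independent of Modin, the row-by-row loop can usually be eliminated entirely in plain pandas: compute the group means once, then assign them all at once with a single `merge` on the two key columns. A minimal sketch, using small made-up frames in place of the question's `df` and `final_df` (same column names assumed):

```python
import pandas as pd

# Stand-ins for the question's df and final_df.
df = pd.DataFrame({
    'index1': ['a', 'a', 'b', 'b'],
    'index2': [1, 1, 2, 2],
    'counts': [10, 20, 30, 40],
})
final_df = pd.DataFrame({
    'index1': ['a', 'b'],
    'index2': [1, 2],
})

# Group means as a regular DataFrame, with the result column
# already named the way final_df expects it.
means = (df.groupby(['index1', 'index2'], as_index=False)['counts']
           .mean()
           .rename(columns={'counts': 'result'}))

# One vectorized merge replaces the whole loop; rows of final_df
# with no matching group get NaN in 'result'.
final_df = final_df.merge(means, on=['index1', 'index2'], how='left')
```

This runs in a single pass instead of one `.loc` scan per group, which is typically orders of magnitude faster on large frames.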