Pandas, easier way of using groupby()?

So I got a CSV file to the likes of

Now, I got asked :
"Search for the percentage of Blue eyes for each Country"
So what I did first was grouping by Country just to get a list (or series perhaps?) of the counts of each Country:
import pandas as pd
df = read_csv(‘people.csv’)
group1 = df.groupby(‘country’)
counter1 = group1[‘country’].count()

Then it comes the messy part, where I get a counter just of the Blue eyes to then join both counters into a new data frame, something like this:

only_blueeyes = df.loc[df[‘blue-eyes’] == ‘yes’]
group2 = only_blueeyes.groupby(‘country’)
counter2 = group2[‘country’].count()
result_frame = pd.DataFrame({‘Total’: counter1 , ‘Only Blue Eyes’: counter2})
result_frame[‘Percentage of Blue Eyes’] =( result_frame[‘Only Blue Eyes’] / result_frame[‘Total’]) * 100

It actually does its job, but my question here is if there’s anyway of simplifying the whole operation and make it a little bit cleaner?

Edit: Corrected some variable names so they would match

1 Like

The trick to getting percentages is to convert the interested column into boolean. Here is an attempt with small data set.

import pandas as pd

# test data from your example
data = {
    'age': [23, 87, 11, 10],
    'sex': ['male', 'female','male', 'female'],
    'country': ['US','Germany','Japan', 'US'],

# Initialize the dataframe
df = pd.DataFrame(data)

# add the boolean column to the data frame instead of the text field.
df['blue-eyes-flag'] = df['blue-eyes'] == 'yes'

# the mean will sum the boolean values simulating a counting blue eyes = true
# the mean will also count all observations irrespective of the value being true or false. 
# This allows you to calculate percentage in a simpler step.
result_series = df.groupby('country').mean()['blue-eyes-flag']*100


Let me know if this makes sense.


thanks a lot kind stranger .i was a bit stuck with my project and now i finally understand and on my way to finish my project