Completing Demographic Data Analyzer with groupby instead of division

sethi · March 10, 2021, 2:54am

This is regarding the test to select the country with the highest percentage of people with a salary of >50K.

I thought the easiest way to do this would be to groupby native-country, then select the salary column, and do value_counts to get normalized percentages. This way I can just select the row with the highest >50K value.

country_percentages = df.groupby('native-country')['salary'].value_counts(normalize=True)
highest_earning_country = country_percentages.idxmax()
highest_earning_country_percentage = (country_percentages.max() * 100).round(1)

However, the first line returns a Series with a MultiIndex, so each key is a (native-country, salary) tuple:

native-country  salary
?               <=50K     0.749571
                >50K      0.250429
Cambodia        <=50K     0.631579
                >50K      0.368421
Canada          <=50K     0.677686
                            ...   
United-States   >50K      0.245835
Vietnam         <=50K     0.925373
                >50K      0.074627
Yugoslavia      <=50K     0.625000
                >50K      0.375000
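
To illustrate what that tuple-keyed result means for idxmax, here is a minimal sketch on a tiny made-up DataFrame. The countries "A" and "B" and their rows are hypothetical, not taken from the challenge's census data. It suggests that idxmax over the whole Series compares both salary brackets at once, so it can return a <=50K row.

```python
import pandas as pd

# Hypothetical toy data standing in for the census DataFrame.
df = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K",
               ">50K", ">50K", ">50K", "<=50K", "<=50K"],
})

country_percentages = (
    df.groupby("native-country")["salary"].value_counts(normalize=True)
)

# The index has two levels (country, salary), so every key is a tuple.
index_is_multi = isinstance(country_percentages.index, pd.MultiIndex)

# idxmax looks at every row, including the <=50K ones.
overall_label = country_percentages.idxmax()
print(index_is_multi, overall_label)
```

In this toy data the largest share overall belongs to the <=50K bracket of country "A", even though "B" has the larger >50K share.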

Meanwhile, I have used an alternate approach which has allowed me to complete the challenge, but I’d like to try to understand if I was on the right lines with my initial solution and how I might’ve been able to fix it.

Is anyone familiar enough with Pandas to know if my initial approach was viable and how I could’ve completed it this way?
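
One possible way to finish the groupby approach is to keep only the >50K rows before taking the maximum. The following is a minimal sketch, again on a small made-up DataFrame (the countries "A" and "B" are hypothetical), and it assumes the second index level is named "salary", as in the output shown above. Series.xs selects one value from that level and drops it, leaving a Series keyed by country alone.

```python
import pandas as pd

# Hypothetical toy data standing in for the census DataFrame.
df = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K",
               ">50K", ">50K", ">50K", "<=50K", "<=50K"],
})

country_percentages = (
    df.groupby("native-country")["salary"].value_counts(normalize=True)
)

# Keep only the >50K share for each country; the salary level is dropped.
rich_share = country_percentages.xs(">50K", level="salary")

highest_earning_country = rich_share.idxmax()
highest_earning_country_percentage = round(float(rich_share.max()) * 100, 1)
print(highest_earning_country, highest_earning_country_percentage)
```

Countries with no >50K earners simply do not appear in rich_share, which does not affect the maximum. Calling country_percentages.unstack() and picking the ">50K" column would be a similar alternative.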

Challenge: Demographic Data Analyzer

