This is regarding the test to select the country with the highest percentage of people with a salary of `>50K`.
I thought the easiest way to do this would be to group by `native-country`, select the `salary` column, and call `value_counts(normalize=True)` to get the proportion of each salary label within each country. Then I could just select the row with the highest `>50K` value.
```python
country_percentages = df.groupby('native-country')['salary'].value_counts(normalize=True)
highest_earning_country = country_percentages.idxmax()
highest_earning_country_percentage = (country_percentages.max() * 100).round(1)
```
However, the first line returns a Series whose keys are `(native-country, salary)` tuples, i.e. a two-level MultiIndex:
```
native-country  salary
?               <=50K     0.749571
                >50K      0.250429
Cambodia        <=50K     0.631579
                >50K      0.368421
Canada          <=50K     0.677686
                            ...
United-States   >50K      0.245835
Vietnam         <=50K     0.925373
                >50K      0.074627
Yugoslavia      <=50K     0.625000
                >50K      0.375000
```
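Because the index has two levels, calling `idxmax()` on the whole Series compares the `<=50K` and `>50K` proportions together, so it can return a `(country, '<=50K')` pair. Among the rows shown above, for instance, the largest value is Vietnam's `<=50K` share (0.925373). One way to keep the original approach is to take only the `>50K` level with `Series.xs` before calling `idxmax()`. This is a minimal sketch, assuming the salary labels are exactly `'<=50K'` and `'>50K'`; the small DataFrame is made up and only stands in for the challenge's `df`.

```python
import pandas as pd

# Hypothetical toy data standing in for the challenge's census DataFrame.
df = pd.DataFrame({
    "native-country": ["Vietnam"] * 4 + ["Canada"] * 4 + ["Cambodia"] * 3,
    "salary": ["<=50K", "<=50K", "<=50K", ">50K",   # Vietnam: 0.75 / 0.25
               "<=50K", "<=50K", ">50K", ">50K",    # Canada: 0.50 / 0.50
               "<=50K", ">50K", ">50K"],            # Cambodia: 0.33 / 0.67
})

# Same first line as in the question: a Series with a (country, salary) MultiIndex.
country_percentages = df.groupby("native-country")["salary"].value_counts(normalize=True)

# idxmax() over the whole Series looks at both salary levels at once,
# so here it picks Vietnam's '<=50K' share (0.75), not a '>50K' row.
naive = country_percentages.idxmax()
print(naive)  # ('Vietnam', '<=50K')

# Keep only the '>50K' rows; the result is indexed by country alone.
over_50k_share = country_percentages.xs(">50K", level="salary")
highest_earning_country = over_50k_share.idxmax()
highest_earning_country_percentage = round(over_50k_share.max() * 100, 1)
print(highest_earning_country, highest_earning_country_percentage)  # Cambodia 66.7
```

One caveat of this route: a country with no `>50K` rows at all has no `>50K` entry in the grouped `value_counts` result, so it simply drops out of `over_50k_share`. That doesn't affect the maximum, but it matters if you later need a value for every country.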
In the meantime, I completed the challenge with a different approach, but I'd like to understand whether my initial solution was on the right lines and how I could have fixed it.
Is anyone familiar enough with pandas to know whether my initial approach was viable and how I could have completed it this way?
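Another way to stay close to the `groupby` + `value_counts` idea is to `unstack()` the salary level into columns. That gives a wide table with one row per country and one column per salary label, so the `>50K` column can be picked by name. `fill_value=0` gives countries with no `>50K` rows an explicit 0. `pd.crosstab(..., normalize="index")` should build the same table in a single call. This is a sketch on made-up data (the country name "Nowhere" is hypothetical), assuming the labels are exactly `'<=50K'` and `'>50K'`.

```python
import pandas as pd

# Hypothetical toy data; "Nowhere" has no '>50K' rows at all.
df = pd.DataFrame({
    "native-country": ["Vietnam"] * 4 + ["Cambodia"] * 3 + ["Nowhere"] * 2,
    "salary": ["<=50K", "<=50K", "<=50K", ">50K",
               "<=50K", ">50K", ">50K",
               "<=50K", "<=50K"],
})

# Wide table: rows are countries, columns are salary labels, values are shares.
# fill_value=0 fills in the missing '>50K' share for "Nowhere".
shares = (
    df.groupby("native-country")["salary"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Equivalent table built with crosstab, normalizing each row to sum to 1.
shares_ct = pd.crosstab(df["native-country"], df["salary"], normalize="index")

highest_earning_country = shares[">50K"].idxmax()
highest_earning_country_percentage = round(shares[">50K"].max() * 100, 1)
print(highest_earning_country, highest_earning_country_percentage)  # Cambodia 66.7
```

The wide form is convenient if you also need the `<=50K` share or want every country listed, including those with a 0% `>50K` share.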
Challenge: Demographic Data Analyzer
Link to the challenge: