Highest earning country and respective percentage

Tell us what’s happening:
I’m failing only 2 related test cases: finding the highest-earning country and its percentage. When I download the CSV file and filter for salary >50K, I get 7841 rows in total. When I additionally filter by country for United-States, I get 7171. My code produces the same result, but the expected answer is Iran. How is that possible?

Your code so far

dataset_richest_by_country = df[df['salary'] == '>50K'].groupby('native-country').count()['age'].sort_values(ascending=False)
highest_earning_country = dataset_richest_by_country.head(1).index[0]
total_rich = dataset_richest_by_country.sum()
highest_earning_country_percentage = round(dataset_richest_by_country.head(1)[0] / total_rich * 100, 1)

Challenge: Demographic Data Analyzer

I hope this will help you

It is great that you solved the challenge, but instead of posting your full working solution, it is best to stay focused on answering the original poster’s question(s) and help guide them with hints and suggestions to solve their own issues with the challenge.

We are trying to cut back on the number of spoiler solutions found on the forum and instead focus on helping other campers with their questions and definitely not posting full working solutions.


It’s a percentage. There are more US entries, but which country has the highest percentage of high salaries (>50K) relative to its own total number of entries? You are correctly generating the number of >50K salaries per country (in dataset_richest_by_country), but you need the percentage, and in highest_earning_country you are just finding the largest number of earners. Or, to quote the specs:

  • What country has the highest percentage of people that earn >50K and what is that percentage?

But I also have the total_rich variable, which is the sum of all people who earn >50K. At the end I divide the highest count (US) by total_rich and multiply by 100 to get the percentage. Shouldn’t that be the right way?

dataset_richest_by_country doesn’t contain percentages. It’s a dataset with the number of high earners in each country. It needs to be converted to percentages before you find the maximum. You’re trying to determine both the maximum percentage and which country has that maximum percentage, but you are setting highest_earning_country before you calculate the percentages for highest_earning_country_percentage.
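To make the difference concrete, here is a small sketch on made-up data (not the challenge dataset). The countries "A" and "B" and the variable names are hypothetical; the point is only that the country with the most high earners is not necessarily the country with the highest percentage of high earners.

```python
import pandas as pd

# Toy data (an assumption for illustration): country A is large, country B is small.
toy = pd.DataFrame({
    "native-country": ["A"] * 8 + ["B"] * 2,
    "salary": [">50K"] * 2 + ["<=50K"] * 6 + [">50K"] * 1 + ["<=50K"] * 1,
})

# Number of >50K earners per country, and total number of people per country.
rich_counts = toy[toy["salary"] == ">50K"]["native-country"].value_counts()
all_counts = toy["native-country"].value_counts()

# What the code in the question measures: A's share of ALL high earners.
share_of_all_rich = round(rich_counts["A"] / rich_counts.sum() * 100, 1)

# What the spec asks about: high earners in a country / everyone in that country.
rate_within_country = (rich_counts / all_counts * 100).round(1)

print(rich_counts.idxmax())          # country with the most high earners
print(rate_within_country.idxmax())  # country with the highest percentage of high earners
```

In this toy data, A has more high earners in absolute terms, but B has the higher percentage, so the two questions give different answers.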

Thanks for the help, I got it. Even after you get the number of rich people by country, you still have to divide it by the total number of people in each country.
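One possible way to express that division, sketched on made-up data rather than the challenge dataset: the mean of a boolean column within each group is the fraction of True values, so a countries with no high earners come out as 0 instead of being dropped. The names is_rich, pct_rich_by_country, top_country, and top_pct are hypothetical, and this is only one of several approaches that would work.

```python
import pandas as pd

# Toy data (an assumption for illustration), including a country with no high earners.
toy = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "B", "C"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"],
})

# Boolean Series: True where the person earns >50K.
is_rich = toy["salary"] == ">50K"

# Mean of booleans per country = fraction of high earners in that country.
pct_rich_by_country = (is_rich.groupby(toy["native-country"]).mean() * 100).round(1)

top_country = pct_rich_by_country.idxmax()  # label of the maximum percentage
top_pct = pct_rich_by_country.max()         # the maximum percentage itself

print(pct_rich_by_country)
print(top_country, top_pct)
```

Computing the percentages first and only then taking idxmax and max keeps the country and its percentage consistent with each other.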

OK @JeremyLT, from now on I will not post full solutions. Thanks for guiding me 🙂
