I’m failing only 2 related test cases: finding the highest-earning country and its percentage. When I download the CSV file and filter for salary >50K, I get 7841 rows in total. When I additionally filter by country for United-States, I get 7171. My code returns the same result, but the expected answer is Iran. How is that possible?

dataset_richest_by_country = df[df['salary'] == '>50K'].groupby('native-country').count()['age'].sort_values(ascending=False)
highest_earning_country = dataset_richest_by_country.head(1).index[0]
total_rich = dataset_richest_by_country.sum()
highest_earning_country_percentage = round(dataset_richest_by_country.head(1)[0] / total_rich * 100, 1)

It’s a percentage. There are more US entries, but the question is which country has the highest percentage of high salaries (>50K) among its own entries. You are correctly generating the number of >50K salaries per country (in dataset_richest_by_country), but you need the percentage, and in highest_earning_country you are just finding the largest number of earners. Or, to quote the specs:

  • What country has the highest percentage of people that earn >50K and what is that percentage?

But I also have the total_rich variable. This is the sum of all people who earn >50K, and at the end I divide the highest count (US) by total_rich and multiply by 100 to get the percentage. Shouldn’t that be the right way?

dataset_richest_by_country doesn’t contain percentages. It’s a dataset with the number of high earners in each country. It needs to be converted to percentages before you find the maximum. You’re trying to determine both the maximum percentage and which country has the maximum percentage, but you are setting highest_earning_country before you calculate the percentages for highest_earning_country_percentage.
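
Here is a minimal sketch of that order of operations: compute the per-country percentage first, then take the maximum. The ten-row DataFrame is made up purely for illustration (it is not the real census data), and total_by_country, rich_by_country and rich_share are names I’m introducing here, not anything from the project template.

```python
import pandas as pd

# Made-up sample data, only to illustrate the calculation.
df = pd.DataFrame({
    "native-country": ["United-States"] * 6 + ["Iran"] * 2 + ["Mexico"] * 2,
    "salary": [">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K",
               ">50K", "<=50K",
               "<=50K", "<=50K"],
})

# Denominator: everyone per country. Numerator: high earners per country.
total_by_country = df["native-country"].value_counts()
rich_by_country = df.loc[df["salary"] == ">50K", "native-country"].value_counts()

# Share of high earners within each country, as a percentage.
# Countries with no high earners come out as NaN after the division, so fill with 0.
rich_share = (rich_by_country / total_by_country).fillna(0) * 100

# Only now pick the maximum, so the country and the percentage agree.
highest_earning_country = rich_share.idxmax()
highest_earning_country_percentage = round(rich_share.max(), 1)
```

In this toy sample United-States has the most high earners by count (2 of 6), but Iran has the higher share (1 of 2), which is the same effect you are seeing with the real data.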

Thanks for the help, I got it. Even once you have the number of high earners per country, you still have to divide it by the total number of people in that country.

