Tell us what’s happening:
I’m failing only 2 related test cases: finding the highest-earning country and the respective percentage. When I download the CSV file and filter for salary >50K, I get 7841 rows in total. When I additionally filter by country for United-States, I get 7171. My code gives the same result, yet the right answer is Iran. How is that possible?
Your code so far
dataset_richest_by_country = ((df[df['salary']=='>50K']).groupby('native-country').count()['age']).sort_values(ascending=False)
highest_earning_country = dataset_richest_by_country.head(1).index
My boilerplate url
Your browser information:
User Agent is:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
Challenge: Demographic Data Analyzer
Link to the challenge:
MOD EDIT: SOLUTION REDACTED
I hope this will help you
It is great that you solved the challenge, but instead of posting your full working solution, it is best to stay focused on answering the original poster’s question(s) and help guide them with hints and suggestions to solve their own issues with the challenge.
We are trying to cut back on the number of spoiler solutions found on the forum and instead focus on helping other campers with their questions and definitely not posting full working solutions.
It’s a percentage. There are more US entries, but the question is which country has the highest percentage of its own entries with a high salary (>50K). You are correctly generating the number of >50K salaries per country (in dataset_richest_by_country), but you need the percentage, and in highest_earning_country you are just finding the largest number of earners. Or, to quote the specs:
- What country has the highest percentage of people that earn >50K and what is that percentage?
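To illustrate the difference between the largest count and the largest percentage, here is a minimal sketch on made-up toy data (not the census file). The column names native-country and salary come from the code in the first post; the names high, rich_counts, total_counts and rich_pct are hypothetical and chosen only for this example.

```python
import pandas as pd

# Made-up toy data, not the census file: country "A" has many rows, "B" has few.
df = pd.DataFrame({
    "native-country": ["A"] * 8 + ["B"] * 2,
    "salary": [">50K"] * 3 + ["<=50K"] * 5 + [">50K"] * 2,
})

high = df[df["salary"] == ">50K"]

# Number of >50K earners per country: "A" wins on raw count (3 vs 2).
rich_counts = high["native-country"].value_counts()

# Total number of rows per country, used as the denominator.
total_counts = df["native-country"].value_counts()

# Share of each country's own rows that are >50K, as a percentage.
rich_pct = rich_counts / total_counts * 100

print(rich_counts.idxmax())  # A
print(rich_pct.idxmax())     # B
```

Country "A" has the most high earners, but only 37.5% of its rows are >50K, while every row for "B" is. The same effect can occur in the real data between a country with many entries and one with few.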
But I also have the total_rich parameter there. This is the sum of all people who earn >50K, and at the end I divide the highest count (US) by total_rich and multiply by 100 to get the percentage. Shouldn’t that be the right way?
dataset_richest_by_country doesn’t contain percentages. It’s a dataset with the number of high earners in each country. It needs to be a percentage before you find the maximum. You’re trying to determine both the maximum percentage and which country has it, but you are setting highest_earning_country before you calculate the percentages for
Thanks for the help, I got it. Even after counting the high earners per country, you still have to divide by the total number of people in each country.
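As a quick arithmetic check using only the two numbers quoted in the first post, the sketch below shows what dividing by total_rich actually measures. The variable names us_rich and share_of_all_rich are hypothetical; the total number of United-States rows is not given in this thread, so the per-country percentage itself is not computed here.

```python
# Numbers quoted in the first post.
us_rich = 7171      # rows with salary >50K and native-country United-States
total_rich = 7841   # all rows with salary >50K

# This is the share of all high earners who are from the US,
# not the share of US entries who are high earners.
share_of_all_rich = us_rich / total_rich * 100

print(round(share_of_all_rich, 1))  # 91.5
```

The challenge asks for the second quantity, so the denominator has to be the number of all entries for that country, whatever their salary.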
OK @JeremyLT, from next time I will not post a full solution. Thanks for guiding me.