Errors with Demographic Data Analyzer

Tell us what’s happening:
I’m failing 3 of the tests:
1: The highest earning country should be Iran, but it’s coming up as the US for me. Maybe I used incorrect parameters?
1b: The percentage is therefore incorrect too, but somehow it’s still low for me (22% instead of almost 42%).
2: I’m getting that the most popular occupation in India, >50K or otherwise, is Private and not Prof-specialty.

Your code so far
import pandas as pd

def calculate_demographic_data(print_data=True):
    # Read data from file
    df = pd.read_csv('')

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_count = df.value_counts(subset=['race'])

    # What is the average age of men?
    age = df.loc[df['sex']=='Male', 'age'] #collect the ages of all males in df
    average_age_men = round(age.mean(),1) #take the average of the ages

    # What is the percentage of people who have a Bachelor's degree?
    bachelors = len(df.loc[df['education']=='Bachelors']) #determine how many in df have a bachelor's degree
    percentage_bachelors = round((bachelors/len(df) * 100),1) #calculate a percentage compared to the total population of df

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    # What percentage of people without advanced education make more than 50K?

    # with and without `Bachelors`, `Masters`, or `Doctorate`
    higher_education = df[(df['education']=='Bachelors')|(df['education']=='Masters')|(df['education']=='Doctorate')]
    num_h_e_r = higher_education[higher_education['salary']=='>50K']
    lower_education = df[(df['education']!='Bachelors')&(df['education']!='Masters')&(df['education']!='Doctorate')]
    num_l_e_r = lower_education[lower_education['salary']=='>50K']
    # percentage with salary >50K
    higher_education_rich = round(len(num_h_e_r)/len(higher_education)*100,1)
    lower_education_rich = round(len(num_l_e_r)/len(lower_education)*100,1)

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    min_work_hours = df['hours-per-week'].min()

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    num_min_workers = df[df['hours-per-week']==min_work_hours]
    rich_min_workers = df[(df['hours-per-week']==min_work_hours) & (df['salary']=='>50K')]

    rich_percentage = round(len(rich_min_workers)/len(num_min_workers) *100, 1)

    # What country has the highest percentage of people that earn >50K?
    rich = df.loc[df['salary']=='>50K','native-country']
    ordered = rich.value_counts()

    highest_earning_country = ordered.index[0]
    highest_earning_country_percentage = round(ordered[0]/len(df) *100,1)

    # Identify the most popular occupation for those who earn >50K in India.
    IN = df.loc[df['native-country']=='India',['workclass','salary']]
    occupations = IN.loc[IN['salary']=='>50K','workclass']
    ordered_occupations = occupations.value_counts()
    top_IN_occupation = ordered_occupations.index[0]


    if print_data:
        print("Number of each race:\n", race_count)
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
        print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
        print(f"Min work time: {min_work_hours} hours/week")
        print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
        print("Country with highest percentage of rich:", highest_earning_country)
        print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
        print("Top occupation in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'top_IN_occupation': top_IN_occupation
    }

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36

Challenge: Demographic Data Analyzer

Link to the challenge:

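That’s because you start with the country that has the most people earning over 50K, not the one with the highest percentage. It’s like having one hat with 1 red ball and another hat with 2 red balls and 1 blue ball.
The highest percentage of red is 100%, in the first hat. But you order by the total number of red balls, which picks the second hat with 2, even though they make up only 67% of their respective hat.
On top of that, you divide by the length of df, which is wrong: the percentage for a country has to be taken relative to the number of people from that country, not the whole dataset.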
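
The hat example can be sketched in pandas. This is only a toy illustration, not the challenge data: the four-row DataFrame, the "HatA"/"HatB" labels and the helper name `rich_share_by_country` are made up here, and only the column names `native-country` and `salary` are taken from the code in the question.

```python
import pandas as pd

# Toy data mirroring the two hats (values are made up):
# "HatA" holds 1 red ball (>50K); "HatB" holds 2 red (>50K) and 1 blue (<=50K).
toy = pd.DataFrame({
    "native-country": ["HatA", "HatB", "HatB", "HatB"],
    "salary": [">50K", ">50K", ">50K", "<=50K"],
})

def rich_share_by_country(df):
    """Percentage of >50K earners within each country (hypothetical helper)."""
    rich_counts = df.loc[df["salary"] == ">50K", "native-country"].value_counts()
    total_counts = df["native-country"].value_counts()
    # Series division aligns on the index, so each country's rich count is
    # divided by that same country's total. Countries with no >50K earners
    # come out as NaN, which idxmax() skips.
    return (rich_counts / total_counts * 100).round(1)

# Ordering by raw count of rich people (what the code in the question does).
by_count = toy.loc[toy["salary"] == ">50K", "native-country"].value_counts()

# Ordering by share of rich people within each country.
share = rich_share_by_country(toy)

print("Most rich people:", by_count.idxmax())
print("Highest share of rich people:", share.idxmax(), share.max())
```

On the toy data the two orderings disagree: the raw count picks "HatB", while the per-country share picks "HatA" at 100%. The same pattern applied to the real DataFrame should give the per-country percentage the test asks for.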

As for the top Indian occupation: the first thing I’d do is print some values in between the tasks to see what my commands produce.
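
A minimal sketch of that kind of inspection, again on made-up toy rows rather than the real dataset. It assumes the data has both a `workclass` and an `occupation` column. Since `Private` looks like a workclass value and `Prof-specialty` like an occupation value, printing both columns side by side may show which one the code is actually counting. That is an inference from the output in the question, so check it against your own printout.

```python
import pandas as pd

# Made-up rows; only the column names are meant to resemble the dataset.
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "United-States"],
    "workclass": ["Private", "Private", "Private", "Private"],
    "occupation": ["Prof-specialty", "Prof-specialty", "Sales", "Sales"],
    "salary": [">50K", ">50K", ">50K", "<=50K"],
})

# Step 1: check which columns exist before picking one.
print(toy.columns.tolist())

# Step 2: look at the filtered rows themselves.
india_rich = toy[(toy["native-country"] == "India") & (toy["salary"] == ">50K")]
print(india_rich)

# Step 3: compare the counts for the two candidate columns.
print(india_rich["workclass"].value_counts())
print(india_rich["occupation"].value_counts())

top_by_workclass = india_rich["workclass"].value_counts().idxmax()
top_by_occupation = india_rich["occupation"].value_counts().idxmax()
print(top_by_workclass, "vs", top_by_occupation)
```

On these toy rows, counting `workclass` yields `Private` while counting `occupation` yields `Prof-specialty`, which matches the mismatch described in the question.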
