Demographic Data

Hello everyone,
I think I got the code right, but for some reason all the numeric results fail the tests.

import pandas as pd

def calculate_demographic_data(print_data=True):
    # Read data from file
    df = pd.read_csv('', header=0)

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_count = df['race'].value_counts()

    # What is the average age of men?
    average_age_men = df.loc[df['sex'].str.contains('Male'), 'age'].mean()

    # What is the percentage of people who have a Bachelor's degree?
    percentage_bachelors = df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , 'education'].count()/df.count()[0]*100

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    # What percentage of people without advanced education make more than 50K?

    # with and without `Bachelors`, `Masters`, or `Doctorate`
    higher_education = df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , ['education', 'salary']].loc[df['salary']=='<=50K' , 'education'].count()
    lower_education = df.loc[df['salary']=='<=50K','education'].count()-df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , ['education', 'salary']].loc[df['salary']=='<=50K' , 'education'].count()

    # percentage with salary >50K
    higher_education_rich = higher_education/(higher_education+lower_education)*100
    lower_education_rich = lower_education/(higher_education+lower_education)*100

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    min_work_hours = df['hours-per-week'].min()

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    num_min_workers = df.loc[(df['hours-per-week']==df['hours-per-week'].min()) & (df['salary']=='<=50K'), 'age'].count()

    rich_percentage = num_min_workers/(num_min_workers+min_work_hours)*100

    # What country has the highest percentage of people that earn >50K?
    highest_earning_country = df.loc[df['salary']=='<=50K','native-country'].value_counts().index.tolist()[0]
    highest_earning_country_percentage = df.loc[df['salary']=='<=50K','native-country'].value_counts()[0]/df.loc[df['salary']=='<=50K','native-country'].count()*100

    # Identify the most popular occupation for those who earn >50K in India.
    top_IN_occupation = df.loc[(df['salary']=='<=50K') & (df['native-country']=='India'),'occupation'].value_counts().index.tolist()[0]

    if print_data:
        print("Number of each race:\n", race_count)
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
        print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
        print(f"Min work time: {min_work_hours} hours/week")
        print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
        print("Country with highest percentage of rich:", highest_earning_country)
        print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
        print("Top occupations in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'top_IN_occupation': top_IN_occupation
    }

It returns:


The problem seems to lie in how the decimals are evaluated.

So you need to look at the errors that are displayed by the test:

Average age of men: 39.43354749885268
ERROR: test_average_age_men (test_module.DemographicAnalyzerTestCase)
Traceback (most recent call last):
  File "/home/gray/src/work/fcc-da-demographics/", line 24, in test_average_age_men
  File "/usr/lib/python3.9/unittest/", line 876, in assertAlmostEqual
    if round(diff, places) == 0:
TypeError: an integer is required (got type str)

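This is fairly straightforward once you see where the string comes from. The TypeError is raised inside assertAlmostEqual, at round(diff, places): the test passes its failure message as the third positional argument, which assertAlmostEqual treats as places, so round() receives a string where it needs an integer. That line only runs when the actual and expected values differ, so the error really means your number does not match the one the test expects. You can see this if you look at the test's code: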

    def test_average_age_men(self):
        actual = self.data["average_age_men"]
        expected = 39.4
        self.assertAlmostEqual(
            actual, expected, "Expected different value for average age of men."
        )

So the test really expects the number 39.4 in average_age_men, not a long decimal like 39.43354749885268. You need to change your calculation of average_age_men so that it returns 39.4.
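
To see why the unrounded value produces a TypeError rather than an ordinary assertion failure, here is a minimal sketch. The `Demo` class and the `check` helper are hypothetical stand-ins for the project's test case; only the call shape (the message passed as the third positional argument) is taken from the test shown above.

```python
import unittest


class Demo(unittest.TestCase):
    """Hypothetical test case used only to get access to assertAlmostEqual."""

    def runTest(self):
        pass


def check(actual, expected=39.4):
    """Return 'pass', or the name of the exception the assertion raised."""
    case = Demo()
    try:
        # The message lands in the `places` parameter because it is positional.
        case.assertAlmostEqual(
            actual, expected, "Expected different value for average age of men."
        )
    except Exception as exc:
        return type(exc).__name__
    return "pass"


unrounded = check(39.43354749885268)         # values differ, so round(diff, places) runs
rounded = check(round(39.43354749885268, 1))  # values are equal, so the check returns early
print(unrounded, rounded)
```

With equal values, assertAlmostEqual returns before it ever looks at `places`, which is why returning 39.4 makes the error disappear.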

The other tests are similarly diagnosed. Good luck.
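
As an illustration of how the same idea might carry over to one of the percentage results, here is a small sketch on made-up data. The rows are hypothetical; the column names and the `>50K` label follow the code and comments in the original post, and rounding to one decimal place is an assumption based on the 39.4 expected above.

```python
import pandas as pd

# Toy data (hypothetical), standing in for the real dataset.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Doctorate",
                  "HS-grad", "Some-college", "11th"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"],
})

# Boolean masks: who has advanced education, and who earns more than 50K.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# The mean of a boolean Series is the fraction of True values in it.
higher_education_rich = round(rich[advanced].mean() * 100, 1)
lower_education_rich = round(rich[~advanced].mean() * 100, 1)

print(higher_education_rich, lower_education_rich)
```

Each percentage is computed within its own group (advanced or not), then rounded once at the end.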

Yes, I know that. Don’t really know how to change that.

Have you tried rounding the numbers?
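
A minimal sketch of what rounding could look like, using a few made-up rows in place of the real dataset (the data here is hypothetical; the column names follow the code in the original post):

```python
import pandas as pd

# Toy data (hypothetical), standing in for the real dataset.
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male"],
    "age": [39, 50, 38, 41],
})

# Mean age of the men, rounded to one decimal place with the built-in round().
average_age_men = round(df.loc[df["sex"] == "Male", "age"].mean(), 1)

print(average_age_men)
```

The unrounded mean of 39, 38 and 41 is 39.333..., so the rounded result is 39.3.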

To start finding out how to do something, you need to start googling. Knowing what to google is half of the battle. For this, I see that the error says an integer is required and a str was received. I googled: convert str to int in df
I found this bit of code:

df['DataFrame Column'] = df['DataFrame Column'].astype(int)

You would need to replace 'DataFrame Column' with the actual column name, and it should convert every value in that column of the df (DataFrame) to an integer.
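
For reference, a small sketch of what that conversion does, assuming a column whose numbers are stored as strings (the toy data below is hypothetical):

```python
import pandas as pd

# Toy column (hypothetical) where the numbers were read in as strings.
df = pd.DataFrame({"age": ["39", "50", "38"]})

# astype(int) parses every value in the column into an integer.
df["age"] = df["age"].astype(int)

# Arithmetic now works on numbers instead of concatenating strings.
total = df["age"].sum()
print(df["age"].dtype, total)
```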

Sorry Jeremy, I meant to reply to the original post.

Yes, thank you. I'm new to the Python world (and to coding outside MATLAB) and still getting used to it. To google something you have to know what you are looking for, haha.
