Demographic Data

fkrell · April 9, 2021, 8:10pm

Hello everyone,
I think I got the code right, but for some reason all the numeric results fail the tests.

import pandas as pd

def calculate_demographic_data(print_data=True):
    # Read data from file
    df = pd.read_csv('adult.data.csv', header=0)

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_count = df['race'].value_counts()

    # What is the average age of men?
    average_age_men = df.loc[df['sex'].str.contains('Male'), 'age'].mean()

    # What is the percentage of people who have a Bachelor's degree?
    percentage_bachelors = df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , 'education'].count()/df.count()[0]*100
    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    # What percentage of people without advanced education make more than 50K?

    # with and without `Bachelors`, `Masters`, or `Doctorate`
    higher_education = df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , ['education', 'salary']].loc[df['salary']=='<=50K' , 'education'].count()
    lower_education = df.loc[df['salary']=='<=50K','education'].count()-df.loc[df['education'].str.contains('Bachelors') | df['education'].str.contains('Masters') | df['education'].str.contains('Doctorate') , ['education', 'salary']].loc[df['salary']=='<=50K' , 'education'].count()

    # percentage with salary >50K
    higher_education_rich = higher_education/(higher_education+lower_education)*100
    lower_education_rich = lower_education/(higher_education+lower_education)*100

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    min_work_hours = df['hours-per-week'].min()

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    num_min_workers = df.loc[(df['hours-per-week']==df['hours-per-week'].min()) & (df['salary']=='<=50K'), 'age'].count()

    rich_percentage = num_min_workers/(num_min_workers+min_work_hours)*100

    # What country has the highest percentage of people that earn >50K?
    highest_earning_country = df.loc[df['salary']=='<=50K','native-country'].value_counts().index.tolist()[0]
    highest_earning_country_percentage = df.loc[df['salary']=='<=50K','native-country'].value_counts()[0]/df.loc[df['salary']=='<=50K','native-country'].count()*100

    # Identify the most popular occupation for those who earn >50K in India.
    top_IN_occupation = df.loc[(df['salary']=='<=50K') & (df['native-country']=='India'),'occupation'].value_counts().index.tolist()[0]

    # DO NOT MODIFY BELOW THIS LINE

    if print_data:
        print("Number of each race:\n", race_count)
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
        print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
        print(f"Min work time: {min_work_hours} hours/week")
        print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
        print("Country with highest percentage of rich:", highest_earning_country)
        print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
        print("Top occupations in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'highest_earning_country_percentage':
            highest_earning_country_percentage,
        'top_IN_occupation': top_IN_occupation
    }

It returns:

FFFFF.F.F.

The problem seems to be related to how the decimal values are evaluated.
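Rounding is likely only part of the fix. Several calculations in the code above don't do what their comments ask: `percentage_bachelors` also counts Masters and Doctorate, the `>50K` questions filter on `'<=50K'`, `rich_percentage` adds `min_work_hours` (an hours value) to a head count, and `highest_earning_country` picks the most common country among `'<=50K'` earners rather than the country with the highest share of `>50K` earners. The sketch below is one possible way to write those calculations, run on a small made-up DataFrame (the rows are hypothetical; only the column names and salary labels come from the code above). It computes each percentage as the mean of a boolean mask times 100 and rounds to one decimal place, assuming the tests expect one decimal.

```python
import pandas as pd

# Hypothetical rows; only the column names and labels mirror the code above
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male"],
    "age": [40, 30, 50, 20, 33],
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "HS-grad"],
    "occupation": ["Prof-specialty", "Sales", "Prof-specialty", "Sales", "Exec-managerial"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K"],
    "hours-per-week": [1, 1, 60, 1, 45],
    "native-country": ["India", "United-States", "India", "Cuba", "United-States"],
})

# Exact comparison selects only the rows where sex is "Male"
average_age_men = round(df.loc[df["sex"] == "Male", "age"].mean(), 1)

# Only Bachelors, as the comment asks; the mean of a boolean Series is the share of True values
percentage_bachelors = round((df["education"] == "Bachelors").mean() * 100, 1)

higher = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# Share of >50K earners within each education group
higher_education_rich = round(rich[higher].mean() * 100, 1)
lower_education_rich = round(rich[~higher].mean() * 100, 1)

# Share of >50K earners among people working the minimum hours
min_work_hours = df["hours-per-week"].min()
rich_percentage = round(rich[df["hours-per-week"] == min_work_hours].mean() * 100, 1)

# Percentage of >50K earners per country, then the country with the highest value
country_rich = rich.groupby(df["native-country"]).mean() * 100
highest_earning_country = country_rich.idxmax()
highest_earning_country_percentage = round(country_rich.max(), 1)

# Most common occupation among >50K earners from India
top_IN_occupation = (
    df.loc[(df["native-country"] == "India") & rich, "occupation"]
    .value_counts()
    .idxmax()
)
```

Using `isin` and exact `==` comparisons avoids repeating three `str.contains` calls, and using boolean masks for the `>50K` questions keeps the numerator and denominator in the same group.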

jeremy.a.gray · April 9, 2021, 8:36pm

So you need to look at the errors that are displayed by the test:

Average age of men: 39.43354749885268
...
======================================================================
ERROR: test_average_age_men (test_module.DemographicAnalyzerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gray/src/work/fcc-da-demographics/test_module.py", line 24, in test_average_age_men
    self.assertAlmostEqual(
  File "/usr/lib/python3.9/unittest/case.py", line 876, in assertAlmostEqual
    if round(diff, places) == 0:
TypeError: an integer is required (got type str)

This is fairly straightforward. Something in the test wants an integer (strictly speaking, it’s a function the test calls; the test itself just wants a number), and it got a string. You can see this if you look at the test’s code:

    def test_average_age_men(self):
        actual = self.data["average_age_men"]
        expected = 39.4
        self.assertAlmostEqual(
            actual, expected, "Expected different value for average age of men."
        )

So the test expects the number 39.4 in average_age_men, not a string or a long decimal, which means you need to change your calculation of average_age_men so that it returns 39.4.

The other tests are similarly diagnosed. Good luck.
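The `TypeError` has a specific cause. `unittest` documents the signature as `assertAlmostEqual(first, second, places=7, msg=None, delta=None)`, so the message string in the test above is passed as `places`, not `msg`. The method returns early when `first == second`; otherwise it computes `round(diff, places)`, which raises a `TypeError` when `places` is a string. That is why an unrounded value errors out while a value equal to exactly 39.4 passes. A minimal demonstration, using the value printed in the test output above:

```python
import unittest

# A bare TestCase instance can be used to call assert methods directly
tc = unittest.TestCase()

actual = 39.43354749885268  # unrounded mean, as printed in the test output
expected = 39.4
message = "Expected different value for average age of men."

# The third positional argument is `places`, so the string reaches round(diff, places)
try:
    tc.assertAlmostEqual(actual, expected, message)
    unrounded_result = "passed"
except TypeError:
    unrounded_result = "TypeError"

# Equal values hit the `first == second` shortcut before `places` is used
tc.assertAlmostEqual(round(actual, 1), expected, message)
rounded_result = "passed"

# Passing the message by keyword keeps the default 7-place comparison,
# so the unrounded value now fails with a normal assertion error
try:
    tc.assertAlmostEqual(actual, expected, msg=message)
    keyword_result = "passed"
except AssertionError:
    keyword_result = "AssertionError"
```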

fkrell · April 9, 2021, 8:44pm

Yes, I know that. I just don’t know how to change it.

Jagaya · April 9, 2021, 9:10pm

Have you tried rounding the numbers?
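A sketch of what rounding could look like, with made-up values: Python’s built-in `round()` works on a single result such as a mean or a percentage, and pandas’ `Series.round()` rounds every element of a Series. For percentages, scale by 100 before rounding; rounding the fraction first keeps only one decimal of the fraction.

```python
import pandas as pd

# Hypothetical data: 1 of 3 people earns >50K
salary = pd.Series([">50K", "<=50K", "<=50K"])
share = (salary == ">50K").mean()  # fraction of True values, about 0.333

# Scale to a percentage first, then round to one decimal place
percent = round(share * 100, 1)

# Rounding the fraction first loses precision in the percentage
too_early = round(share, 1) * 100

# Series.round rounds each element; these values are hypothetical
means = pd.Series([39.43354749885268, 12.3456])
means_rounded = means.round(1)
```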

dmoneyballer · April 9, 2021, 10:33pm

To find out how to do something, start by googling. Knowing what to Google is half the battle. Here, I see that the test is expecting an int and is getting a str, so I googled: convert str to int in df
I found this bit of code:

df['DataFrame Column'] = df['DataFrame Column'].astype(int)

You would need to replace 'DataFrame Column' with the actual column name; it should then convert every row of that column in the df (DataFrame) to an integer.
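For reference, a small sketch of what `astype(int)` does, using hypothetical values: it converts the dtype of a column, parsing numeric strings into integers and truncating floats toward zero rather than rounding them. It would turn 39.43 into 39, not 39.4, so for a single computed result such as a mean, `round(value, 1)` is the closer fit.

```python
import pandas as pd

# Hypothetical column of numbers stored as strings
df = pd.DataFrame({"age": ["39", "40", "52"]})
df["age"] = df["age"].astype(int)  # parses each string into an integer

# On floats, astype(int) truncates toward zero instead of rounding
values = pd.Series([39.43, 39.97, -1.5])
truncated = values.astype(int)
rounded = values.round(1)  # keeps one decimal place
```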

dmoneyballer · April 9, 2021, 10:34pm

Sorry Jeremy, I meant to reply to the original post.

fkrell · April 11, 2021, 5:16pm

Yes, thank you. I’m new to the Python world (and to coding outside MATLAB) and still getting used to it. To google something, you have to know what you are looking for, haha.

system · October 11, 2021, 5:17am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
