Data Analysis with Python Projects - Demographic Data Analyzer - test_module.py TypeErrors

Describe what happened
I have tested my latest code in PyScripter several times and I get no errors.
But when I copied and pasted it into Replit and checked for indentation errors, the tests reported 10 errors. I copied and pasted the entire error message below.
Solutions tried: I tried to convert the outputs to strings, with no luck.
I tried to edit test_module.py in my Replit page, but it seems to be locked.
I tried to understand what the error means and what self.data is (Bing says it's a class call). I don't know why the error mentions an index, when the test doesn't seem to be calling an index and is just trying to replace something with something else.
I'm a beginner, and though I used class notation 3 years ago, the concept now escapes me completely :slight_smile:

Console output follows
EEEEEEEEEE

ERROR: test_average_age_men (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 15, in test_average_age_men
    actual = self.data['average_age_men']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_higher_education_rich (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 25, in test_higher_education_rich
    actual = self.data['higher_education_rich']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_highest_earning_country (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 45, in test_highest_earning_country
    actual = self.data['highest_earning_country']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_highest_earning_country_percentage (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 50, in test_highest_earning_country_percentage
    actual = self.data['highest_earning_country_percentage']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_lower_education_rich (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 30, in test_lower_education_rich
    actual = self.data['lower_education_rich']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_min_work_hours (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 35, in test_min_work_hours
    actual = self.data['min_work_hours']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_percentage_bachelors (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 20, in test_percentage_bachelors
    actual = self.data['percentage_bachelors']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_race_count (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 10, in test_race_count
    actual = self.data['race_count'].tolist()
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_rich_percentage (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 40, in test_rich_percentage
    actual = self.data['rich_percentage']
TypeError: list indices must be integers or slices, not str

==========================================================
ERROR: test_top_IN_occupation (test_module.DemographicAnalyzerTestCase)

Traceback (most recent call last):
  File "/home/runner/boilerplate-demographic-data-analyzer/test_module.py", line 55, in test_top_IN_occupation
    actual = self.data['top_IN_occupation']
TypeError: list indices must be integers or slices, not str


Ran 10 tests in 0.208s

FAILED (errors=10)

---------------------------------------------------------------------------------------

Your code so far
import pandas as pd
import numpy as np


def calculate_demographic_data(print_data=True):
    # Read data from file
    datafile = pd.read_csv('adult.data.csv')
    #datafile = pd.read_csv('https://replit.com/@SandyGCabanes/boilerplate-demographic-data-analyzer/adult.data.csv')

    df = pd.DataFrame(datafile)

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_list = list(pd.unique(df['race']))
    race_col = list(df['race'])
    #pass the contents of the race column to a list

    #-----------------------------------------------------
    race_count = []
    #Initializing the list to be printed later

    i = 0
    for i in range(len(race_list)):
        count = race_col.count(race_list[i])
        race_count.append(count)
        i += 1
    return (race_count)

    #------------------------------------------------------------

    # What is the average age of men?
    menfilter = df[df['sex'] == 'Male']
    average_age_men = list(round(np.mean(menfilter['age']), 1))

    #------------------------------------------------------------

    # What is the percentage of people who have a Bachelor's degree?
    numpeople = len(df)
    bachelors_count = np.sum(df['education'] == 'Bachelors')
    percentage_bachelors = list(round((bachelors_count / numpeople) * 100, 1))

    #------------------------------------------------------------

    # What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
    BMD = df[(df['education'] == 'Bachelors') | (df['education'] == 'Masters') |
             (df['education'] == 'Doctorate')]
    BMD_50K = df[(df['salary'] == '>50K') & (
        (df['education'] == 'Bachelors') | (df['education'] == 'Masters')
        | (df['education'] == 'Doctorate'))]

    #-------------------------------------------------------------

    # What percentage of people without advanced education make more than 50K?
    lowered = df[(df['education'] != 'Bachelors')
                 & (df['education'] != 'Masters') &
                 (df['education'] != 'Doctorate')]
    lowered_50K = df[(df['salary'] == '>50K') & (
        (df['education'] != 'Bachelors') & (df['education'] != 'Masters')
        & (df['education'] != 'Doctorate'))]

    # with and without Bachelors, Masters, or Doctorate
    higher_education = list(BMD)
    lower_education = list(lowered)

    # percentage with salary >50K
    higher_education_rich = list(round(len(BMD_50K) / (len(BMD)) * 100, 1))
    lower_education_rich = list(round(len(lowered_50K) / (len(lowered)) * 100, 1))

    #-------------------------------------------------------------

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    workhrs_arr = np.array(df['hours-per-week'])
    min_work_hours = np.min(workhrs_arr)

    #-------------------------------------------------------------

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    min_work_hours_pd = df['hours-per-week'].min()

    workhrs_min = df[(df['hours-per-week'] == min_work_hours)]
    workhrs_min_50K = df[(df['hours-per-week'] == min_work_hours)
                         & (df['salary'] == '>50K')]
    rich_percentage = list(round(len(workhrs_min_50K) / len(workhrs_min) * 100, 1))

    num_min_workers = len(workhrs_min)

    #-------------------------------------------------------------

    # What country has the highest percentage of people that earn >50K?
    countrieslist = list(pd.unique(df['native-country']))

    # Create a list of percentages by using a loop for each country in countrieslist
    percents = []
    i = 0

    while i < len(countrieslist):
        dfcountry = df[(df['native-country'] == countrieslist[i])]
        dfcountry_50K = df[(df['native-country'] == countrieslist[i])
                           & (df['salary'] == '>50K')]
        country_percent = round((len(dfcountry_50K)) / (len(dfcountry)) * 100, 1)
        percents.append(country_percent)
        i = i + 1

    #print (percents, 'updated percents list')

    # Get the maximum percentage in the list
    highest_earning_country_percentage = list(np.max(percents))

    # Get the index of that maximum percentage
    indexofmaxval = percents.index(highest_earning_country_percentage)

    # Apply that index to the countrieslist
    highest_earning_country = list(countrieslist[indexofmaxval])

    #-------------------------------------------------------------

    # Identify the most popular occupation for those who earn >50K in India.

    # Filter df those who earn >50K in India
    dfIN = df[(df['native-country'] == "India") & (df['salary'] == '>50K')]

    # Create list of occupations in IN_50K
    occup_list_IN = list(pd.unique(dfIN['occupation']))

    # Count the incidence of each occupation in dfIN
    countsIN = []
    i = 0
    for i in range(len(occup_list_IN)):
        occupation_count = np.sum(dfIN['occupation'] == occup_list_IN[i])
        countsIN.append(occupation_count)
        i = i + 1

    #Get the maximum among the list of counts
    highest_occup = np.max(countsIN)

    # Get the index of this highest_occup in countsIN
    index_occup = countsIN.index(highest_occup)

    # Get the corresponding occupation for the highest value
    top_IN_occupation = list(occup_list_IN[index_occup])

    #-------------------------------------------------------------

    # DO NOT MODIFY BELOW THIS LINE

    if print_data:
        print("Number of each race:\n", race_count)
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(
            f"Percentage with higher education that earn >50K: {higher_education_rich}%"
        )
        print(
            f"Percentage without higher education that earn >50K: {lower_education_rich}%"
        )
        print(f"Min work time: {min_work_hours} hours/week")
        print(
            f"Percentage of rich among those who work fewest hours: {rich_percentage}%"
        )
        print("Country with highest percentage of rich:", highest_earning_country)
        print(
            f"Highest percentage of rich people in country: {highest_earning_country_percentage}%"
        )
        print("Top occupations in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'highest_earning_country_percentage': highest_earning_country_percentage,
        'top_IN_occupation': top_IN_occupation
    }

--------------------------------------------------------------------

Your browser information:

[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.79]

Challenge: Data Analysis with Python Projects - Demographic Data Analyzer

Link to the challenge:

Please link to your replit?

@pkdvalis - Here you go

From the first problem (the race column): "This should be a Pandas series with race names as the index labels."
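One possible way to get such a Series is `value_counts()` on the column. This is only a sketch: the tiny DataFrame below is a made-up stand-in for `adult.data.csv`, not real data from the project.

```python
import pandas as pd

# Hypothetical stand-in for the 'race' column of adult.data.csv (made-up rows).
df = pd.DataFrame(
    {"race": ["White", "Black", "White", "Asian-Pac-Islander", "White", "Black"]}
)

# value_counts() returns a Series: race names are the index labels,
# counts are the values, sorted from most to least frequent.
race_count = df["race"].value_counts()

print(type(race_count).__name__)   # Series
print(race_count.index.tolist())   # ['White', 'Black', 'Asian-Pac-Islander']
print(race_count.tolist())         # [3, 2, 1]
```

Because the result is a Series, the `.tolist()` call that the test makes on `self.data['race_count']` works on it directly.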

You are returning lists instead of Pandas objects
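For what it's worth, the message in the console output can be reproduced without pandas. The posted code has a `return (race_count)` in the race section, so the function hands back a plain list before it ever reaches the final dictionary, and the test then indexes that list with a string key. A minimal sketch of that situation, where `returns_list` and `returns_dict` are hypothetical stand-ins and all values are made up:

```python
def returns_list():
    # Stand-in for a function that exits early with a plain list (made-up values).
    return [3, 2, 1]


def returns_dict():
    # Stand-in for the boilerplate's final return: a dict keyed by result name.
    return {"race_count": [3, 2, 1], "average_age_men": 39.4}


data = returns_list()
try:
    data["average_age_men"]       # what the test does with self.data
except TypeError as exc:
    message = str(exc)

print(message)                    # list indices must be integers or slices, not str
print(returns_dict()["average_age_men"])

# round() already returns a single number, so no list() wrapper is needed.
average = round(39.4335, 1)
print(average)
```

Once the early return is gone, the `list(round(...))` wrappers would likely raise their own TypeError, since a single number is not iterable. Returning the rounded number itself avoids that.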

Took out the lists now. Thanks!
