Data Analysis with Python Projects - Demographic Data Analyzer

kalei.rachael97 · February 2, 2023, 11:33am

Tell us what’s happening:
I have finished updating my code, but when I run it I get this error: ModuleNotFoundError: No module named 'pandas'.

Your code so far

import pandas as pd


def calculate_demographic_data(print_data=True):
    # Read data from file
    df = pd.read_csv('adult.data.csv')

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_count = df['race'].value_counts()

    # What is the average age of men?
    average_age_men = round(df.loc[df['sex'] == 'Male', 'age'].mean(), 1)

    # What is the percentage of people who have a Bachelor's degree?
    # Count the rows with a Bachelors degree and divide by the total number of rows
    bachelors_degree = df.loc[df['education'] == 'Bachelors']
    percentage_bachelors = round(len(bachelors_degree) / len(df) * 100, 1)

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    # What percentage of people without advanced education make more than 50K?

    # with and without `Bachelors`, `Masters`, or `Doctorate`
    advanced_degrees = ['Bachelors', 'Masters', 'Doctorate']
    mask = df['education'].isin(advanced_degrees)
    higher_education = df[mask]
    lower_education = df[~mask]

    # percentage with salary >50K
    huge_salary = higher_education[higher_education['salary'] == '>50K']
    higher_education_rich = round((len(huge_salary)/len(higher_education)) * 100, 1)

    lower_education_salary = len(lower_education[lower_education['salary'] == '>50K'])
    lower_education_rich = round((lower_education_salary/len(lower_education)) * 100, 1)

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    min_work_hours = df['hours-per-week'].min()

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    # Use the computed minimum instead of hardcoding 1 hour
    mask = df['hours-per-week'] == min_work_hours
    num_min_workers = len(df[mask])

    mask1 = mask & (df['salary'] == '>50K')
    rich_num = len(df[mask1])
    rich_percentage = round((rich_num / num_min_workers) * 100, 1)

    # What country has the highest percentage of people that earn >50K?
    country_percent = (df.groupby('native-country')['salary'].apply(lambda x: (x == '>50K').mean() * 100))
    highest_earning_country = country_percent.idxmax()
    highest_earning_country_percentage = round(country_percent[highest_earning_country], 1)

    # Identify the most popular occupation for those who earn >50K in India.
    good_earn = df.loc[df['salary'] == '>50K', ['native-country', 'occupation']]
    mask3 = good_earn['native-country'] == 'India'
    india_highest_earners = good_earn[mask3]
    top_IN_occupation = india_highest_earners['occupation'].value_counts().idxmax()

    # DO NOT MODIFY BELOW THIS LINE

    if print_data:
        print("Number of each race:\n", race_count) 
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
        print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
        print(f"Min work time: {min_work_hours} hours/week")
        print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
        print("Country with highest percentage of rich:", highest_earning_country)
        print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
        print("Top occupations in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'highest_earning_country_percentage':
        highest_earning_country_percentage,
        'top_IN_occupation': top_IN_occupation
    }

Your browser information:

User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36

Challenge: Data Analysis with Python Projects - Demographic Data Analyzer

jeremy.a.gray · February 2, 2023, 12:22pm

Try the solutions from the forum posts about the Python data analysis projects and updating your dependencies on replit.com. If that doesn’t work, you’ll need to post a link to your repl.
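
Before changing any dependencies, it can help to confirm what the interpreter actually sees. The following is a minimal sketch using only the standard library; the helper names `is_importable` and `installed_version` are made up for this example and are not part of the project boilerplate.

```python
import importlib.metadata
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return importlib.metadata.version(distribution_name)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Shows which Python is running, which matters when several are installed.
    print("Interpreter:", sys.executable)
    print("pandas importable:", is_importable("pandas"))
    print("pandas version:", installed_version("pandas"))
```

If this reports that pandas is not importable, the ModuleNotFoundError most likely comes from the environment's dependencies rather than from the project code.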

kalei.rachael97 · February 2, 2023, 3:04pm

I updated the dependencies and versions and it worked. Thank you.

kumargauravkr · February 7, 2023, 8:33am

Can you tell me what changes you made? I am not able to find them in the forum.

kalei.rachael97 · February 7, 2023, 4:20pm

This is what my pyproject.toml looks like after making the changes:

[tool.poetry]
name = "fcc-demographic-data-analyzer"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
pandas = "^1.3.5"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
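
For anyone unsure what the `^` in `pandas = "^1.3.5"` allows: Poetry's caret constraint accepts any version from the given one up to, but not including, the next bump of its leftmost non-zero component. Here is a rough sketch of that rule in plain Python. The functions `caret_bounds` and `satisfies` are hypothetical helpers written for this illustration, not Poetry's own code, and they only handle plain numeric versions with up to three parts.

```python
def _pad(parts, length=3):
    """Pad a version such as (3, 8) with zeros to (3, 8, 0)."""
    return tuple(parts) + (0,) * (length - len(parts))


def caret_bounds(constraint: str):
    """Return (lower, upper) tuples for a caret constraint like '^1.3.5'.

    The lower bound is inclusive and the upper bound is exclusive.
    """
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    # Find the leftmost non-zero component; if every given component is
    # zero, the last given component is the one that gets bumped.
    for index, value in enumerate(parts):
        if value != 0:
            break
    else:
        index = len(parts) - 1
    upper = parts[:index] + [parts[index] + 1]
    return _pad(parts), _pad(upper)


def satisfies(version: str, constraint: str) -> bool:
    """Check whether a plain numeric version falls inside the caret range."""
    lower, upper = caret_bounds(constraint)
    candidate = _pad([int(p) for p in version.split(".")])
    return lower <= candidate < upper


if __name__ == "__main__":
    print(caret_bounds("^1.3.5"))        # ((1, 3, 5), (2, 0, 0))
    print(satisfies("1.3.5", "^1.3.5"))  # True
    print(satisfies("2.0.0", "^1.3.5"))  # False
```

So with this pyproject.toml, any pandas 1.x release at or above 1.3.5 is acceptable, and the lock file decides which exact version gets installed.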

kalei.rachael97 · February 7, 2023, 4:23pm

And this is what my poetry.lock looks like:

[[package]]
name = "numpy"
version = "1.24.1"
description = "Fundamental package for array computing in Python"
category = "main"
optional = false
python-versions = ">=3.8"

[[package]]
name = "pandas"
version = "1.3.5"
description = "Powerful data structures for data analysis, time series, and statistics"
category = "main"
optional = false
python-versions = ">=3.7.1"

[package.dependencies]
numpy = [
    {version = ">=1.17.3", markers = "platform_machine != \"aarch64\" and platform_machine != \"arm64\" and python_version < \"3.10\""},
    {version = ">=1.19.2", markers = "platform_machine == \"aarch64\" and python_version < \"3.10\""},
    {version = ">=1.20.0", markers = "platform_machine == \"arm64\" and python_version < \"3.10\""},
    {version = ">=1.21.0", markers = "python_version >= \"3.10\""},
]
python-dateutil = ">=2.7.3"
pytz = ">=2017.3"

[package.extras]
test = ["hypothesis (>=3.58)", "pytest (>=6.0)", "pytest-xdist"]

[[package]]
name = "python-dateutil"
version = "2.8.1"
description = "Extensions to the standard Python datetime module"
category = "main"
optional = false
python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7"

[package.dependencies]
six = ">=1.5"

[[package]]
name = "pytz"
version = "2020.1"
description = "World timezone definitions, modern and historical"
category = "main"
optional = false
python-versions = "*"

[[package]]
name = "six"
version = "1.15.0"
description = "Python 2 and 3 compatibility utilities"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*"

[metadata]
lock-version = "1.1"
python-versions = "^3.8"
content-hash = "f7d890533084418520a3e3e20ffba5cb5dccc3dfe48a72cb91ec12bab54da358"

[metadata.files]
numpy = []
pandas = []
python-dateutil = []
pytz = []
six = []

Compare your own files with these, make the matching changes, and it should work.

system · August 9, 2023, 4:24am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.