Project Demographic Data Analyzer

rajgorakshay · July 22, 2020, 10:23pm

Hi,
I loaded the DataFrame and am now trying to calculate the counts for the 'race' column as a Pandas Series. Will the following work? Is it correct?

df = pd.read_csv('adult.data.csv')

How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.

race_count = df.groupby('race')
race_count.set_index('race')
race_count['race'].count()

rajgorakshay · July 23, 2020, 1:07am

Hi once again,

Can someone please help me with calculating the "highest percentage of rich" values and the highest percentage of rich people by country for the Demographic Data Analyzer project?
My code so far:

import pandas as pd

def calculate_demographic_data(print_data=True):
    # Read data from file
    df = pd.read_csv('adult.data.csv')

    # How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
    race_count = df.groupby('race')
    race_count['race'].count()

    # What is the average age of men?
    average_age_men = df[df['sex'] == 'Male']['age'].mean()

    # What is the percentage of people who have a Bachelor's degree?
    percentage_bachelors = df[df['education'] == 'Bachelors'].shape[0] / df.shape[0] * 100

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    # What percentage of people without advanced education make more than 50K?

    # with and without `Bachelors`, `Masters`, or `Doctorate`
    higher_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]
    lower_education = df[~df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

    # percentage with salary >50K
    higher_education_rich = higher_education[higher_education['salary'] == '>50K']['salary'].count() / higher_education.shape[0] * 100

    lower_education_rich = lower_education[lower_education['salary'] == '<50K']['salary'].count() / lower_education.shape[0] * 100

    # What is the minimum number of hours a person works per week (hours-per-week feature)?
    min_work_hours = df['hours-per-week'].min()

    # What percentage of the people who work the minimum number of hours per week have a salary of >50K?
    num_min_workers = df[df['hours-per-week'] == 1]['hours-per-week'].count()

    rich_percentage = df[(df['hours-per-week'] == 1) & (df['salary'] == '>50K')].shape[0] / num_min_workers

    # What country has the highest percentage of people that earn >50K?
    highest_earning_country = df[df['salary'] == '>50K'].groupby('native-country')['native-country'].count().max()

    highest_earning_country_percentage = (highest_earning_country / df.groupby('native-country')['native-country'].count()).max()

    # Identify the most popular occupation for those who earn >50K in India.
    top_IN_occupation = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')].groupby('occupation')['occupation'].count().max()

rajgorakshay · July 23, 2020, 1:10am

chrisrajt · July 27, 2020, 8:47am

higher_education_rich = (df[(df['education-num'].isin([13, 14, 16])) & (df['salary'] == '>50K')]['salary'].count()/higher_education).round(3)*100

lower_education_rich = (df[(~df['education-num'].isin([13, 14, 16])) & (df['salary'] == '>50K')]['salary'].count()/lower_education).round(3)*100

Above is the code for higher_education_rich and lower_education_rich.

I am still working out the code for race_count.
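
For the two education percentages, here is a minimal sketch on a few made-up rows (invented for illustration, not taken from adult.data.csv). It assumes the salary column only holds '>50K' and '<=50K', and it divides by the size of each education group rather than the whole dataset. The names `advanced` and `rich` are just helper masks introduced for this example:

```python
import pandas as pd

# Made-up sample rows, only for illustration (not the real dataset).
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "HS-grad", "Doctorate", "Some-college"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Boolean masks: who has advanced education, and who earns more than 50K.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# The mean of a boolean Series is the fraction of True values in that group.
higher_education_rich = round(rich[advanced].mean() * 100, 1)
lower_education_rich = round(rich[~advanced].mean() * 100, 1)

print(higher_education_rich, lower_education_rich)
```

Both percentages filter on '>50K'; only the education mask is flipped between the two.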

Sabretooth · September 15, 2020, 7:46am

Can you explain the code for finding the highest-earning country? I ran your code in a Jupyter notebook, and it returns just a number, 7171. Can you help me out with that?

On a side note, for that question regarding race, here is your code:
race_count = df.groupby(['race'])['race'].count()
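
To show what that line returns, here is a minimal sketch on a tiny made-up frame (the rows are invented for illustration, not taken from adult.data.csv). It also shows `value_counts()`, which is an alternative I am suggesting rather than something the project requires. Both give a Series with race names as the index labels:

```python
import pandas as pd

# Made-up sample rows, only for illustration (not the real dataset).
df = pd.DataFrame(
    {"race": ["White", "Black", "White", "Asian-Pac-Islander", "White", "Black"]}
)

# groupby + count: a Series indexed by race name, ordered by label.
race_count_groupby = df.groupby("race")["race"].count()

# value_counts: also a Series indexed by race name, ordered by descending count.
race_count = df["race"].value_counts()

print(race_count_groupby)
print(race_count)
```

The two results hold the same counts; only the row order differs.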

gloriakrstiani · December 5, 2020, 6:31am

Use idxmax() instead of max() at the end to return the index name.
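
A minimal sketch of the difference, using a few made-up rows (invented for illustration, not taken from adult.data.csv). `max()` returns the largest value in the Series, which is why the original code printed only a number, while `idxmax()` returns the index label at that value. The name `rich_share` is just a helper introduced for this example:

```python
import pandas as pd

# Made-up sample rows, only for illustration (not the real dataset).
df = pd.DataFrame({
    "native-country": ["India", "India", "Iran", "Iran", "Iran", "Iran", "Cuba", "Cuba"],
    "salary": [">50K", "<=50K", ">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# Percentage of >50K earners per country: the mean of a boolean column,
# grouped by country, is the fraction of True values in each group.
rich_share = (df["salary"] == ">50K").groupby(df["native-country"]).mean() * 100

# idxmax() gives the index label (the country name) ...
highest_earning_country = rich_share.idxmax()
# ... while max() gives the value stored at that label.
highest_earning_country_percentage = rich_share.max()

print(highest_earning_country, highest_earning_country_percentage)
```

The same idea should apply to the India occupation question: ending the chain with `idxmax()` rather than `max()` gives the occupation name instead of its count.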
