Demographic Data Analyzer ~ Higher/lower education Percentage miscalculation

TrayBanks · July 20, 2021, 4:55pm

I'm currently trying to answer the following questions within the project mentioned above, and I can't quite seem to get the percentages correct:

What percentage of people without advanced education make more than 50K?
What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

This is what I have at the moment:

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    higher_education = df[['education', 'salary']].loc[(df['education']== 'Bachelors') & (df['salary'] == '>50K') | ((df['education']== 'Masters') & (df['salary'] == '>50K')) | ((df['education']== 'Doctorate') & (df['salary'] == '>50K'))].value_counts().sum()  
   

    # Equals 44.59 for some reason
    higher_education_rich = (higher_education / df['salary'].loc[(df['salary'] == '>50K')].value_counts().sum() )* 100
    
    # What percentage of people without advanced education make more than 50K?
    lower_education = df[['education', 'salary']].loc[
     ((df['education']== '1st-4th') & (df['salary'] == '>50K')) |
     ((df['education']== '5th-6th') & (df['salary'] == '>50K')) |
     ((df['education']== '9th') & (df['salary'] == '>50K')) |
     ((df['education']== '12th') & (df['salary'] == '>50K')) |
     ((df['education']== '7th-8th') & (df['salary'] == '>50K')) |
     ((df['education']== '11th') & (df['salary'] == '>50K')) |
     ((df['education']== '10th') & (df['salary'] == '>50K')) |
     ((df['education']== 'HS-grad') & (df['salary'] == '>50K'))].value_counts().sum()  
    
    print("Num of people making >50K with advanced education: ", higher_education)
    print("Num of people making >50K with no advanced education: ", lower_education)
    print("Total num of people making salary: ", df['salary'].count())
    print('Percentage of salary that make >50K per educational standing: ', df[['salary', 'education']].loc[df['salary']=='>50K'].value_counts(normalize =True)*100)

    # percentage with salary >50K; Answer=17.4
    lower_education_rich = None

My question for you guys is: what falls under the “advanced education” qualification? When I compute just the ones mentioned I get 44.59%, so I am thinking that either I am not including a category that qualifies as “advanced education” or I'm not understanding the question fully. I would also like to know if I am taking the right approach for calculating lower_education. Do I include “some college”, “Assoc-acdm”, “Assoc-voc”, and “Prof-school” in my lower_education calculation? Any and all feedback on the topic at hand would be greatly appreciated.

sanity · July 20, 2021, 5:17pm

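Taking a brief look, it looks to me like higher_education_rich here is calculating the percentage of people with advanced education among those making more than 50K, instead of the percentage of people making more than 50K among those with advanced education. In other words, the denominator should be the number of people with advanced education, not the number of people making more than 50K.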
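To make the difference between the two denominators concrete, here is a minimal sketch on a made-up toy DataFrame (the rows are invented for illustration and are not the project's dataset; only the column names and category labels come from the post above):

```python
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    "education": ["Bachelors", "Bachelors", "Masters", "HS-grad",
                  "HS-grad", "11th", "Doctorate", "HS-grad"],
    "salary":    [">50K", "<=50K", ">50K", ">50K",
                  "<=50K", "<=50K", "<=50K", "<=50K"],
})

advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# Denominator = everyone making >50K (what the posted code computes)
advanced_among_rich = (advanced & rich).sum() / rich.sum() * 100

# Denominator = everyone with advanced education (what the question asks for)
rich_among_advanced = (advanced & rich).sum() / advanced.sum() * 100

print(round(advanced_among_rich, 1), round(rich_among_advanced, 1))
```

The numerator is the same in both cases; only the group you divide by changes, and that alone is enough to produce a very different percentage.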

TrayBanks · July 22, 2021, 1:28pm

Okay, so should I include the people I mentioned above, or should I just stick with the ones that they specify?

Jagaya · July 22, 2021, 1:38pm

You have to do what the task says - you cannot randomly include/exclude people.

Though you are given the education levels that are considered “high” → why not use this knowledge to say that whoever doesn't have one of these 3 is “not high”, instead of listing all the other values?
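As a rough sketch of that idea (the toy rows below are invented, and the variable names just mirror the ones in the first post), you could build one boolean mask for the three “high” levels and negate it for everyone else:

```python
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    "education": ["Bachelors", "Some-college", "Masters",
                  "HS-grad", "Assoc-voc", "Doctorate"],
    "salary":    [">50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K"],
})

# True for the three levels the task names as advanced education
higher = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
# Everyone else, whatever their education value happens to be
lower = ~higher

# Mean of a boolean Series is the fraction of True values
higher_education_rich = round((df.loc[higher, "salary"] == ">50K").mean() * 100, 1)
lower_education_rich = round((df.loc[lower, "salary"] == ">50K").mean() * 100, 1)

print(higher_education_rich, lower_education_rich)
```

This way no category can be forgotten by accident: any value that is not one of the three named levels lands in the “not high” group automatically.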

Also, for chaining conditions, instead of checking the salary every single time, maybe check for the salary first and then for the education.

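If I want to select all people who are named Jon Doe or Jon Wick, I can go
df[(df.FirstName == "Jon") & ((df.SecondName == "Doe") | (df.SecondName == "Wick"))]
or, because the condition creates a new dataframe, I can filter in two steps
jons = df[df.FirstName == "Jon"]
jons[(jons.SecondName == "Doe") | (jons.SecondName == "Wick")]
Note that each comparison needs its own parentheses, because & and | bind more tightly than ==.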
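If you want to check that the one-step and two-step filters select the same rows, a small runnable sketch could look like this (the `people` frame and its rows are made up for the example):

```python
import pandas as pd

# Toy data, invented for illustration only
people = pd.DataFrame({
    "FirstName":  ["Jon", "Jon", "Jon", "Jane"],
    "SecondName": ["Doe", "Wick", "Snow", "Doe"],
})

# One combined mask; every comparison is wrapped in parentheses
combined = people[(people.FirstName == "Jon")
                  & ((people.SecondName == "Doe") | (people.SecondName == "Wick"))]

# Two steps: narrow down by first name, then test the surname on the smaller frame
jons = people[people.FirstName == "Jon"]
stepwise = jons[jons.SecondName.isin(["Doe", "Wick"])]

print(combined.equals(stepwise))
```

Here `isin` is just a shorter way to write the two surname comparisons joined with `|`.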

