Demographic Data Analyzer ~ Higher/lower education Percentage miscalculation

I'm currently trying to answer the following questions in the project mentioned above, and I can't seem to get the percentages correct:

  • What percentage of people without advanced education make more than 50K?
  • What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

This is what I have at the moment:

    # What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
    higher_education = df[['education', 'salary']].loc[(df['education']== 'Bachelors') & (df['salary'] == '>50K') | ((df['education']== 'Masters') & (df['salary'] == '>50K')) | ((df['education']== 'Doctorate') & (df['salary'] == '>50K'))].value_counts().sum()  
   

    # Equals 44.59 for some reason
    higher_education_rich = (higher_education / df['salary'].loc[(df['salary'] == '>50K')].value_counts().sum() )* 100
    
    # What percentage of people without advanced education make more than 50K?
    lower_education = df[['education', 'salary']].loc[
     ((df['education']== '1st-4th') & (df['salary'] == '>50K')) |
     ((df['education']== '5th-6th') & (df['salary'] == '>50K')) |
     ((df['education']== '9th') & (df['salary'] == '>50K')) |
     ((df['education']== '12th') & (df['salary'] == '>50K')) |
     ((df['education']== '7th-8th') & (df['salary'] == '>50K')) |
     ((df['education']== '11th') & (df['salary'] == '>50K')) |
     ((df['education']== '10th') & (df['salary'] == '>50K')) |
     ((df['education']== 'HS-grad') & (df['salary'] == '>50K'))].value_counts().sum()  
    
    print("Num of people making >50K with advanced education: ", higher_education)
    print("Num of people making >50K with no  advanced education: ", lower_education)
    print("Total num of people making salary: ", df['salary'].count())
    print('Percentage of salary that make >50K per educational standing: ', df[['salary', 'education']].loc[df['salary']=='>50K'].value_counts(normalize =True)*100)

    # percentage with salary >50K; Answer=17.4
    lower_education_rich = None

My question is: what falls under the “advanced education” qualification? When I compute the percentage using just the three levels mentioned, I get 44.59%, so I think that either I am leaving out a category that qualifies as “advanced education” or I'm not understanding the question fully. I would also like to know whether I am taking the right approach for calculating lower_education. For example, do I include “some college”, “Assoc-acdm”, “Assoc-voc”, and “Prof-school” in my lower_education calculation? Any and all feedback would be greatly appreciated.

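From a brief look, it seems to me that higher_education_rich here calculates the percentage of people with advanced education among those making more than 50K, instead of the percentage of people making more than 50K among those with advanced education. In other words, the denominator should be the number of people with advanced education, not the number of people making more than 50K.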
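To make the difference between the two percentages concrete, here is a minimal sketch on a made-up toy DataFrame (the rows below are invented for illustration and are not the project's census data; the names `advanced`, `rich`, and `share_of_rich_with_advanced` are just labels chosen for this example):

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "education": ["Bachelors", "Bachelors", "Masters", "HS-grad",
                  "HS-grad", "HS-grad", "11th", "Doctorate"],
    "salary":    [">50K", "<=50K", ">50K", ">50K",
                  "<=50K", "<=50K", "<=50K", "<=50K"],
})

# Boolean masks: one for education level, one for salary.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# Among people making >50K, the share with advanced education.
# The denominator is the number of high earners (this is what the posted code divides by).
share_of_rich_with_advanced = (advanced & rich).sum() / rich.sum() * 100

# Among people with advanced education, the share making >50K.
# The denominator is the number of people with advanced education (what the task asks for).
higher_education_rich = (advanced & rich).sum() / advanced.sum() * 100

print(round(share_of_rich_with_advanced, 1), round(higher_education_rich, 1))
```

Both values share the same numerator; only the denominator changes, which is why the two results can be far apart.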


Okay, so should I include the people I mentioned above, or should I just stick with the ones they specify?

You have to do what the task said - you cannot randomly include/exclude people.

Though you are given the education levels that are considered “high”, so why not use this knowledge to say that whoever doesn't have one of these 3 is “not high”, instead of listing all the other values?
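One way to express "everyone who does not have one of these 3" is to build a single mask with `isin` and negate it with `~`. This is only a sketch on invented rows (not the project's data), and the variable names are chosen for illustration:

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Doctorate", "HS-grad",
                  "Some-college", "Assoc-voc", "11th", "Prof-school"],
    "salary":    [">50K", "<=50K", ">50K", "<=50K",
                  ">50K", "<=50K", "<=50K", ">50K"],
})

# True for the three levels the task names as advanced education.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])

higher_education = df[advanced]   # rows with advanced education
lower_education = df[~advanced]   # every other row, no explicit list needed

# The mean of a boolean Series is the fraction of True values.
higher_education_rich = (higher_education["salary"] == ">50K").mean() * 100
lower_education_rich = (lower_education["salary"] == ">50K").mean() * 100

print(round(higher_education_rich, 1), round(lower_education_rich, 1))
```

With this approach, levels such as Some-college or Prof-school land in the "not advanced" group automatically, because they are not among the three named levels.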

Also for chaining conditions, instead of checking every single time for the salary, maybe check for the salary first and then for the education.

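If I want to select all people who are named Jon Doe or Jon Wick, I can write the following (each comparison needs its own parentheses, because & and | bind more tightly than ==):

    df[(df.FirstName == "Jon") & ((df.SecondName == "Doe") | (df.SecondName == "Wick"))]

or, because the first condition already returns a new DataFrame, filter in two steps:

    jons = df[df.FirstName == "Jon"]
    jons[(jons.SecondName == "Doe") | (jons.SecondName == "Wick")]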
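For reference, a runnable sketch of the Jon Doe / Jon Wick selection on a small invented table (the `people` DataFrame and its rows are hypothetical). It shows both the single combined mask and the two-step filter, and checks that they select the same rows:

```python
import pandas as pd

# Hypothetical table, invented for illustration only.
people = pd.DataFrame({
    "FirstName":  ["Jon", "Jon", "Jon", "Jane"],
    "SecondName": ["Doe", "Wick", "Snow", "Doe"],
})

# One combined mask. The == comparison is parenthesised because
# & and | have higher precedence than == in Python.
combined = people[(people.FirstName == "Jon")
                  & people.SecondName.isin(["Doe", "Wick"])]

# Two steps: filter on the first condition, then build the second
# mask from the already-filtered frame so the indexes line up.
jons = people[people.FirstName == "Jon"]
stepwise = jons[jons.SecondName.isin(["Doe", "Wick"])]

print(combined.equals(stepwise))
```

The same pattern carries over to the salary question: filter on one column first, then apply the second condition to the result.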