Demographic data analyzer problem

Hi, I’m working on project 2 of the data analysis certificate, but I have a problem. I can’t figure out what I’m doing wrong when calculating the percentage of people with higher education (Bachelors, Masters, and Doctorate) who make more than 50K. Here is my code:

```
higher_education_rich_1=df['education'].loc[(df['education']=='Bachelors')|(df['education']=='Masters')|(df['education']=='Doctorate')]
salary=df['salary'].loc[df['salary']=='>50K']
higher_education_rich=higher_education_rich_1.loc[higher_education_rich_1.index.isin(salary.index)==True].value_counts().sum()/len(df)*100
# ~ has to negate the whole OR-ed condition, so it needs its own parentheses
lower_education_rich_1=df['education'].loc[~((df['education']=='Bachelors')|(df['education']=='Masters')|(df['education']=='Doctorate'))]
```

Hi there. Can you please reformat your code so it’s easier to read? Wrap it in backticks (`): a line of three at the top and a line of three at the bottom. Adding those, plus proper spacing, will go a long way toward helping us make sense of what you wrote.

Hope this helps. 🙂

Of course, excuse me.

```
higher_education_rich_1=df['education'].loc[(df['education']=='Bachelors')|(df['education']=='Masters')|(df['education']=='Doctorate')]
salary=df['salary'].loc[df['salary']=='>50K']
higher_education_rich=higher_education_rich_1.loc[higher_education_rich_1.index.isin(salary.index)==True].value_counts().sum()/len(df)*100
```




I’ve edited your code for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.

See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (').


OK, thank you. I didn’t know that.

What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

You have calculated the percentage of the total group that has both higher education and makes more than 50K, because you are dividing by the length of the original DataFrame (`/len(df)*100`):

```
higher_education_rich=higher_education_rich_1.loc[higher_education_rich_1.index.isin(salary.index)==True].value_counts().sum()/len(df)*100
```

Is it wrong? I don’t understand.

What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

This would be:
(advanced education & 50K) / (advanced education)

You calculated:
(advanced education & 50K) / (Total Group)
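To make the difference concrete, here is a minimal pandas sketch of both ratios. It assumes a DataFrame with `education` and `salary` columns as in the project; the eight rows are invented purely for illustration, and the mask names `advanced` and `rich` are hypothetical. It also swaps the `index.isin(salary.index)` matching for boolean masks combined with `&`, which selects the same rows more directly.

```python
import pandas as pd

# Hypothetical toy rows; the real project loads the census CSV instead.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "Doctorate",
                  "Some-college", "Bachelors", "HS-grad", "Masters"],
    "salary": [">50K", "<=50K", ">50K", ">50K",
               "<=50K", "<=50K", "<=50K", ">50K"],
})

# Boolean masks: one for advanced education, one for salary above 50K.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = df["salary"] == ">50K"

# Numerator: people with advanced education AND salary > 50K.
advanced_and_rich = (advanced & rich).sum()

# Dividing by the whole group (the original calculation).
share_of_total = advanced_and_rich / len(df) * 100

# Dividing by the advanced-education group only (what the question asks).
higher_education_rich = advanced_and_rich / advanced.sum() * 100

# Lower-education counterpart: ~ negates the whole isin() mask.
lower_education_rich = (~advanced & rich).sum() / (~advanced).sum() * 100

print(f"{share_of_total:.1f} {higher_education_rich:.1f} {lower_education_rich:.1f}")
```

With these toy rows it prints `37.5 60.0 33.3`: 3 of all 8 people have advanced education and earn more than 50K, but that is 3 out of only the 5 people with advanced education.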


Oh, thank you very much! I didn’t understand that I had to divide not by the entire DataFrame or the education column, but by the rows that correspond to higher education. Thank you so much; this was the only step I was missing to finish the project. I had already managed to do the lower-education part. My first approach had been to create a crosstab of the education and salary columns as percentages, then select the higher-education rows with salary above 50K and sum them, but it didn’t work. I then found a question here on the forum from someone with the same problem, who was told not to group the data but to use the approach I ended up using. Thank you very much; it was the only thing I didn’t understand.
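The crosstab idea described above can also work if you sum counts instead of percentages. The sketch below uses `pd.crosstab` on invented toy rows (the data and the `advanced_levels` list are assumptions for illustration). It isn’t clear exactly how the original crosstab was built, but one likely pitfall is shown at the end: if the table is normalized over all cells, summing the selected percentages reproduces the same divide-by-`len(df)` result discussed earlier in the thread.

```python
import pandas as pd

# Hypothetical toy rows standing in for the census data.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "Doctorate",
                  "Some-college", "Bachelors", "HS-grad", "Masters"],
    "salary": [">50K", "<=50K", ">50K", ">50K",
               "<=50K", "<=50K", "<=50K", ">50K"],
})
advanced_levels = ["Bachelors", "Masters", "Doctorate"]

# Raw counts: one row per education level, one column per salary class.
counts = pd.crosstab(df["education"], df["salary"])

# Keep the advanced-education rows, then divide their >50K count
# by the total number of people in those rows.
adv_counts = counts[counts.index.isin(advanced_levels)]
higher_education_rich = adv_counts[">50K"].sum() / adv_counts.to_numpy().sum() * 100

# Pitfall: with normalize="all" every cell is a percentage of the WHOLE
# table, so summing the selected cells effectively divides by len(df) again.
pct_all = pd.crosstab(df["education"], df["salary"], normalize="all") * 100
summed_all_pct = pct_all.loc[pct_all.index.isin(advanced_levels), ">50K"].sum()

print(f"{higher_education_rich:.1f} {summed_all_pct:.1f}")
```

On the toy rows this prints `60.0 37.5`: the count-based ratio matches dividing by the advanced-education group, while summing the all-normalized cells gives the same figure you get by dividing by `len(df)`.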

