Demographic Data Analyzer: Grouping and Sorting creates empty dataframe

TrayBanks · July 13, 2021, 2:14pm

I am currently attempting the demographic data analyzer project and I ran into a problem that I can't seem to solve. I am trying to answer the question: what is the most popular occupation in India among those who make over 50K? So far I have the following:

#What percentage of people without advanced education make more than 50K?
    lower_education = df[['education', 'salary']].loc[(df['education']=='None')].value_counts()

The code above prints only the Series info (dtype: int64), and I am wondering why, given that my answers to previous questions output something other than the data structure info. If someone could explain that and tell me whether I am going in the right direction as far as syntax goes, that would be greatly appreciated. I think the only thing I have to add is sorting the combined columns by occupation frequency, and then I should be able to use .idxmax() to get the most popular occupation. Let me know if my logic is off as well.

Jagaya · July 14, 2021, 6:41am

Could you provide a link to your project?
I don’t really understand what error / output you are getting and your code snippet doesn’t seem related to your question.

TrayBanks · July 14, 2021, 1:28pm

Jagaya · July 14, 2021, 5:35pm

It’s one of these REALLY annoying errors: salary is written with an uppercase “K”.

TrayBanks · July 15, 2021, 2:04pm

Salary is written with an uppercase K…? can you elaborate on that?

Jagaya · July 15, 2021, 2:33pm

In your linked code, you check if salary is equal to the string “<=50k” or so, which will always be false because salary is either “<=50K” or “>50K” - written with an uppercase K.
So if the strings are always different, the condition will always be false and your selection will return an empty series-object.

Meanwhile, in the code snippet in your first post, you check whether education is equal to the string “None”, which it never is. Hence it also creates an empty series-object, which is why it’s not giving anything but the data-structure info.
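To make the point concrete, here is a minimal sketch of how a case mismatch in a string comparison empties the selection. The four rows are made up for illustration and are not the project's dataset; only the column names education and salary and the two salary spellings come from this thread.

```python
import pandas as pd

# Tiny made-up sample (an assumption); the real project loads a census CSV.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
    "salary": ["<=50K", ">50K", ">50K", "<=50K"],
})

# Lowercase "k" never equals the stored values, so every row is dropped.
wrong = df[df["salary"] == ">50k"]
# Uppercase "K" matches what is actually in the column.
right = df[df["salary"] == ">50K"]

print(len(wrong))  # 0
print(len(right))  # 2

# Counting values on an empty selection gives an empty Series, so
# printing it shows little more than the dtype line.
empty_counts = wrong["education"].value_counts()
print(empty_counts.empty)  # True

# Listing the distinct values shows the exact spellings to compare against.
salary_values = sorted(df["salary"].unique())
print(salary_values)
```

Checking the distinct values of a column before filtering on it is a quick way to catch this kind of mismatch.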

TrayBanks · July 15, 2021, 2:52pm

WOOWWW, big oof on my part, thanks for catching that mistake. After correcting the issue I was able to get it working. However, looking at the query, the answer to the question appears to be Exec-managerial with 1968 occurrences, but the correct answer according to the unit test's expected entry is Prof-specialty with 1859 occurrences. Do you happen to know why that is? Could it be a typo in the unit tests?

Jagaya · July 15, 2021, 3:51pm

The last thing you should ever expect is a failure in the provided code.

Task: Identify the most popular occupation for those who earn >50K in India.
You didn’t filter for India.
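A rough sketch of what filtering on both conditions could look like. The rows below are invented, and the column names native-country and occupation are assumptions about the dataset's layout that are not confirmed in this thread, so adjust them to match your dataframe.

```python
import pandas as pd

# Invented rows for illustration; column names are assumed, not confirmed.
df = pd.DataFrame({
    "native-country": ["India", "India", "India", "India",
                       "United-States", "United-States", "United-States"],
    "occupation": ["Prof-specialty", "Prof-specialty", "Sales", "Exec-managerial",
                   "Exec-managerial", "Exec-managerial", "Exec-managerial"],
    "salary": [">50K", ">50K", ">50K", "<=50K", ">50K", ">50K", ">50K"],
})

# Filtering on salary alone counts high earners from every country.
high_earners = df[df["salary"] == ">50K"]
top_overall = high_earners["occupation"].value_counts().idxmax()

# Combining both conditions with & restricts the count to India.
in_india = df["native-country"] == "India"
over_50k = df["salary"] == ">50K"
india_high_earners = df[in_india & over_50k]
top_in_india = india_high_earners["occupation"].value_counts().idxmax()

print(top_overall)   # Exec-managerial
print(top_in_india)  # Prof-specialty
```

Note that no explicit sort is needed before .idxmax(): it returns the label with the highest count regardless of order.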
