Data Analysis with Python Projects - Demographic Data Analyzer

Tell us what’s happening:
I am struggling to find:
“What country has the highest percentage of people that earn >50K?”
I have found the percentage. What is the best way to select the country with the matching value? I think I got annoyed with how long it took me to figure out the percentage part.
Do I need to attach a label to the percentages somehow so that I can compare them?

You have generated a dataframe/series of countries and percentages and used .max() to return the highest value.

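If you want the index of the maximum rather than the value itself, use .idxmax(). The docs for .max() describe it as the equivalent of the numpy.ndarray method argmax, although .idxmax() gives you the index label (here, the country) rather than an integer position.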
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.max.html
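
To make the difference between .max() and .idxmax() concrete, here is a minimal sketch. The countries and percentages are made up for illustration and are not the real figures from the challenge dataset; the variable names are also just placeholders.

```python
import pandas as pd

# Made-up percentages indexed by country (not the real dataset values).
pct = pd.Series({"Iran": 41.9, "France": 41.4, "India": 40.0})

highest_pct = pct.max()          # the largest value in the Series
highest_country = pct.idxmax()   # the index label where that value sits

print(highest_country, highest_pct)
```

Because the countries are the index, no extra labelling step is needed: the label comes back directly from .idxmax().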

Oh wow! That is really cool. Such a simple solution for something that I could not figure out!
And then from there I can just select the first index of the returned series to get the country without the Salary info!
So since my series contains countries - salaries - percentages, how does idxmax() decide which column to look at? And then once it decides to give you the max percentage, it just treats the remaining 2 columns as indexes?

Good question! I only just discovered this myself while checking the docs for .max() to answer your question. Luckily, they mention .idxmax() in the first few sentences.

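If you call .info() on it, it says it's a Series with a MultiIndex, so I think you're correct: the Country and the Salary are the two index levels, which is why .idxmax() returns both of them. The remaining column holds the values, so it is the only thing .idxmax() has to compare.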

<class 'pandas.core.series.Series'>
MultiIndex: 38 entries, ('?', '>50K') to ('Yugoslavia', '>50K')
Series name: None
Non-Null Count  Dtype  
--------------  -----  
38 non-null     float64
dtypes: float64(1)
memory usage: 834.0+ bytes
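
As a rough sketch of what that means in practice: on a Series with a two-level index, .idxmax() should hand back one label per level as a tuple, and the country is the first element. The level names and numbers below are assumptions for illustration only, not taken from the code in this thread.

```python
import pandas as pd

# Hypothetical two-level index (country, salary); the values are made up.
index = pd.MultiIndex.from_tuples(
    [("?", ">50K"), ("Iran", ">50K"), ("Yugoslavia", ">50K")],
    names=["native-country", "salary"],  # assumed level names
)
pct = pd.Series([25.0, 41.9, 37.5], index=index)

label = pct.idxmax()   # tuple with one entry per index level
country = label[0]     # the first level is the country

print(label, country)
```

Only the float64 values are compared; the index levels just come along as the label of the winning row.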