Hi guys,
I need help understanding what the challenge wants me to do here:
Clean the data. Filter out the following patient segments that represent incorrect data:
diastolic pressure is higher than systolic (Keep the correct data with (df['ap_lo'] <= df['ap_hi']))
height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))
height is more than the 97.5th percentile
weight is less than the 2.5th percentile
weight is more than the 97.5th percentile
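One possible reading of the list above, sketched in pandas: each bullet describes bad data, so the filter keeps the opposite condition, and a row is dropped entirely if it fails any of the five checks. The column names come from the challenge text; the toy values in `df` are made up for illustration, and `keep` and `df_clean` are hypothetical names. This is a sketch under those assumptions, not the official solution.

```python
import pandas as pd

# Hypothetical toy data; the real challenge provides its own dataset.
df = pd.DataFrame({
    "ap_hi": [120, 110, 130, 90, 140, 125],
    "ap_lo": [80, 70, 85, 100, 90, 80],
    "height": [150, 160, 165, 170, 175, 190],
    "weight": [50.0, 60.0, 65.0, 70.0, 75.0, 110.0],
})

# Each condition describes the rows to KEEP (the opposite of the "incorrect" segment).
# All percentiles are computed on the original, unfiltered df.
keep = (
    (df["ap_lo"] <= df["ap_hi"])                        # diastolic not above systolic
    & (df["height"] >= df["height"].quantile(0.025))    # not below the 2.5th percentile
    & (df["height"] <= df["height"].quantile(0.975))    # not above the 97.5th percentile
    & (df["weight"] >= df["weight"].quantile(0.025))
    & (df["weight"] <= df["weight"].quantile(0.975))
)

# Boolean indexing drops whole rows, not individual cell values.
df_clean = df[keep]
print(df_clean)
```

Combining the conditions into one mask means every percentile is taken from the same original data; filtering step by step would shift the later percentiles.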
Am I supposed to remove those values from the dataframe?
For example, in:
“height is more than the 97.5th percentile” → Am I supposed to remove the height values that are above the 97.5th percentile?
“weight is less than the 2.5th percentile” → Should I remove the weights that are below the 2.5th percentile, or keep them?
So, if the 2.5th percentile is 150 cm and the 97.5th percentile is 180 cm, we should remove the values below 150 cm and the values above 180 cm.
Correct?
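To make the 150 cm / 180 cm example concrete, here is a small sketch. Going by the `>=` in the challenge's hint, a value sitting exactly on a cut-off would be kept rather than removed; the `<=` on the upper bound is my assumption by symmetry. The numbers in `heights` are invented, and `low` and `high` simply stand in for the two percentile values.

```python
import pandas as pd

# Made-up heights in cm; low/high stand in for the 2.5th and 97.5th percentiles.
heights = pd.Series([140, 150, 165, 180, 195])
low, high = 150, 180

# Keep everything inside the range, boundaries included.
kept = heights[(heights >= low) & (heights <= high)]
print(kept.tolist())  # [150, 165, 180]
```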
The exercise isn’t clear.