Question I’m attempting to answer: Is there a strong relationship between prior arrests and the prediction of recidivism?
Table Data (n=~4,750) has (among many others) the following relevant columns:
Prior Arrests - Values range from 0 to 38
Recidivism Prediction - Values range from 1 (unlikely to go back to jail) to 10 (highly likely to go back to jail)
Recidivism Range - 1-3=Low, 4-6=Medium, and 7-10=High
Ok…my first instinct when looking at this question is to calculate the correlation between the Prior Arrests and Recidivism Prediction columns. I use the =CORREL() function in Excel and come up with a value of .42, at which point I deduce that there is NOT a STRONG correlation between the two.
However, the trainer uses a pivot table with Recidivism Range in the rows (Low, Med, High) and Average of Prior Arrests in the data field. This shows an obvious correlation between the two. Since I couldn’t run the correlation on text values, I then equated a 1 to Low, a 2 to Med, and a 3 to High in a separate column and then re-ran the correlation between this row and Prior Arrests and it showed a correlation of .40.
MY Question: After watching the video, I understand how the presenter came to his conclusion, but I am wondering WHY the =CORREL() function did not produce a higher value, given that there seems to be a strong relationship between the two. Am I thinking of the correlation function in the wrong way? How would I know when to use the correlation function and/or when to use the technique shown by the trainer?