Thoughts on the D3 scatter plot [solved]

At the risk of pissing more people off…

I found the scatter plot assignment a bit odd. It wasn’t a very good use of a scatter plot. The purpose of a scatter plot is to show the relationship between two variables. But for the data set given there is no doubt about the relationship between how fast a rider did that part of the race and what their rank was compared to others who did that part of the race. It is a simple ordinal relationship – the person with the lowest time will always be first, the person with the second lowest time will always be second, etc. Really, the scatter plot in the original is just a sorted horizontal bar chart without the bars.
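To make the point concrete, here is a minimal sketch of why rank vs. time carries no extra information. The times are made up for illustration (they are not from the assignment's data set), `ranks_from_times` is a hypothetical helper, and ties are ignored for simplicity.

```python
def ranks_from_times(times):
    """Return 1-based finishing ranks: the lowest time gets rank 1."""
    # Indices of the riders, ordered from fastest to slowest.
    order = sorted(range(len(times)), key=lambda i: times[i])
    ranks = [0] * len(times)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


# Hypothetical times in seconds (assumption: no two riders tie).
times = [2210.0, 2195.5, 2250.3, 2201.8]
ranks = ranks_from_times(times)

# Sort the (time, rank) points by time and check that rank only ever rises.
points = sorted(zip(times, ranks))
is_monotonic = all(a[1] < b[1] for a, b in zip(points, points[1:]))

print(ranks, is_monotonic)
```

Because rank is computed from time, the check passes for any set of distinct times, so the plotted points always form a strictly rising staircase.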

Again, this isn’t a dig on the coding. Just as someone who used to do statistical work, this struck me as odd – using the wrong tool for the job. I’d rather see something like performance vs. age, performance vs. weight, rank vs. hours trained, or time vs. year of the race. I’d want to see something where the relationship isn’t already clear by definition. That’s what scatter plots are for.

It just struck me as odd. I guess it doesn’t matter from a coding standpoint. I didn’t have the energy to dig up a data set with race times that could be put into a meaningful scatter plot, but I probably will before I put that into a portfolio.

Just a thought.

That particular plot also struck me as a bit odd when I worked on it a couple of weeks ago because of what you said. I ended up spending an hour looking for another set of data that I felt like plotting, failed to find anything, and ended up using the dataset provided in the end so that I wouldn’t lose momentum working through the curriculum (I clearly also failed to raise an issue about it on GitHub).

I don’t think you should worry too much about pissing people off. In my opinion, yours are valuable opinions that can be used to improve the curriculum for everyone.

There may be suitable data sets that we can extract from the 2017 New Coder Survey for plotting. The sample size for US-based respondents is about 7000, which will likely afford some usable data. I’m more than happy to help process the data to see what we can get. If that sounds reasonable, would you be okay with raising an Issue about this on GitHub? :slight_smile:

Things like this can be raised in the GitHub issues, and in fact this one is already being addressed. The new data will contain rider weight in kg as well as time, year and doping data. There will be many ways to visualize the data.

Sorry, I should have looked more closely at the beta version instead of glancing quickly and seeing something similar and assuming that it was the same. My mistake. Bad coder! :frowning2:

No worries, I’m happy that you were willing to provide the feedback! I am improving the dataset for that problem because the scatter plot of year/time is not very interesting to me as a cycling fan, but weight/time would be appropriate and interesting. So I’m adding weight to the data sets.
