A recommended starting point for new data scientists



If you are looking to get into data science/machine learning/artificial intelligence, what you have to learn is going to depend heavily on who you are, how you learn, and what you hope to accomplish. There are debates on which programming languages are better for which tasks, or whether they’re necessary at all in the age of the GUI. There are algorithms in favor now that researchers are looking to replace with something faster and more accurate. Don’t forget to consider the importance of design in your visualizations: make sure you choose the right chart type for your data and the story you want it to tell. Regardless of what direction you head and how you choose to approach your journey, I would argue that the one objective necessity along the way for anyone looking to enter this field in any capacity is ethics.

With my background (psych major in college, currently a data analyst at a university psych department), the ethics of research involving human subjects has become so fundamental to me that it’s not even a conscious thought anymore. When the ethics of a project I contributed to was called into question (not long after the Cambridge Analytica scandal, which blew up and has become a massive source of fear-mongering), I figured it was time to make it a conscious thought again.

I know how easy it is to get caught up in the excitement of making discoveries in data and wanting to dig as deep as you possibly can and learn as much as possible. It’s important to keep in mind at all times that, while data science may not have the same legal restrictions as traditional sciences do, it is still science. It’s research and we have an obligation to respect it, and the subjects of our research, as such.

The University of Michigan has a Data Science Ethics course available on edX. I haven’t yet completed it, but so far there has been no programming or analysis involved, just critical thinking. No employer is going to think less of you for not taking the course (out of sight, out of mind), so try and take the lectures seriously and don’t just fly through the course to say that you’ve done it.
The discussion threads in the course start pretty sparse and get more so as the modules progress, but try and participate as much as you can. I’d be thrilled to be a part of any discussions of the course materials in these forums, FreeCodeCamp’s Data Science Gitter channel or on Twitter for anyone who prefers such platforms.

You can read about the course in an interview with the instructor here, or keep tabs on his blog.

I would also encourage reading over the community principles and manifesto, signing them, and subscribing to updates on either (I’ve received maybe two emails from them, including the initial email thanking me for signing, so if you’re afraid of your inbox being blown up, that’s not an issue).

And my most important advice that I’ve stated but will restate:
Keep ethics in mind with each project you do. Encourage others to consider the ethics of their projects, and recommend the resources I’ve shared if they’re not sure what that even means.


Very nice write up! Agreed that data regarding human subjects should be handled ethically and with great care. The topic of ethics in data science has been brought up recently by the former US chief data scientist, Dr. DJ Patil, and by WIRED magazine. In the financial sector, it was brought up years ago. All these examples allude to, and propose, some sort of Hippocratic Oath to be followed.

Because there is no consensus on this so far (apart from some community-generated manifestos, which are great!), one place we can look is software engineering ethics (PDF). I suggest this because data scientists these days have to be part software engineer just to get their code running in the first place! So as a starting point, here is a Code of Ethics by IEEE that could be followed in the meantime.