Hi, I am about to start my thesis on data science. Can anyone suggest a source that would be useful? A book, for instance.

Several books I really like are

  • Applied Predictive Modeling by Max Kuhn and Kjell Johnson. This book is online for free, but it’s worth the $50.
  • The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This book is also free online.
    There are many others, but these are key.
  • There are tons of places to get data. You can find clean data, i.e. don’t waste your time building some unique database; use what’s out there. You’ll save yourself time.

Just remember - data science is statistics! Any statistical resources you feel comfortable with will help you. The biggest data science sins I’ve seen come from forgetting core statistical ideas.