05 November 2017

[Data Science] So I Have Just Completed the Applied Data Science with Python Specialisation by the University of Michigan on Coursera

This specialisation comprises 5 courses:
Course 1 - Introduction to Data Science in Python
Course 2 - Applied Plotting, Charting & Data Representation in Python
Course 3 - Applied Machine Learning in Python
Course 4 - Applied Text Mining in Python
Course 5 - Applied Social Network Analysis in Python
I shall summarise what I think about the course:

The Good:
1) The course is structured very well and the pace is manageable.
2) All exercises are done and submitted in Jupyter notebooks and are graded rather quickly.
3) Exercises are challenging. Learners have to do a lot of research, both on Stack Overflow and in the toolkits' documentation pages.
4) Personally, I learned more about Pandas and NumPy, and also about building neural networks using Scikit-Learn's MLP (multi-layer perceptron) module.
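
For readers who have not seen the Scikit-Learn MLP before, here is a minimal sketch of what using it can look like. The synthetic dataset, the single hidden layer of 20 units and the helper name train_mlp are all assumptions made for illustration; none of this is taken from the course assignments.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def train_mlp(random_state=0):
    """Train a small MLP on synthetic data and return (model, test accuracy)."""
    # Synthetic binary classification data (an assumption, not course data).
    X, y = make_classification(
        n_samples=400, n_features=10, n_informative=5, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )

    # MLPs are sensitive to feature scale, so standardise using the training set only.
    scaler = StandardScaler().fit(X_train)

    # One hidden layer of 20 units; the sizes here are illustrative guesses.
    clf = MLPClassifier(
        hidden_layer_sizes=(20,), max_iter=1000, random_state=random_state
    )
    clf.fit(scaler.transform(X_train), y_train)

    accuracy = clf.score(scaler.transform(X_test), y_test)
    return clf, accuracy


clf, accuracy = train_mlp()
print(f"test accuracy: {accuracy:.2f}")
```

The scaler is fitted on the training split only, so that nothing about the test split leaks into training.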

The Bad:
1) While the course is structured, the materials are delivered quickly. It is repeated many times that this is an applied course, and hence much of the intuition behind the methods is introduced only briefly.
2) Stack Overflow, Stack Overflow.... 
3) The assignment grading can be buggy, and with no expected answers provided, learners just have to submit and see if they passed. I think such is life as a data scientist? We never know if we have the correct answers. So I suppose that is the intent? But much time is wasted just waiting.
4) Nothing to do with the course, but I do not quite like Coursera charging a monthly fee instead of a lump sum. This makes the course very costly in the long run. For example, if this specialisation has 5 modules and presumably each module is a month's worth of content, it will cost well above US$200 to get the specialisation. Of course, there is an option to just audit the courses.
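
The arithmetic behind that last point can be sketched as follows. The monthly fee of US$49 is an assumed figure for illustration (it is not stated above), and the function name is hypothetical.

```python
def specialisation_cost(num_courses, months_per_course, monthly_fee_usd):
    """Total subscription cost if every course takes the same number of months."""
    return num_courses * months_per_course * monthly_fee_usd


# Assumption: a subscription of US$49 per month; check the current price yourself.
ASSUMED_MONTHLY_FEE_USD = 49

total = specialisation_cost(
    num_courses=5, months_per_course=1, monthly_fee_usd=ASSUMED_MONTHLY_FEE_USD
)
print(f"Estimated total: US${total}")
```

Under this assumption, any monthly fee above US$40 pushes five one-month courses past the US$200 mark, and every extra month spent on a course adds another month's fee.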

Who Should Take This Course:

I think this is a course worth taking; after all, the University of Michigan is not a trivial name. But given the monthly fee, it is perhaps not one to spend too much time lingering on. So I think it is best to learn the basics of Python, data science and machine learning through other more cost-effective (by which I mean cheaper) means.

Here is what I would recommend:

For Python/Data Science
1) I find the book Automate the Boring Stuff with Python a good start, although I discovered it rather late. This book introduces all the basics of programming in Python, such as creating functions and iterations, and also more advanced topics like reading and automating spreadsheets, web crawling, and generally automating work using Python. It is easy to read and follow, and the digital copy of the book is free to browse at the link provided.
2) Jose Portilla's courses on Python and Data Science on Udemy are an excellent introduction to the subject. And with the constant barrage of sales and offers at Udemy (recently they had a $10 offer on all courses, for example), it is good to buy them and learn at your own pace. I have personally done the Data Science and Machine Learning Bootcamp with Python (and also SQL and R). He also has a Python bootcamp.
3) Other resources such as DataCamp and Codecademy are good places to consider too. Personally, I started Python and R at Codecademy and DataCamp respectively.

For Machine Learning

1) The videos based on the book An Introduction to Statistical Learning with Applications in R are a must-watch if you wish to have a deeper understanding of the intuition behind the machine learning algorithms, from a statistician's point of view. They also cover more advanced methods such as Support Vector Machines. However, they do not cover neural networks. Statistical Learning is the statistician's way of saying Machine Learning.

2) Andrew Ng has become the household name for Machine Learning. His Machine Learning course on Coursera is becoming a classic. This course approaches Machine Learning from a computer science point of view, and Andrew is able to explain difficult concepts in an easy-to-understand way in his soft-spoken manner. The assignments are challenging and are done in MATLAB or Octave. It was through this course that I finally came to appreciate the wonders of matrix algebra (and yes, I am not shy to say I am an engineer). I am currently auditing his Deep Learning course, also on Coursera.


Currently I am still looking for opportunities to use what I have learnt, either at work or in a new role. I am using Python to scrape stock data and am devising a project to scrape financial report data for REIT analysis. More on this in future posts.

~yZhifa