Progress in my Data Science Journey

    I want to detail the different resources I used to get to where I am in my data science journey, and also share a little about where I am right now.

Resources

    I started learning about machine learning about two years ago through Google's Machine Learning Crash Course. I would highly recommend it as a first look into the world of machine learning, but I would not spend a lot of time completing it, as I don't feel it left me very confident in actually building things with machine learning.

    Before starting another course on data science, I read a lot of white papers on arxiv.org. This is where my interest was piqued, because I got to read about things I found deeply interesting, like machine learning on quantum computers.

    Most of the best white papers I found on arXiv were recommendations from people on Twitter. If you are starting out in data science, I would highly recommend finding good, encouraging voices in the data science community. Some of my favorites are Vladimir Haltakov, an engineer named Santiago, Tivadar Danka, and Alejandro Piad Morffis.

    As I spent more time looking into machine learning, I kept hearing kaggle.com referenced for machine learning competitions. I found that there were several machine learning courses on the website, and I enjoyed them a lot! I think they were an extremely good primer for the nitty-gritty technical aspects of machine learning, as they laid out how to perform transformations of data and how to implement many machine learning models.

    When I got serious about learning data science, I began reading the 2nd edition of Hands-On Machine Learning by Aurélien Géron. It really helped me learn the calculus, statistics, and linear algebra needed to develop machine learning systems. I would highly recommend this book! Definitely follow along with the code and do the training exercises, as they are extremely helpful.

    As I got further along in the book and had spent some time talking to Vladimir Haltakov on Twitter about computer vision and self-driving technologies, he recommended several online courses to me, including Andrew Ng's deep learning course, a deep learning for CV course, a classical CV course, the TensorFlow docs, a PyTorch course, the OpenCV docs, and the German road sign detection competition. So far I have completed about half of the classical CV course and all of Andrew Ng's deep learning course. I would highly recommend Andrew Ng's course! He is a great teacher, and he makes the math involved easy to understand. It's not only a good primer for working with deep learning algorithms; he also gives coding exercises that you can look back over and apply the principles from.

Interactions 

    Like I said before, community interactions on Twitter are KEY to keeping up with advances in technology around data science (because research is getting more and more advanced all the time). I have also learned that spending time speaking to people who are a lot smarter than me has adjusted my perspective and made problems in data science feel a lot more accessible. For instance, before speaking with Vladimir Haltakov on Twitter, I would have thought that building self-driving car systems was incomprehensibly technical. But thinking step by step through some of the tasks involved (object recognition, sign detection, data imbalances, proximity detection) makes the problem seem a lot more feasible.

    Here are a few of my favorite Twitter threads that I was involved in:

    More recently I have joined DataTalks.Club. This is an online community hosted by Alexey Grigorev that runs a Zoomcamp for learning data science deployment methods. Also, through their Slack channel you get updates on advances in technology and can ask questions of experienced data scientists. It's super useful!

Good Advice Along the Way

     Some of the advice I received along the way and appreciated was:

    ML is only one side of things. What is also important is:
- Good programming skills
- Experience with lower-level languages like C++ (this is particularly important if you will be writing code that runs in a car)
- Soft skills like teamwork and avoiding/dealing with conflicts

    Bayesian statistics aren't needed for entry-level roles. You only need to know the central limit theorem, z-score tests, mean and variance, best-fit models, and the bias-variance trade-off.

    The only distributions that matter are the normal and log-normal distributions.

    Understanding the algorithms in scikit-learn, especially the linear models, ensemble models, tree models, boosting models, and nearest neighbors models, is important when starting out. A solid understanding of log-loss, mean squared error, and the different classification metrics (all available in scikit-learn) is also needed for an entry-level position.
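
    To make that concrete, here is a minimal sketch (using scikit-learn's toy data generators, so the dataset and numbers are purely illustrative) of fitting a few of those model families and scoring them with log-loss, accuracy, and mean squared error:

# Fit a few scikit-learn model families and score them with log-loss,
# accuracy, and mean squared error on illustrative toy data.
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss, mean_squared_error, accuracy_score

# Classification: linear, ensemble/boosting, and nearest-neighbors models
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0),
              KNeighborsClassifier()):
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    print(type(model).__name__,
          "log-loss:", round(log_loss(y_test, proba), 3),
          "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))

# Regression: a linear model scored with mean squared error
Xr, yr = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_train, yr_train)
print("LinearRegression MSE:", round(mean_squared_error(yr_test, reg.predict(Xr_test)), 3))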

    Having a solid portfolio and beautiful graphs (made with Plotly) will add to your presentation.
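
    For example, with Plotly Express (this snippet uses its built-in iris sample dataset, purely as an illustration), a presentable interactive chart is only a few lines:

# Illustrative interactive scatter plot with Plotly Express
import plotly.express as px

df = px.data.iris()  # built-in sample dataset
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", title="Iris sepal measurements")
fig.show()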

  All of these helped me to focus my attention on only what was necessary starting out.

Accomplishments

    A few things that I am proud of in my data science journey so far: my contributions to Vladimir Haltakov's ML interview question series on Twitter, building a softmax regressor from scratch, participating in various Kaggle competitions, and implementing neural style transfer (with a convolutional neural network) to transform photos from my trips to Spain and Peru into the style of my favorite artist, Quentin Monge.
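
    For anyone curious what "from scratch" means for the softmax regressor, the core of it is quite small. Here is a rough NumPy-only sketch (with randomly generated toy data and an arbitrary learning rate, just to show the idea):

# Softmax regression trained with batch gradient descent, using only NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))            # toy features
y = rng.integers(0, 3, size=300)         # toy labels for 3 classes
Y = np.eye(3)[y]                         # one-hot targets

W = np.zeros((4, 3))
b = np.zeros(3)
lr = 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(500):
    P = softmax(X @ W + b)                # predicted class probabilities
    grad_W = X.T @ (P - Y) / len(X)       # gradient of the cross-entropy loss
    grad_b = (P - Y).mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b

accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print("training accuracy:", accuracy)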


    Now as I am building out my portfolio of projects and looking for a job in data science where I can take my talents, it's exciting to see just how far I've come and just how much I've learned in the journey thus far!
