Data science is one of the most sought-after jobs on the market. But is this still the case? Or is there already a more desirable one?
There is! Machine learning engineering is overtaking data science in the job market.
In this article, I want to shed light on why, in my opinion, machine learning engineering is overtaking data science, and how you can start learning it.
But let’s start by understanding the difference between the two roles.
Machine Learning Engineer vs. Data Scientist
Within the same organisation, machine learning engineers typically have more responsibility than data scientists. A data scientist’s job is what the title suggests: analysing data and drawing conclusions from it. A machine learning engineer’s duties, by contrast, centre on developing and maintaining machine learning software.
To disentangle these roles further, let’s look at the stages of a data science project:
A data scientist’s typical workday consists of creating, training, and assessing some sort of model. Once the model is complete, the machine learning engineer puts it into production and maintains it. To make money from the model, the machine learning engineer integrates it into a product.
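To make this handoff more concrete, here is a minimal sketch using only the Python standard library. The first half plays the data scientist’s part (fit a model and assess it on held-out data); the second half plays the machine learning engineer’s part (export the fitted parameters as an artifact and load it to answer prediction requests). The one-feature least-squares model, the toy data and the names `LineModel`, `fit_line` and `load_and_predict` are hypothetical choices for illustration; a real project would typically use a library such as scikit-learn and a proper model store.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LineModel:
    """Hypothetical one-feature linear model: y = slope * x + intercept."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


# --- Data scientist: create, train and assess a model ---
def fit_line(xs: list[float], ys: list[float]) -> LineModel:
    """Fit an ordinary least-squares line to paired (x, y) observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return LineModel(slope=slope, intercept=mean_y - slope * mean_x)


def mean_squared_error(model: LineModel, xs: list[float], ys: list[float]) -> float:
    """Average squared difference between predictions and true values."""
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data that follows y = 2x + 1 exactly
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]  # held-out points the model never saw

model = fit_line(train_x, train_y)
print("held-out MSE:", mean_squared_error(model, test_x, test_y))

# --- Machine learning engineer: package the model and serve predictions ---
# The fitted parameters are exported as a JSON artifact; in practice this would
# be written to a file or a model registry rather than kept in a variable.
artifact = json.dumps(asdict(model))


def load_and_predict(artifact_json: str, x: float) -> float:
    """What a production service might do per request: load the artifact and predict."""
    served_model = LineModel(**json.loads(artifact_json))
    return served_model.predict(x)


print("prediction for x = 10:", load_and_predict(artifact, 10.0))
```

JSON is used for the artifact here only because the model is two numbers; larger models are usually saved in a format specific to the library that trained them.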
Aren’t both positions equally essential, though? I agree, they are. However, many firms have already hired data scientists in large numbers, because they have mostly been in the “modelling and exploration” stage. As the need to deploy these models in production and derive value from them grows, demand for machine learning engineers has skyrocketed.
According to an article from VentureBeat, “87% of data science projects never make it into production.” A major reason is that not enough skilled machine learning engineers are being hired. This gap explains why businesses are now putting greater emphasis (as they should) on recruiting machine learning experts who can deploy these models in the wild.
The disparity also shows up in Glassdoor’s job listings. At the time of writing, there were 3,345 open positions for machine learning engineers in California, compared with only 1,809 for data scientists. In other words, there were nearly twice as many openings for machine learning engineers as for data scientists.
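As a quick check on that comparison, dividing the two Glassdoor counts quoted above gives the ratio directly. This snippet only does the arithmetic on those snapshot figures.

```python
# Glassdoor snapshot counts for California, as quoted in the text
ml_engineer_openings = 3345
data_scientist_openings = 1809

ratio = ml_engineer_openings / data_scientist_openings
print(f"ML engineer openings per data scientist opening: {ratio:.2f}")  # 1.85
```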
The question then becomes: why can’t a data scientist simply learn how to deploy models in production as well? A data scientist tends to focus on the ML code itself rather than on the entire ML infrastructure (Figure 2), and rightly so: that subset is where their attention belongs. Covering both the ML code and the infrastructure for deployment, monitoring, and so on would be close to impossible for one person.
To get the most out of your data, it is crucial to have both a data scientist and a machine learning engineer on staff.
Alright, so we know that machine learning engineers are currently in higher demand. But what skills does a machine learning engineer actually need?
Path to Become a Machine Learning Engineer
In this section, I’d like to discuss the skills and tools you’ll need to become a machine learning engineer. I will also share links to the online courses I used to prepare for the role.
NOTE: I only recommend courses I have actually taken. I don’t earn any money when people click on the links I provide; I’m sharing them simply because they’ve been so helpful to me as a student.
Most Valuable Skills
- Computer Science Fundamentals and Programming: data structures (stacks, queues, …), algorithms (searching, sorting, …), computability and complexity, and computer architecture (memory, cache, bandwidth, …)
- Probability and Statistics: probability, Bayes’ rule, statistical measures (median, mean, variance, …), distributions (uniform, normal, binomial, …) and analysis methods (ANOVA, hypothesis testing, …)
- Data Modeling and Evaluation: finding useful patterns (correlations, clusters, …), predicting properties of unseen data points (classification, regression, anomaly detection, …) and continuously evaluating model performance with the right performance metric (accuracy, F1-score, …)
- Applying Machine Learning Algorithms and Libraries: choosing the right model for the underlying problem (decision tree, nearest neighbor, neural network, ensemble of multiple models, …) and the learning procedure to train it (linear regression, gradient boosting, …), understanding the influence of hyperparameters, and experience with different ML libraries (TensorFlow, scikit-learn, PyTorch, …)
- Software Engineering and System Design: understanding the different system components (REST APIs, databases, queries, …) and building interfaces for the ML component
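To illustrate why the choice of evaluation metric matters, here is a small standard-library sketch that computes accuracy and F1-score for a binary classifier. The labels are made up and deliberately imbalanced (8 negatives, 2 positives), a situation where accuracy alone can look good for a useless model. In practice you would more likely call `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score`.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up, imbalanced labels: 8 negatives followed by 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A lazy model that always predicts the majority class
always_negative = [0] * 10
print("always negative -> accuracy:", accuracy(y_true, always_negative))  # 0.8
print("always negative -> F1:", f1_score(y_true, always_negative))        # 0.0

# A model that finds one of the two positives and raises no false alarms
catches_one = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print("catches one -> accuracy:", accuracy(y_true, catches_one))
print("catches one -> F1:", f1_score(y_true, catches_one))
```

The lazy model still scores 80% accuracy, while its F1-score of 0 shows that it never detects the class you actually care about.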
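On the software-engineering side, a common way to expose an ML component to the rest of a system is a small HTTP endpoint that accepts JSON, validates it and returns a prediction. The sketch below uses only Python’s `http.server` and `json` modules so it runs without extra dependencies. The `/predict` route, the `{"x": ...}` payload shape and the `model_predict` stand-in are hypothetical choices for illustration; a production service would more likely use a web framework such as FastAPI or Flask.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model_predict(x: float) -> float:
    """Hypothetical stand-in for a trained model."""
    return 2.0 * x + 1.0


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Validate a JSON request body; return (HTTP status, JSON-serializable response)."""
    try:
        payload = json.loads(body)
        x = float(payload["x"])
    except (ValueError, KeyError, TypeError):
        return 400, {"error": 'expected a JSON body like {"x": 3.0}'}
    return 200, {"prediction": model_predict(x)}


class PredictHandler(BaseHTTPRequestHandler):
    """Routes POST /predict to handle_predict; everything else gets a 404."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "localhost", port: int = 8000) -> None:
    """Start the prediction service (blocks until interrupted)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the validation logic in `handle_predict`, separate from the HTTP plumbing, makes it easy to unit-test without starting a server. With the server running via `serve()`, a request such as `curl -X POST localhost:8000/predict -d '{"x": 3}'` should return `{"prediction": 7.0}`.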
Tools to Learn
Now let’s move on to the tools that I think are essential to learn:
- Python: I think this one is clear. Python is still the number one programming language in machine learning, and it is also easy to learn.
- Linux: Because machine learning engineers work a lot on infrastructure, being comfortable working on Linux is really important.
- Cloud: More and more applications are moving to the cloud, which means that as a machine learning engineer you will probably deploy your models to a cloud environment too. Therefore, I recommend learning to work with at least one of the popular cloud providers (GCP, Azure, AWS). I am currently enrolled in the AWS developer certification course on Udemy, which I can really recommend!
- Docker, Kubernetes: In my opinion, these two tools are a must-learn for every machine learning engineer! They make it easy to deploy models into production and to build complete architectures for your applications. I took the Docker and Kubernetes complete guide on Udemy and learned a lot from it!
Other Useful Online Courses
Now that you know which skills are required and which tools to learn, I also want to share some other online courses that I think can help you on your journey to becoming a machine learning engineer (at least they helped me):
- Deep Learning Specialization by Andrew Ng: This specialization teaches you the fundamentals of deep learning and how to train models for applications such as image classification. Andrew does a fantastic job of explaining the concepts. And you don’t just study the theory behind the algorithms and frameworks; you also put it into practice in hands-on programming assignments.
- Machine Learning Nanodegree by Udacity: This Nanodegree focuses on training machine learning models and deploying them into production with the help of AWS SageMaker and other tools. You can also read my Medium article about the capstone project I built to pass this class. IMPORTANT: Udacity has since replaced the course I took with an updated version, but I still think the revised version is well worth taking.
- IBM Machine Learning Professional Certificate: This Coursera certificate covers a broad range of machine learning topics and includes extensive lab work. You will study supervised and unsupervised learning as well as deep learning and reinforcement learning. Each course culminates in a capstone project that requires you not only to develop your own application but also to write a detailed report describing that application and how you developed it.