Understand the basics of Data Science (DS) and Machine Learning (ML).
Data Science – the practical application of advanced analytics, statistics, machine learning, and the necessary data preparation in a business context. DS is a broad topic that pulls from many areas of expertise including statistics, computer science (CS), data engineering (DE), machine learning (ML), and data visualization. The two key elements of DS are data and a scientific process.
Artificial Intelligence (AI) – an encompassing term for any technique that enables computers to mimic human behavior; ML is one such technique.
Machine Learning (ML) – a subset of AI that uses statistical methods to enable machines to improve with experience. ML can be further divided into two main types: supervised and unsupervised learning. Supervised learning builds models from data with a predefined label; unsupervised learning works on data that has no label or defined outcome.
Descriptive Modeling – using supervised ML to build an interpretable model.
Predictive Analytics – using supervised ML to predict a future outcome.
Prescriptive Analytics – using supervised ML to optimize a next best course of action.
Text Analytics – using ML where the source data is unstructured or semi-structured text, typically from a human or natural language AI.
Model Validation – ensuring that a model represents a data trend without overfitting.
Coding vs Clicking – knowing whether a code-based or a point-and-click ML tool is the more effective way to achieve business outcomes. There is no single right answer; the decision depends on the business context, including the human and computing resources available.
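To make the Model Validation entry above concrete, the sketch below holds out part of the data and compares training accuracy with accuracy on the held-out rows. The data, the two toy models (`fit_memorizer` and `fit_threshold`), and all names are hypothetical and chosen only for illustration; real projects would typically use an ML library and larger samples.

```python
from statistics import mean

def accuracy(model, rows):
    """Share of rows where the model's prediction equals the label."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)

def fit_memorizer(train):
    """1-nearest-neighbour: copies the label of the closest training point."""
    def model(x):
        _, nearest_label = min(train, key=lambda row: abs(row[0] - x))
        return nearest_label
    return model

def fit_threshold(train):
    """Single cut-off placed halfway between the two class means."""
    mean0 = mean([x for x, y in train if y == 0])
    mean1 = mean([x for x, y in train if y == 1])
    cutoff = (mean0 + mean1) / 2
    return lambda x: 1 if x >= cutoff else 0

# Hypothetical data: the label is mostly 1 for large x, with two noisy rows
# in the training set, (40, 1) and (60, 0).
train = [(10, 0), (20, 0), (30, 0), (40, 1), (45, 0),
         (55, 1), (60, 0), (70, 1), (80, 1), (90, 1)]
holdout = [(12, 0), (27, 0), (38, 0), (43, 0),
           (57, 1), (61, 1), (72, 1), (88, 1)]

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

results = {
    "memorizer": (accuracy(memorizer, train), accuracy(memorizer, holdout)),
    "threshold": (accuracy(threshold, train), accuracy(threshold, holdout)),
}
for name, (train_acc, holdout_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} holdout={holdout_acc:.2f}")
```

In this toy setup the memorizing model is perfect on the training rows but drops on the held-out rows because it has also memorized the noise, while the simpler threshold model scores lower on training data and higher on the held-out data. A gap of that kind between training and held-out performance is the usual sign of overfitting.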
Be familiar with data science methodologies, particularly CRISP-DM.
Cross-Industry Standard Process for Data Mining (CRISP-DM) is an open-standard process model that describes common approaches used by data mining experts. CRISP-DM avoids big, upfront design by emphasizing an iterative process starting with business understanding.
Be familiar with ML Use Cases.
There is an extremely wide variety of use cases for analytics and machine learning, so it can be useful to think about a few of the different types of uses.
Another way to think of some of the common use cases is by the type of ML, regardless of vertical. Supervised learning can be classification or regression, while unsupervised learning could be grouping (clustering) or anomaly detection. There is also an area specifically for Feature Selection and other helper functions that cross these areas.
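The difference between the supervised and unsupervised use cases above can be sketched in a few lines. This is a minimal illustration on made-up spending figures, not a production method: `fit_centroids`, `classify`, and `two_groups` are hypothetical helper names, and the grouping routine is a simple one-dimensional two-means procedure chosen for brevity.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Supervised: learn one average value per known label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}

def classify(centroids, value):
    """Predict the label whose learned average is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_groups(values, iterations=10):
    """Unsupervised: split values into two groups without any labels."""
    low, high = min(values), max(values)
    if low == high:
        return sorted(values), []  # nothing to separate
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 95, 102, 99]

# Supervised: a label is supplied for every training value.
segment = ["low", "low", "low", "high", "high", "high"]
centroids = fit_centroids(spend, segment)
predicted = classify(centroids, 20)

# Unsupervised: only the values are supplied; the groups are discovered.
groups = two_groups(spend)
print(predicted, groups)
```

The supervised half needs the `segment` labels to learn from and returns a label for a new value; the unsupervised half receives only the numbers and returns groupings that a person still has to name and interpret.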
Understand how and when to use different Graphics and Visualizations.
Graphics and visualizations provide a powerful set of tools to aid human understanding of data. Graphs can help summarize vast amounts of data and reveal complex patterns and associations. They can also be an effective way to communicate results and reasoning.
Understand the basics of model selection, evaluation, and uses of ML models.
When we perform advanced analytics, we are building and using models.
Know how to evaluate models. The right evaluation depends on the objectives and the type of model. One common evaluation metric for a classification model is accuracy.
Accuracy is simply the number of correct predictions divided by the number of total predictions. Like other performance metrics, there is no universal definition for what is a good accuracy and what is a bad accuracy. It has to be judged in terms of the model alternatives and the value of more accurate predictions.
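As a small worked example of the accuracy calculation described above, the sketch below scores a set of predictions and compares the result with a naive alternative that always predicts the most common class. The labels and predictions are invented for illustration, and `accuracy` is a hypothetical helper rather than a library function.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by total predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 1 = customer churned, 0 = customer stayed.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]

model_accuracy = accuracy(y_true, y_pred)                 # 6 of 8 correct
baseline_accuracy = accuracy(y_true, [0] * len(y_true))   # 5 of 8 correct

print(model_accuracy, baseline_accuracy)  # 0.75 0.625
```

Here 0.75 is only meaningful next to the 0.625 that the do-nothing baseline achieves; whether that improvement is worth pursuing depends on the value of the extra correct predictions.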
Be able to identify the value of model interpretation and explanation, as well as the value of model performance improvements. Sometimes the project is not strictly descriptive modeling, or strictly predictive modeling. It’s valuable to provide good predictions, to know how good the predictions are, and to know how they were produced.
Understand the importance of planning for model deployment into production, and how to drive action. Improved predictions don’t necessarily drive value unless they drive action, and they don’t necessarily drive a lot of value unless they drive high-value actions or many actions.
Know various ways to explain predictions. There are a couple of options when attempting to explain predictions for models that are not easily interpretable. One approach is to use a descriptive model trained on the same data as the predictive model. Another approach is to use simulation to explain predictions from the predictive model.
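The simulation approach mentioned above can be sketched as follows: keep one record fixed, swap a single input for a range of trial values, and observe how the prediction moves. The `black_box` function is a hypothetical stand-in for an opaque predictive model, and the feature names and coefficients are invented for illustration only.

```python
def black_box(record):
    """Stand-in for an opaque predictive model (hypothetical scoring rule)."""
    score = 0.02 * record["age"] + 0.5 * record["late_payments"]
    if record["late_payments"] >= 3 and record["balance"] > 1000:
        score += 2.0  # interaction that is hard to see from outside
    return score

def simulate_effect(model, record, feature, values):
    """Re-score the record with one feature replaced by each trial value.

    Returns the change in prediction relative to the original record.
    """
    base = model(record)
    effects = {}
    for value in values:
        trial = dict(record, **{feature: value})  # copy with one field changed
        effects[value] = model(trial) - base
    return effects

customer = {"age": 40, "late_payments": 1, "balance": 2500}
effects = simulate_effect(black_box, customer, "late_payments", range(5))

for value, change in effects.items():
    print(f"late_payments={value}: prediction changes by {change:+.2f}")
```

For this customer the simulated changes are roughly linear until `late_payments` reaches 3, where the prediction jumps. That kind of pattern can be reported as an explanation without opening up the model itself.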
Understand how to interpret model performance including the confidence or variance that the model provides for an individual prediction. No prediction is completely certain, and there are varying levels of certainty.
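One way to illustrate per-prediction certainty is to train many copies of a simple model on resampled data and look at how often they agree. The sketch below is an assumption-laden illustration, not a description of any particular tool: the data is invented, `fit_threshold`, `bootstrap_models`, and `vote_share` are hypothetical helpers, and bootstrap voting is only one of several ways a model can report confidence.

```python
import random
from statistics import mean

def fit_threshold(rows):
    """Cut-off halfway between the class means; None if a class is missing."""
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    if not xs0 or not xs1:
        return None  # this resample lost one of the classes
    cutoff = (mean(xs0) + mean(xs1)) / 2
    return lambda x: 1 if x >= cutoff else 0

def bootstrap_models(rows, n_models=200, seed=0):
    """Fit one model per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        model = fit_threshold(rng.choices(rows, k=len(rows)))
        if model is not None:
            models.append(model)
    return models

def vote_share(models, x):
    """Fraction of models predicting class 1 for this single input."""
    return sum(m(x) for m in models) / len(models)

# Hypothetical training data with overlapping classes in the middle.
train = [(10, 0), (20, 0), (30, 0), (40, 1), (45, 0),
         (55, 1), (60, 0), (70, 1), (80, 1), (90, 1)]
models = bootstrap_models(train)

shares = {x: vote_share(models, x) for x in (5, 50, 95)}
for x, share in shares.items():
    print(f"x={x}: share of models predicting 1 = {share:.2f}")
```

Inputs far from the class boundary (5 and 95 here) get a unanimous vote, while an input in the overlapping region can split the vote, which signals a less certain individual prediction.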