Speeding Up Data Preprocessing for Machine Learning

This ebook demonstrates the tasks involved in preprocessing data for machine learning algorithms in MATLAB®.


If you have worked with machine learning, you know that you must preprocess the data. You also know that this can be a tedious process when handled manually. You need to be able to make updates to preprocessing scripts within a framework that allows you to quickly evaluate the impact of changes on the accuracy of the machine learning model.


Early math classes taught clear rules for the order of operations (PEMDAS), remembered with the mnemonic "Please Excuse My Dear Aunt Sally." Whether your problem concerns apples or mortgage rates, you know that 2 * (6+4)^2 = 200.


The order of operations for data preprocessing tasks is not so straightforward. There are few hard-and-fast rules about the order in which to perform tasks, and every problem has its own factors that may affect what comes first.
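
As a small illustration of why the order matters, the sketch below applies the same two steps, filling a missing value and capping an outlier, in both orders. The data and the cap value are made up for this example, and only base MATLAB functions are used; it is one possible way to show the effect, not a recommended pipeline.

```matlab
% Toy data (hypothetical): one missing value (NaN) and one outlier (100).
x = [1 2 NaN 4 100];
cap = 10;   % hypothetical outlier threshold, chosen only for illustration

% Order A: fill the missing value with the mean, then cap outliers.
a = x;
a(isnan(a)) = mean(a, 'omitnan');   % mean of [1 2 4 100] is 26.75
a(a > cap) = cap;                   % the filled value gets capped as well

% Order B: cap outliers first, then fill the missing value with the mean.
b = x;
b(b > cap) = cap;                   % NaN > cap is false, so NaN is left alone
b(isnan(b)) = mean(b, 'omitnan');   % mean of [1 2 4 10] is 4.25

disp(a)   % 1  2  10    4  10
disp(b)   % 1  2  4.25  4  10
```

The two orders disagree on the filled value (10 versus 4.25) because the outlier skews the mean when it has not yet been capped. Which result is preferable depends on the problem, which is why the order is a judgment call rather than a fixed rule.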


You can speed up manual data preprocessing with high-level tools, visualizations, domain-specific tools and apps, and Live Editor tasks in MATLAB. See how to apply these tools to tabular and time-series (including signal) data.