Which Machine Learning Algorithm Is Right for You?

You have data and an application, but which algorithm should you try first? There are tradeoffs no matter what you choose; here are some basic principles to get you started.

Size of Your Dataset

The size of your dataset strongly influences which algorithms are practical. There are no absolute rules dictating which algorithm to use for datasets under 50 MB or over 1 TB. Still, the algorithms below are reasonable starting points for each amount of data, assuming your sample dataset is balanced.
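
The recommendations assume a balanced sample, so it can help to check that assumption before choosing an algorithm. The sketch below counts class labels and reports each class's share of the data. The helper names and the 0.5 ratio used to call a dataset "roughly balanced" are hypothetical choices for illustration, not a standard threshold.

```python
from collections import Counter


def class_proportions(labels):
    """Return each class label's share of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_roughly_balanced(labels, tolerance=0.5):
    """Hypothetical heuristic: treat the data as balanced when the rarest class
    has at least `tolerance` times as many examples as the most common class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return min(counts.values()) / max(counts.values()) >= tolerance


labels = ["spam"] * 90 + ["ham"] * 10
print(class_proportions(labels))    # {'spam': 0.9, 'ham': 0.1}
print(is_roughly_balanced(labels))  # False
```

If the check fails, the size-based suggestions are less reliable, and you may need to rebalance the data (for example, by resampling) before comparing algorithms.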

Small

  • Decision trees
  • Linear models (including logistic regression and linear discriminant)

Medium

  • (Nonlinear) SVM
  • Naïve Bayes
  • Nearest neighbor
  • Neural network (shallow)

Large

  • Deep nets
  • Ensembles
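
The guidance above amounts to a lookup from dataset size to a short list of candidate algorithms. The sketch below encodes it that way. The cut-offs reuse the 50 MB and 1 TB figures mentioned earlier purely as placeholders: the text stresses that there are no absolute rules, so the constants and function names here are hypothetical and meant to be tuned for your own hardware and problem.

```python
# Hypothetical cut-offs; there are no absolute rules, so tune these to your setup.
SMALL_MAX_BYTES = 50 * 1024**2  # below ~50 MB counts as "small"
LARGE_MIN_BYTES = 1024**4       # ~1 TB and above counts as "large"

# Starting candidates per size category, taken from the lists above.
CANDIDATES = {
    "small": ["decision trees", "linear models"],
    "medium": ["nonlinear SVM", "naive Bayes", "nearest neighbor", "shallow neural network"],
    "large": ["deep nets", "ensembles"],
}


def size_category(num_bytes: int) -> str:
    """Map a dataset size in bytes to 'small', 'medium', or 'large'."""
    if num_bytes < 0:
        raise ValueError("dataset size cannot be negative")
    if num_bytes < SMALL_MAX_BYTES:
        return "small"
    if num_bytes < LARGE_MIN_BYTES:
        return "medium"
    return "large"


def starting_algorithms(num_bytes: int) -> list[str]:
    """Return the algorithms worth trying first for a dataset of this size."""
    return CANDIDATES[size_category(num_bytes)]


print(starting_algorithms(10 * 1024**2))  # ['decision trees', 'linear models']
```

Treat the output as a shortlist to evaluate, not a final choice; the tradeoffs mentioned at the start still apply within each category.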