
Linear SVM

Advantages:

- By introducing a kernel, SVMs gain flexibility in the choice of the form of the decision boundary separating samples from different classes: it need not be linear, and it need not even have the same functional form for all the data, since the kernel function is nonparametric and operates locally.
- Since the kernel implicitly contains a nonlinear transformation, no assumptions about the functional form of the transformation that makes the data linearly separable are necessary.
- SVMs provide good out-of-sample generalization if the parameters (e.g. the regularization parameter C) are chosen appropriately. This means that, with an appropriate degree of regularization, SVMs can be robust even when the training sample has some bias.
- SVMs deliver a unique solution, since the optimization problem is convex.

Disadvantages:

- The results lack transparency: the fitted model is difficult to interpret.
- The SVM shifts the problem of overfitting from optimizing the parameters to model selection (choosing the kernel and its parameters).
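The uniqueness claim follows from the convexity of the hinge-loss objective. A toy subgradient-descent sketch of a linear SVM may make this concrete (the data and hyperparameters are illustrative, and this is not a production solver such as libsvm):

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)) by subgradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # hinge term is active: subgradient of the objective is w - C*y*x
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
            else:
                # only the regularizer 0.5*||w||^2 contributes
                w = [wj - lr * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# A tiny linearly separable toy dataset with labels in {-1, +1}
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

Because the objective is convex, any such descent scheme approaches the same optimum regardless of the starting point, which is what "unique solution" means in practice.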
 
Random forest (RF)

Advantages:

- It decides the final classification by voting among the trees, which decreases the variance of the model without increasing the bias.
- At each node of each decision tree it uses a random subset of the features to identify the best split, and the subset differs from node to node. This prevents the most predictive features from being selected too frequently in every tree, which would make the trees more correlated with each other.
- It is fast even on large datasets.
- It gives estimates of which variables are important for the classification.

Disadvantages:

- It is hard to visualize the model or to understand why it predicted something, compared to a single decision tree.
- A large number of trees may make the algorithm slow for real-time prediction.
- RFs have been observed to overfit on some datasets with noisy classification/regression tasks.
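The bagging-plus-random-feature-subset idea can be sketched with one-node decision stumps standing in for full trees (a stump has a single node, so drawing the feature subset once per stump matches the per-node rule); the dataset, seed, and parameter names are made up for illustration:

```python
import random

def stump_fit(X, y, feats):
    """Fit a one-node decision stump, considering only the given feature subset."""
    best = None
    for f in feats:
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= t else -sign for x in X]
                err = sum(p != yi for p, yi in zip(pred, y))
                if best is None or err < best[0]:
                    best = (err, (f, t, sign))
    return best[1]

def stump_predict(stump, x):
    f, t, sign = stump
    return sign if x[f] >= t else -sign

def forest_fit(X, y, n_trees=25, n_sub=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of the rows
        feats = rng.sample(range(d), n_sub)          # random subset of the features
        forest.append(stump_fit([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, x):
    votes = sum(stump_predict(s, x) for s in forest)  # majority vote over the ensemble
    return 1 if votes >= 0 else -1

# Toy dataset: both features carry the class signal, labels in {-1, +1}
X = [(1, 1), (2, 3), (3, 2), (-1, -1), (-2, -3), (-3, -2)]
y = [1, 1, 1, -1, -1, -1]
forest = forest_fit(X, y)
```

Individual stumps trained on different bootstrap samples disagree, but the majority vote averages away much of that variance, which is the mechanism behind the first advantage above.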
 
k-nearest neighbors (KNN)

Advantages:

- The cost of the learning process is zero: training consists only of storing the labeled examples.
- No assumptions about the characteristics of the concepts to learn need to be made.
- Complex concepts can be learned by local approximation using simple procedures.

Disadvantages:

- At each prediction the algorithm must compute the distance to, and sort, all the training examples, which can be slow when the training set is large.
- The algorithm learns nothing from the training data beyond memorizing it, which can lead to poor generalization and a lack of robustness to noisy data.
- Changing k can change the predicted class label.
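Both the zero training cost and the sensitivity to k are visible in a few lines (toy data with plain Euclidean distance):

```python
import math
from collections import Counter

def knn_predict(train, k, x):
    # "Training" is just storing the labeled examples; all the work happens here,
    # at prediction time: compute every distance, sort, then take a majority vote.
    neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "a"), ((1.0, 0.0), "b"), ((2.0, 0.0), "a"), ((3.0, 0.0), "a")]
query = (0.9, 0.0)
```

With k = 1 the single nearest neighbor, labeled "b", decides the query; with k = 3 the two surrounding "a" examples outvote it, so the same query receives a different label.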
