Study notes, for reference only; corrections welcome.
PS: This blog was written in a mixed Chinese-English format.
Nonlinear Regression Models
Support Vector Machines
SVMs are a class of powerful, highly flexible modeling techniques.
For regression, we follow Smola (1996) and Drucker et al. (1997) and motivate this technique in the framework of robust regression, where we seek to minimize the effect of outliers on the regression equations.
Also, there are several flavors of support vector regression; we focus on one particular technique called $\epsilon$-insensitive regression.
Recall that linear regression seeks to find parameter estimates that minimize SSE. One drawback of minimizing SSE is that the parameter estimates can be influenced by just one observation that falls far from the overall trend in the data.
To mitigate this problem, SVMs let the user set a threshold: samples whose residuals fall within the threshold do not contribute to the regression fit, while samples whose absolute residuals exceed the threshold contribute in linear proportion to the size of the residual.
There are several consequences to this approach.
First, since squared residuals are not used, large outliers have a limited effect on the regression equation.
Second, samples that the model fits well (the residuals are small) have no effect on the regression equation.
In fact, if the threshold is set to a relatively large value, then the outliers are the only points that define the regression line!
This is somewhat counterintuitive: the poorly predicted points define the line. However, this approach has been shown to be very effective in defining the model.
To estimate the model parameters, the SVM uses the loss function shown above (residual on the horizontal axis, contribution on the vertical axis) together with a penalty term. The SVM regression coefficients minimize

$$\text{Cost}\sum_{i=1}^{n} L_{\epsilon}(y_i - \hat{y}_i) + \sum_{j=1}^{P} \beta_j^2$$

where $L_{\epsilon}(\cdot)$ is the $\epsilon$-insensitive function and Cost is a cost penalty, set by the user, that penalizes large residuals.
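As a concrete illustration, here is a minimal numpy sketch of the $\epsilon$-insensitive loss (the function name and the value of $\epsilon$ are illustrative):

```python
import numpy as np

def eps_insensitive_loss(residuals, epsilon=0.1):
    """Epsilon-insensitive loss: zero inside the threshold,
    linear in |residual| - epsilon outside it."""
    r = np.abs(residuals)
    return np.where(r <= epsilon, 0.0, r - epsilon)

# Residuals inside +/- epsilon contribute nothing; larger ones grow linearly.
print(eps_insensitive_loss(np.array([-0.05, 0.0, 0.08, 0.5, -2.0]), epsilon=0.1))
# -> [0.  0.  0.  0.4 1.9]
```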
Recall that the simple linear regression model predicted new samples using linear combinations of the data and parameters. For a new sample, $u$, the prediction equation is

$$\hat{y} = \beta_0 + \beta_1 u_1 + \cdots + \beta_P u_P$$
The linear support vector machine prediction function is very similar. The parameter estimates can be written as functions of a set of unknown parameters ($\alpha_i$) and the training set data points, so that

$$\hat{y} = \beta_0 + \sum_{i=1}^{n} \alpha_i \left( \sum_{j=1}^{P} x_{ij} u_j \right)$$
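A minimal Python sketch of this prediction equation, assuming the $\alpha_i$ and $\beta_0$ have already been estimated:

```python
import numpy as np

def svm_linear_predict(u, X_train, alpha, beta0):
    """y_hat = beta0 + sum_i alpha_i * (x_i . u), where x_i are the
    training set rows; alpha and beta0 are assumed already estimated."""
    return beta0 + np.sum(alpha * (X_train @ u))  # X_train @ u: all dot products x_i . u
```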
There are several aspects of this equation worth pointing out.
First, there are as many $\alpha$ parameters as there are data points. From the standpoint of classical regression modeling, this model would be considered overparameterized; typically, it is better to estimate fewer parameters than data points.
However, the use of the cost value effectively regularizes the model and helps alleviate this problem.
Second, the individual training set data points (the $x_{ij}$) are required for new predictions. When the training set is large, this makes the prediction equation less compact than those of other techniques. However, for some percentage of the training set samples, the $\alpha_i$ parameters will be exactly zero, indicating that they have no impact on the prediction equation. The data points associated with an $\alpha_i$ parameter of zero are the training set samples that are within $\pm\epsilon$ of the regression line (i.e., within the "funnel" or "tube" around the regression line). As a consequence, only the subset of training set data points with $\alpha_i \neq 0$ is needed for prediction.
Since the regression line is determined using these samples, they are called the support vectors, as they support the regression line.
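To see this in practice, here is a short scikit-learn sketch (the data and parameter values are illustrative); SVR exposes the samples with nonzero $\alpha_i$ via its support_ attribute:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.3, size=100)

# epsilon controls the width of the "tube"; points inside it get alpha_i = 0.
model = SVR(kernel="linear", C=1.0, epsilon=0.2).fit(X, y)
print(f"{len(model.support_)} of {len(X)} training points are support vectors")
```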
New samples enter the prediction function as sums of cross products with the training set points; in matrix algebra, this corresponds to a dot product ($x_i'u$). This is an important characteristic, because the regression equation can be rewritten in the more general form

$$\hat{y} = \beta_0 + \sum_{i=1}^{n} \alpha_i K(x_i, u)$$
where $K(\cdot)$ is called the kernel function. When the predictors enter the model linearly, the kernel function reduces to a simple sum of cross products:

$$K(x_i, u) = \sum_{j=1}^{P} x_{ij} u_j = x_i'u$$
However, other types of kernel functions can be used to generalize the regression model and encompass nonlinear functions of the predictors:

$$\text{polynomial} = (\phi(x'u) + 1)^{degree}$$
$$\text{radial basis function} = \exp(-\sigma\|x - u\|^2)$$
$$\text{hyperbolic tangent} = \tanh(\phi(x'u) + 1)$$

where $\phi$ and $\sigma$ are scale parameters. Since these functions of the predictors lead to nonlinear models, this generalization is often called the "kernel trick."
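As an illustration, here is a minimal sketch of the kernelized prediction equation with the radial basis function, again assuming the $\alpha_i$ and $\beta_0$ are given:

```python
import numpy as np

def rbf_kernel(x, u, sigma=1.0):
    """Radial basis function kernel: exp(-sigma * ||x - u||^2)."""
    return np.exp(-sigma * np.sum((x - u) ** 2))

def svm_kernel_predict(u, X_train, alpha, beta0, sigma=1.0):
    """General form: beta0 + sum_i alpha_i * K(x_i, u)."""
    return beta0 + sum(a * rbf_kernel(x, u, sigma) for a, x in zip(alpha, X_train))
```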
Which kernel function should be used? This depends on the problem. When the regression line is truly linear, the linear kernel function will be a better choice.
Note that some of the kernel functions have extra parameters. For example, the polynomial degree in the polynomial kernel must be specified. Similarly, the radial basis function has a parameter ($\sigma$) that controls the scale. These parameters, along with the cost value, constitute the tuning parameters for the model.
In the case of the radial basis function, there is a possible computational shortcut to estimating the kernel parameter. Caputo et al. (2002) suggested that $\sigma$ can be estimated using combinations of the training set points to calculate the distribution of $\|x - x'\|^2$, then using the 10th and 90th percentiles as a range for $\sigma$. Instead of tuning this parameter over a grid of candidate values, we can use the midpoint of these two percentiles.
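A minimal sketch of this shortcut (the function name is illustrative, and the code follows the description above literally; note that since the RBF kernel uses $\exp(-\sigma\|x-u\|^2)$, some implementations work with the inverse of these squared distances instead):

```python
import numpy as np
from scipy.spatial.distance import pdist

def estimate_sigma(X, low=10, high=90):
    # All pairwise squared distances ||x - x'||^2 over the training points.
    sq_dists = pdist(X, metric="sqeuclidean")
    lo, hi = np.percentile(sq_dists, [low, high])
    # Use the midpoint of the 10th/90th percentiles rather than tuning sigma.
    return 0.5 * (lo + hi)
```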
The cost parameter is the main tool for adjusting the complexity of the model.
When the cost is large, the model becomes very flexible, since the effect of errors is amplified. When the cost is small, the model will "stiffen" and become less likely to over-fit (but more likely to underfit), because the contribution of the squared parameters is proportionally large in the modified error function.
One could also tune the model over the size of the funnel ($\epsilon$). However, there is a relationship between $\epsilon$ and the cost parameter. In our experience, we have found that the cost parameter provides more flexibility for tuning the model, so we suggest fixing a value for $\epsilon$ and tuning over the other kernel parameters.
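As an illustration with scikit-learn (where gamma plays the role of $\sigma$, and X, y stand for the training predictors and outcome), one might fix $\epsilon$ and tune over the cost and kernel parameter:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Fix epsilon; tune only the cost and the RBF scale parameter over a grid.
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),
    param_grid={"C": [0.25, 1, 4, 16, 64], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
# grid.fit(X, y)  # X, y: training data (assumed available)
```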
Since the predictors enter the model as sums of cross products, differences in the predictor scales can affect the model. Therefore, we recommend centering and scaling the predictors prior to building an SVM model.
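For example, with scikit-learn this can be done inside a pipeline, so the same centering and scaling is applied to new samples at prediction time (parameter values are illustrative):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Center and scale the predictors, then fit the SVM, as one estimator.
svm_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
# svm_model.fit(X_train, y_train); svm_model.predict(X_new)
```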