|
|
@@ -1,64 +1,64 @@
|
|
|
### 5.2
|
|
|
-$$\Delta w_i = \eta(y-\hat{y})x_i$$
|
|
|
+$$\Delta w\_i = \eta(y-\hat{y})x\_i$$
|
|
|
[推导]:此处感知机的模型为:
|
|
|
-$$y=f(\sum_{i} w_i x_i - \theta)$$
|
|
|
+$$y=f(\sum\_{i} w\_i x\_i - \theta)$$
|
|
|
将$\theta$看成哑结点后,模型可化简为:
|
|
|
-$$y=f(\sum_{i} w_i x_i)=f(\boldsymbol w^T \boldsymbol x)$$
|
|
|
-其中$f$为阶跃函数。<br>根据《统计学习方法》§2可知,假设误分类点集合为$M$,$\boldsymbol x_i \in M$为误分类点,$\boldsymbol x_i$的真实标签为$y_i$,模型的预测值为$\hat{y_i}$,对于误分类点$\boldsymbol x_i$来说,此时$\boldsymbol w^T \boldsymbol x_i \gt 0,\hat{y_i}=1,y_i=0$或$\boldsymbol w^T \boldsymbol x_i \lt 0,\hat{y_i}=0,y_i=1$,综合考虑两种情形可得:
|
|
|
-$$(\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i>0$$
|
|
|
+$$y=f(\sum\_{i} w\_i x\_i)=f(\boldsymbol w^T \boldsymbol x)$$
|
|
|
+其中$f$为阶跃函数。<br>根据《统计学习方法》§2可知,假设误分类点集合为$M$,$\boldsymbol x\_i \in M$为误分类点,$\boldsymbol x\_i$的真实标签为$y\_i$,模型的预测值为$\hat{y\_i}$,对于误分类点$\boldsymbol x\_i$来说,此时$\boldsymbol w^T \boldsymbol x\_i \gt 0,\hat{y\_i}=1,y\_i=0$或$\boldsymbol w^T \boldsymbol x\_i \lt 0,\hat{y\_i}=0,y\_i=1$,综合考虑两种情形可得:
|
|
|
+$$(\hat{y\_i}-y\_i)\boldsymbol w \boldsymbol x\_i>0$$
|
|
|
所以可以推得损失函数为:
|
|
|
-$$L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol w \boldsymbol x_i$$
|
|
|
+$$L(\boldsymbol w)=\sum\_{\boldsymbol x\_i \in M} (\hat{y\_i}-y\_i)\boldsymbol w \boldsymbol x\_i$$
|
|
|
损失函数的梯度为:
|
|
|
-$$\nabla_w L(\boldsymbol w)=\sum_{\boldsymbol x_i \in M} (\hat{y_i}-y_i)\boldsymbol x_i$$
|
|
|
-随机选取一个误分类点$(\boldsymbol x_i,y_i)$,对$\boldsymbol w$进行更新:
|
|
|
-$$\boldsymbol w \leftarrow \boldsymbol w-\eta(\hat{y_i}-y_i)\boldsymbol x_i=\boldsymbol w+\eta(y_i-\hat{y_i})\boldsymbol x_i$$
|
|
|
-显然式5.2为$\boldsymbol w$的第$i$个分量$w_i$的变化情况
|
|
|
+$$\nabla\_w L(\boldsymbol w)=\sum\_{\boldsymbol x\_i \in M} (\hat{y\_i}-y\_i)\boldsymbol x\_i$$
|
|
|
+随机选取一个误分类点$(\boldsymbol x\_i,y\_i)$,对$\boldsymbol w$进行更新:
|
|
|
+$$\boldsymbol w \leftarrow \boldsymbol w-\eta(\hat{y\_i}-y\_i)\boldsymbol x\_i=\boldsymbol w+\eta(y\_i-\hat{y\_i})\boldsymbol x\_i$$
|
|
|
+显然式5.2为$\boldsymbol w$的第$i$个分量$w\_i$的变化情况
|
|
|
### 5.12
|
|
|
-$$\Delta \theta_j = -\eta g_j$$
|
|
|
+$$\Delta \theta\_j = -\eta g\_j$$
|
|
|
[推导]:因为
|
|
|
-$$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}$$
|
|
|
+$$\Delta \theta\_j = -\eta \cfrac{\partial E\_k}{\partial \theta\_j}$$
|
|
|
又
|
|
|
$$
|
|
|
\begin{aligned}
|
|
|
-\cfrac{\partial E_k}{\partial \theta_j} &= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial \hat{y}_j^k}{\partial \theta_j} \\\\
|
|
|
-&= (\hat{y}_j^k-y_j^k) \cdot f’(\beta_j-\theta_j) \cdot (-1) \\\\
|
|
|
-&= -(\hat{y}_j^k-y_j^k)f’(\beta_j-\theta_j) \\\\
|
|
|
-&= g_j
|
|
|
+\cfrac{\partial E\_k}{\partial \theta\_j} &= \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot\cfrac{\partial \hat{y}\_j^k}{\partial \theta\_j} \\\\
|
|
|
+&= (\hat{y}\_j^k-y\_j^k) \cdot f’(\beta\_j-\theta\_j) \cdot (-1) \\\\
|
|
|
+&= -(\hat{y}\_j^k-y\_j^k)f’(\beta\_j-\theta\_j) \\\\
|
|
|
+&= g\_j
|
|
|
\end{aligned}
|
|
|
$$
|
|
|
所以
|
|
|
-$$\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}=-\eta g_j$$
|
|
|
+$$\Delta \theta\_j = -\eta \cfrac{\partial E\_k}{\partial \theta\_j}=-\eta g\_j$$
|
|
|
### 5.13
|
|
|
-$$\Delta v_{ih} = \eta e_h x_i$$
|
|
|
+$$\Delta v\_{ih} = \eta e\_h x\_i$$
|
|
|
[推导]:因为
|
|
|
-$$\Delta v_{ih} = -\eta \cfrac{\partial E_k}{\partial v_{ih}}$$
|
|
|
+$$\Delta v\_{ih} = -\eta \cfrac{\partial E\_k}{\partial v\_{ih}}$$
|
|
|
又
|
|
|
$$
|
|
|
\begin{aligned}
|
|
|
-\cfrac{\partial E_k}{\partial v_{ih}} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot \cfrac{\partial \alpha_h}{\partial v_{ih}} \\\\
|
|
|
-&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot x_i \\\\
|
|
|
-&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\\\
|
|
|
-&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\\\
|
|
|
-&= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h) \cdot x_i \\\\
|
|
|
-&= -f’(\alpha_h-\gamma_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i\\\\
|
|
|
-&= -b_h(1-b_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i \\\\
|
|
|
-&= -e_h \cdot x_i
|
|
|
+\cfrac{\partial E\_k}{\partial v\_{ih}} &= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot \cfrac{\partial \beta\_j}{\partial b\_h} \cdot \cfrac{\partial b\_h}{\partial \alpha\_h} \cdot \cfrac{\partial \alpha\_h}{\partial v\_{ih}} \\\\
|
|
|
+&= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot \cfrac{\partial \beta\_j}{\partial b\_h} \cdot \cfrac{\partial b\_h}{\partial \alpha\_h} \cdot x\_i \\\\
|
|
|
+&= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot \cfrac{\partial \beta\_j}{\partial b\_h} \cdot f’(\alpha\_h-\gamma\_h) \cdot x\_i \\\\
|
|
|
+&= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot w\_{hj} \cdot f’(\alpha\_h-\gamma\_h) \cdot x\_i \\\\
|
|
|
+&= \sum\_{j=1}^{l} (-g\_j) \cdot w\_{hj} \cdot f’(\alpha\_h-\gamma\_h) \cdot x\_i \\\\
|
|
|
+&= -f’(\alpha\_h-\gamma\_h) \cdot \sum\_{j=1}^{l} g\_j \cdot w\_{hj} \cdot x\_i\\\\
|
|
|
+&= -b\_h(1-b\_h) \cdot \sum\_{j=1}^{l} g\_j \cdot w\_{hj} \cdot x\_i \\\\
|
|
|
+&= -e\_h \cdot x\_i
|
|
|
\end{aligned}
|
|
|
$$
|
|
|
所以
|
|
|
-$$\Delta v_{ih} = -\eta \cdot -e_h \cdot x_i=\eta e_h x_i$$
|
|
|
+$$\Delta v\_{ih} = -\eta \cdot -e\_h \cdot x\_i=\eta e\_h x\_i$$
|
|
|
### 5.14
|
|
|
-$$\Delta \gamma_h= -\eta e_h$$
|
|
|
+$$\Delta \gamma\_h= -\eta e\_h$$
|
|
|
[推导]:因为
|
|
|
-$$\Delta \gamma_h = -\eta \cfrac{\partial E_k}{\partial \gamma_h}$$
|
|
|
+$$\Delta \gamma\_h = -\eta \cfrac{\partial E\_k}{\partial \gamma\_h}$$
|
|
|
又
|
|
|
$$
|
|
|
\begin{aligned}
|
|
|
-\cfrac{\partial E_k}{\partial \gamma_h} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \gamma_h} \\\\
|
|
|
-&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f’(\alpha_h-\gamma_h) \cdot (-1) \\\\
|
|
|
-&= -\sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f’(\alpha_h-\gamma_h)\\\\
|
|
|
-&=e_h
|
|
|
+\cfrac{\partial E\_k}{\partial \gamma\_h} &= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot \cfrac{\partial \beta\_j}{\partial b\_h} \cdot \cfrac{\partial b\_h}{\partial \gamma\_h} \\\\
|
|
|
+&= \sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot \cfrac{\partial \beta\_j}{\partial b\_h} \cdot f’(\alpha\_h-\gamma\_h) \cdot (-1) \\\\
|
|
|
+&= -\sum\_{j=1}^{l} \cfrac{\partial E\_k}{\partial \hat{y}\_j^k} \cdot \cfrac{\partial \hat{y}\_j^k}{\partial \beta\_j} \cdot w\_{hj} \cdot f’(\alpha\_h-\gamma\_h)\\\\
|
|
|
+&=e\_h
|
|
|
\end{aligned}
|
|
|
$$
|
|
|
所以
|
|
|
-$$\Delta \gamma_h= -\eta e_h$$
|
|
|
+$$\Delta \gamma\_h= -\eta e\_h$$
|