This post draws heavily on the CSDN article "反向传播算法(过程及公式推导)" (Backpropagation: Process and Derivation of the Formulas).

Basic Definitions

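In the simple neural network shown in the figure above, layer 1 is the input layer, layer 2 is the hidden layer, and layer 3 is the output layer. We use this network to define the following notation: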

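Symbol | Meaning
--- | ---
b_{i}^{l} | bias of neuron i in layer l
w_{ji}^{l} | weight of the connection from neuron i in layer l-1 to neuron j in layer l
z_{i}^{l} | weighted input of neuron i in layer l
a_{i}^{l} | output (activation) of neuron i in layer l
\sigma | activation function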

From these definitions it follows that:

z_{j}^{l} = \sum_{i} w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l}

a_{j}^{l} = \sigma\left( z_{j}^{l} \right) = \sigma\left( \sum_{i} w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l} \right)
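The equations above describe the forward pass through a single layer. The following is a minimal plain-Python sketch of it. It assumes σ is the logistic sigmoid, because the text does not fix a particular activation. The function name `layer_forward` and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    # Assumption: sigma is the logistic sigmoid; the text leaves it unspecified.
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, b, a_prev):
    """Forward pass through one layer l.

    W[j][i] holds w_{ji}^l (neuron i of layer l-1 -> neuron j of layer l),
    b[j] holds b_j^l, and a_prev[i] holds a_i^{l-1}.
    Returns the weighted inputs z^l and the outputs a^l.
    """
    z = [sum(W[j][i] * a_prev[i] for i in range(len(a_prev))) + b[j]
         for j in range(len(b))]
    a = [sigmoid(zj) for zj in z]
    return z, a


# Hypothetical layer with 2 inputs and 2 neurons
W = [[0.5, -0.5],
     [1.0, 1.0]]
b = [0.0, -1.0]
z, a = layer_forward(W, b, [1.0, 1.0])
print(z)     # [0.0, 1.0]
print(a[0])  # 0.5
```

Storing W row by row (row j = neuron j of layer l) matches the w_{ji} index order, so each z_j^l is a dot product of row j with a^{l-1}.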

We take the loss function to be the quadratic cost function:

J = \frac{1}{2n} \sum_{x} \lvert \lvert y(x) - a^{L}(x) \rvert \rvert ^ {2}

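Here x is an input sample and n is the number of training samples. y(x) is the true label of x, and a^{L}(x) is the network's prediction for x. L is the number of layers, so layer L is the output layer. For a single input sample, the loss J reduces to: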

J = \frac{1}{2} \lvert \lvert y - a^{L} \rvert \rvert ^ {2} = \frac{1}{2} \sum_{j} \left( y_{j} - a_{j}^{L} \right)^{2}
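The two forms of J differ only in the averaging over samples. The dataset cost 1/(2n) Σ_x ‖y(x) − a^L(x)‖² is simply the mean of the single-sample costs ½‖y − a^L‖². A short sketch follows; the function names are hypothetical.

```python
def single_sample_cost(y, a_L):
    # J = 1/2 * ||y - a^L||^2 = 1/2 * sum_j (y_j - a_j^L)^2
    return 0.5 * sum((yj - aj) ** 2 for yj, aj in zip(y, a_L))


def dataset_cost(ys, a_Ls):
    # J = 1/(2n) * sum_x ||y(x) - a^L(x)||^2, i.e. the mean of the per-sample costs
    n = len(ys)
    return sum(single_sample_cost(y, a_L) for y, a_L in zip(ys, a_Ls)) / n


print(single_sample_cost([1.0, 0.0], [0.5, 0.5]))  # 0.25
print(dataset_cost([[1.0, 0.0], [0.0, 1.0]],
                   [[0.5, 0.5], [0.0, 1.0]]))      # 0.125
```

Because the dataset gradient is the average of the per-sample gradients, the derivation can work with a single sample without loss of generality.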

Finally, we define the error of neuron i in layer l as:

\delta_{i}^{l} \equiv \frac{\partial{J}}{\partial{z_{i}^{l}}}

Derivation

The error at the output layer L is:

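\begin{aligned}
\delta_{i}^{L} &= \frac{\partial J}{\partial z_{i}^{L}} \\
&= \frac{\partial J}{\partial a_{i}^{L}} \cdot \frac{\partial a_{i}^{L}}{\partial z_{i}^{L}} \\
&= \frac{\partial J}{\partial a_{i}^{L}} \, \sigma'(z_{i}^{L})
\end{aligned}

In vector form, with \odot denoting the element-wise (Hadamard) product:

\delta^{L} = \nabla_{a^{L}} J \odot \sigma'(z^{L})

For the quadratic cost, \nabla_{a^{L}} J = a^{L} - y, so \delta^{L} = (a^{L} - y) \odot \sigma'(z^{L}).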
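The sketch below checks that this formula really gives ∂J/∂z_i^L by comparing it with a central finite-difference estimate. It makes two assumptions:

- The cost is the single-sample quadratic cost, for which ∂J/∂a_i^L = a_i^L − y_i.
- The activation is the logistic sigmoid, for which σ'(z) = σ(z)(1 − σ(z)).

The names `output_error` and `cost_from_z` and the numbers are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    # For the logistic sigmoid, sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)


def output_error(z_L, y):
    # delta_i^L = dJ/da_i^L * sigma'(z_i^L), with dJ/da_i^L = a_i^L - y_i
    # for the single-sample quadratic cost J = 1/2 * ||y - a^L||^2
    return [(sigmoid(z) - t) * sigmoid_prime(z) for z, t in zip(z_L, y)]


def cost_from_z(z_L, y):
    # The same cost written directly as a function of z^L
    return 0.5 * sum((t - sigmoid(z)) ** 2 for z, t in zip(z_L, y))


# Illustrative values
z_L = [0.3, -1.2]
y = [1.0, 0.0]
delta = output_error(z_L, y)

# Central finite differences of J with respect to each z_i^L
eps = 1e-6
numeric = []
for i in range(len(z_L)):
    z_plus, z_minus = list(z_L), list(z_L)
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((cost_from_z(z_plus, y) - cost_from_z(z_minus, y)) / (2 * eps))

print(delta)
print(numeric)  # agrees with delta up to finite-difference error
```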

The error of neuron j in a hidden layer l is:

\begin{aligned}
\delta_{j}^{l} &= \frac{\partial J}{\partial z_{j}^{l}} \\
&= \frac{\partial J}{\partial a_{j}^{l}} \cdot \frac{\partial a_{j}^{l}}{\partial z_{j}^{l}} \\
&= \sum_{i} \frac{\partial J}{\partial z_{i}^{l+1}} \cdot \frac{\partial z_{i}^{l+1}}{\partial a_{j}^{l}} \cdot \frac{\partial a_{j}^{l}}{\partial z_{j}^{l}} \\
&= \sum_{i} \delta_{i}^{l+1} \cdot \frac{\partial \left( \sum_{k} w_{ik}^{l+1} a_{k}^{l} + b_{i}^{l+1} \right)}{\partial a_{j}^{l}} \cdot \sigma'(z_{j}^{l}) \\
&= \sum_{i} \delta_{i}^{l+1} \cdot w_{ij}^{l+1} \cdot \sigma'(z_{j}^{l})
\end{aligned}

The sum over i appears because a_{j}^{l} feeds every neuron i of layer l+1. In vector form:

\delta^{l} = \left( \left( w^{l+1} \right)^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})
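The recursion can be checked the same way: compute δ^l from δ^{l+1} and compare it with finite differences of J with respect to z^l. This sketch rests on several assumptions:

- Layer l+1 is the output layer.
- The cost is quadratic and the activation is the logistic sigmoid.
- The 3-neuron/2-neuron sizes and all weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def hidden_error(W_next, delta_next, z):
    # delta_j^l = (sum_i w_{ij}^{l+1} * delta_i^{l+1}) * sigma'(z_j^l).
    # W_next[i][j] is w_{ij}^{l+1}, so summing over i walks down column j,
    # which is exactly the product (w^{l+1})^T delta^{l+1}.
    return [sum(W_next[i][j] * delta_next[i] for i in range(len(delta_next)))
            * sigmoid_prime(z[j])
            for j in range(len(z))]


# Hypothetical layers: hidden layer l (3 neurons) -> output layer l+1 (2 neurons)
W_next = [[0.2, -0.4, 0.7],
          [0.5, 0.1, -0.3]]
b_next = [0.1, -0.2]
y = [1.0, 0.0]


def next_z(z):
    # z^{l+1} = w^{l+1} a^l + b^{l+1}, with a^l = sigma(z^l)
    a = [sigmoid(v) for v in z]
    return [sum(W_next[i][j] * a[j] for j in range(len(a))) + b_next[i]
            for i in range(len(b_next))]


def cost_from_hidden_z(z):
    # Quadratic cost of the output layer, viewed as a function of z^l
    return 0.5 * sum((t - sigmoid(v)) ** 2 for v, t in zip(next_z(z), y))


z = [0.5, -0.1, 0.9]
# Output-layer error delta^{l+1} = (a^{l+1} - y) ⊙ sigma'(z^{l+1})
delta_next = [(sigmoid(v) - t) * sigmoid_prime(v) for v, t in zip(next_z(z), y)]
delta = hidden_error(W_next, delta_next, z)

# Compare with central finite differences of J with respect to each z_j^l
eps = 1e-6
numeric = []
for j in range(len(z)):
    z_plus, z_minus = list(z), list(z)
    z_plus[j] += eps
    z_minus[j] -= eps
    numeric.append((cost_from_hidden_z(z_plus) - cost_from_hidden_z(z_minus)) / (2 * eps))

print(delta)
print(numeric)
```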

The gradient of the loss with respect to the weights is then:

\begin{aligned}
\frac{\partial J}{\partial w_{ji}^{l}} &= \frac{\partial J}{\partial z_{j}^{l}} \cdot \frac{\partial z_{j}^{l}}{\partial w_{ji}^{l}} \\
&= \delta_{j}^{l} \cdot \frac{\partial \left( \sum_{k} w_{jk}^{l} a_{k}^{l-1} + b_{j}^{l} \right)}{\partial w_{ji}^{l}} \\
&= \delta_{j}^{l} \cdot a_{i}^{l-1}
\end{aligned}

For the whole weight matrix of layer l this is the outer product \frac{\partial J}{\partial w^{l}} = \delta^{l} \left( a^{l-1} \right)^{T}.
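Because ∂J/∂w_{ji}^l = δ_j^l a_i^{l-1}, the gradient for a whole layer is an outer product. The sketch below builds it and compares every entry with a central finite difference. It assumes a single output layer, the quadratic cost and a logistic sigmoid. The names `weight_gradient`, `weighted_input` and `cost` and all numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weight_gradient(delta, a_prev):
    # dJ/dw_{ji}^l = delta_j^l * a_i^{l-1}; as a matrix, the outer product delta^l (a^{l-1})^T
    return [[d_j * a_i for a_i in a_prev] for d_j in delta]


# Hypothetical output layer: 2 inputs -> 2 neurons, single-sample quadratic cost
a_prev = [0.6, -0.8]
b = [0.05, -0.1]
y = [1.0, 0.0]


def weighted_input(W):
    return [sum(W[j][i] * a_prev[i] for i in range(len(a_prev))) + b[j]
            for j in range(len(b))]


def cost(W):
    return 0.5 * sum((t - sigmoid(v)) ** 2 for v, t in zip(weighted_input(W), y))


W = [[0.3, -0.2],
     [0.4, 0.9]]
# Output-layer error (a^L - y) ⊙ sigma'(z^L), with sigma'(z) = sigma(z) * (1 - sigma(z))
delta = [(sigmoid(v) - t) * sigmoid(v) * (1.0 - sigmoid(v))
         for v, t in zip(weighted_input(W), y)]
grad_W = weight_gradient(delta, a_prev)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = [[0.0] * len(a_prev) for _ in b]
for j in range(len(b)):
    for i in range(len(a_prev)):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[j][i] += eps
        W_minus[j][i] -= eps
        numeric[j][i] = (cost(W_plus) - cost(W_minus)) / (2 * eps)

print(grad_W)
print(numeric)
```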

Finally, the gradient of the loss with respect to the biases is:

\begin{aligned}
\frac{\partial J}{\partial b_{j}^{l}} &= \frac{\partial J}{\partial z_{j}^{l}} \cdot \frac{\partial z_{j}^{l}}{\partial b_{j}^{l}} \\
&= \delta_{j}^{l} \cdot \frac{\partial \left( \sum_{i} w_{ji}^{l} a_{i}^{l-1} + b_{j}^{l} \right)}{\partial b_{j}^{l}} \\
&= \delta_{j}^{l}
\end{aligned}
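Together, the formulas for δ^L, the δ^l recursion, ∂J/∂w and ∂J/∂b give a complete backward pass. Below is a minimal end-to-end sketch for a three-layer network like the one described at the start (input, hidden, output). It assumes the single-sample quadratic cost and a logistic sigmoid. The 2-3-2 layer sizes, the parameter values and the helper names `forward`, `backprop` and `numeric_grad` are all hypothetical. The sketch checks every weight and bias gradient against central finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(params, x):
    # params[k] = (W, b) for the k-th non-input layer; W[j][i] is w_{ji}.
    # Returns every z^l and every a^l (activations[0] is the input x).
    zs, activations = [], [x]
    for W, b in params:
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b_j
             for row, b_j in zip(W, b)]
        zs.append(z)
        activations.append([sigmoid(v) for v in z])
    return zs, activations


def cost(params, x, y):
    # Single-sample quadratic cost J = 1/2 * ||y - a^L||^2
    _, activations = forward(params, x)
    return 0.5 * sum((t - a) ** 2 for a, t in zip(activations[-1], y))


def backprop(params, x, y):
    """Return [(dJ/dW, dJ/db), ...], one pair per non-input layer."""
    zs, activations = forward(params, x)
    # Output layer: delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = [(a - t) * sigmoid_prime(z)
             for a, t, z in zip(activations[-1], y, zs[-1])]
    grads = []
    for k in range(len(params) - 1, -1, -1):
        a_prev = activations[k]
        grad_W = [[d * a for a in a_prev] for d in delta]  # delta_j * a_i^{l-1}
        grad_b = list(delta)                               # dJ/db_j = delta_j
        grads.append((grad_W, grad_b))
        if k > 0:
            W = params[k][0]
            # Hidden layer: delta^{l-1} = ((w^l)^T delta^l) ⊙ sigma'(z^{l-1})
            delta = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                     * sigmoid_prime(zs[k - 1][j])
                     for j in range(len(a_prev))]
    grads.reverse()
    return grads


# Hypothetical 2-3-2 network (input, hidden, output) with arbitrary parameters
params = [
    ([[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.2, 0.5], [-0.4, 0.1, 0.2]], [0.05, -0.05]),
]
x, y = [0.5, -1.0], [1.0, 0.0]
grads = backprop(params, x, y)


def numeric_grad(values, idx, eps=1e-6):
    # Central difference of J w.r.t. values[idx]; the entry is restored afterwards
    original = values[idx]
    values[idx] = original + eps
    up = cost(params, x, y)
    values[idx] = original - eps
    down = cost(params, x, y)
    values[idx] = original
    return (up - down) / (2 * eps)


max_err = 0.0
for (W, b), (grad_W, grad_b) in zip(params, grads):
    for j in range(len(b)):
        max_err = max(max_err, abs(grad_b[j] - numeric_grad(b, j)))
        for i in range(len(W[j])):
            max_err = max(max_err, abs(grad_W[j][i] - numeric_grad(W[j], i)))
print(max_err)  # should be tiny: only finite-difference error remains
```

Plain lists keep the index bookkeeping visible (w_{ji} in row j, column i). With an array library, the two inner comprehensions would become an outer product and a transposed matrix-vector product.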
