ResNet: Residual Networks

What is ResNet

ResNet, short for Residual Network, is a deep learning architecture that addresses the growing difficulty of training networks as they get deeper by introducing a residual learning framework. It was proposed by Kaiming He et al. in 2015 and achieved breakthrough results on multiple visual recognition tasks.

ResNet performs remarkably well on CNN-based image tasks: by using shortcut connections, it solves the degradation problem that plagues very deep networks.

Highlights of the network:

  • Extremely deep network structures (beyond 1,000 layers)

  • The residual module (residual structure)

  • Batch Normalization to accelerate training (dropout is no longer needed)

Why the residual structure is needed

Before ResNet, neural networks were built by stacking convolutional and pooling layers. Experiments showed, however, that simply piling on more convolutional and pooling layers does not keep improving recognition accuracy; two problems appear instead:

Vanishing or exploding gradients

Vanishing gradients: if the error gradient of every layer is smaller than 1, then during backpropagation the gradient shrinks toward 0 as the network gets deeper. Exploding gradients: if the error gradient of every layer is larger than 1, then during backpropagation the gradient grows without bound as the network gets deeper.
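A quick back-of-the-envelope sketch of why depth makes this worse: if each of the layers multiplies the gradient by a per-layer factor slightly below or above 1 (0.9 and 1.1 here are made-up illustrative values), the product collapses or blows up exponentially with depth.

# Toy illustration of vanishing / exploding gradients:
# many per-layer factors < 1 shrink to ~0, factors > 1 blow up.
for depth in (10, 50, 100):
    print(depth, 0.9 ** depth, 1.1 ** depth)
# depth=100: 0.9**100 ≈ 2.7e-5 (vanishes), 1.1**100 ≈ 1.4e4 (explodes)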

Degradation

As the number of layers increases, prediction accuracy actually gets worse.

Solutions

  • Vanishing and exploding gradients are handled by proper data preprocessing and by adding BN (Batch Normalization) layers to the network.
  • The degradation problem of deep networks is mitigated by residual networks (ResNets). The ResNet paper introduces the residual structure to alleviate degradation; the figure below shows a convolutional network built with residual blocks, where performance keeps improving rather than degrading as the network gets deeper. (Dashed curves are train error, solid curves are test error.) The core idea is sketched in code right after this list.
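The residual idea can be written as y = F(x) + x: the block only has to learn the residual F(x) relative to the identity shortcut. A minimal PyTorch sketch of this idea (the class name TinyResidual and its layer layout are illustrative only, not the implementation used later in this post):

import torch
import torch.nn as nn

class TinyResidual(nn.Module):
    # y = F(x) + x, where F is two 3x3 convolutions with BN
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # shortcut adds the identity back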

Batch Normalization

The goal of Batch Normalization is to make the feature maps of a batch follow a distribution with mean 0 and variance 1.

During image preprocessing we usually standardize the input images, which speeds up convergence. As illustrated in the figure below, the input to Conv1 therefore follows a certain distribution, but the feature maps fed into Conv2 no longer necessarily do. (Note that "following a certain distribution" does not refer to a single feature map; in theory it refers to the feature maps of the entire training set.) The purpose of Batch Normalization is to bring these feature maps back to a distribution with mean 0 and variance 1.
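A quick sketch of what this means in PyTorch: nn.BatchNorm2d normalizes each channel over the batch and spatial dimensions, so before the learnable scale and shift take effect the per-channel mean is close to 0 and the variance close to 1. The tensor sizes below are arbitrary example values.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=64)     # one (gamma, beta) pair per channel
x = torch.randn(8, 64, 56, 56) * 3 + 5   # a batch that is clearly not N(0, 1)
y = bn(x)                                # training mode: normalized with batch statistics

# Per-channel statistics over (batch, height, width)
print(y.mean(dim=(0, 2, 3))[:3])  # ≈ 0
print(y.var(dim=(0, 2, 3))[:3])   # ≈ 1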

The ResNet architecture

There are two kinds of ResNet blocks: the two-layer BasicBlock on the left, and the three-layer Bottleneck on the right, which replaces the two 3×3 convolutions with a 1×1 + 3×3 + 1×1 stack. The 1×1 convolutions shrink and then restore the feature-map channel dimension, so the number of filters in the 3×3 convolution is no longer constrained by the channel count of the previous layer, and its output does not constrain the next layer either.

  • Parameters in the left figure: 3 × 3 × 256 × 256 + 3 × 3 × 256 × 256 = 1,179,648
  • Parameters in the right figure: 1 × 1 × 256 × 64 + 3 × 3 × 64 × 64 + 1 × 1 × 64 × 256 = 69,632
  • The parameter count drops dramatically, so the three-layer residual structure (Bottleneck) is used when building deeper networks; the arithmetic is checked in the sketch after this list.
  • The channels are first reduced and then expanded so that the feature matrix output by the main branch has the same shape as the one on the shortcut branch, which makes the element-wise addition possible.
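A small check of the parameter arithmetic above (convolution weights only; biases and BN parameters are ignored, matching how the figures count parameters):

# Weight counts for the two block designs on a 256-channel input
basic = 3 * 3 * 256 * 256 + 3 * 3 * 256 * 256                  # two 3x3 convs
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256
print(basic)       # 1179648
print(bottleneck)  # 69632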

Why the blocks are drawn with either solid or dashed shortcuts: in conv3_x, conv4_x and conv5_x the first residual block is the dashed one and the remaining blocks are solid. The dashed block acts as the transition between stages, so it must bring the feature matrix to the new size. It differs from a solid block in two ways: the main branch uses a stride of 2 instead of 1, and the shortcut branch adds a 1×1 convolution to match the channel count and spatial size.

The BasicBlock structure

  • Input (right-hand figure): [56, 56, 64]
  • Solid line (main branch): [56, 56, 64] -> [28, 28, 128] -> [28, 28, 128]
  • Dashed line (shortcut branch): [28, 28, 128]
  • Output: [28, 28, 128]

The Bottleneck structure

  • Input (right-hand figure): [56, 56, 256]
  • Solid line (main branch): [56, 56, 256] -> [56, 56, 128] -> [28, 28, 128] -> [28, 28, 512]
  • Dashed line (shortcut branch): [28, 28, 512]
  • Output: [28, 28, 512]

Building the ResNet model

BasicBlock

import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    expansion = 1  # output channels = out_channel * expansion

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # dashed block: reshape the shortcut

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity  # residual addition
        out = self.relu(out)

        return out
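A quick shape check of the dashed BasicBlock case listed earlier ([56, 56, 64] -> [28, 28, 128]). This sketch assumes the imports from the block above; the stride-2 1×1 downsample mirrors what _make_layer builds later in this post.

# Dashed BasicBlock: [56, 56, 64] -> [28, 28, 128]
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
block = BasicBlock(in_channel=64, out_channel=128, stride=2, downsample=downsample)
x = torch.randn(1, 64, 56, 56)  # PyTorch layout: [N, C, H, W]
print(block(x).shape)           # torch.Size([1, 128, 28, 28])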

Bottleneck

class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed residual block
    the first 1x1 convolution has stride 2 and the 3x3 convolution has stride 1.
    The official PyTorch implementation instead uses stride 1 for the first 1x1
    convolution and stride 2 for the 3x3 convolution, which improves top-1
    accuracy by roughly 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4  # output channels = out_channel * expansion

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)  # dashed block: reshape the shortcut

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity  # residual addition
        out = self.relu(out)

        return out
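Similarly, a shape check for the dashed Bottleneck case listed earlier ([56, 56, 256] -> [28, 28, 512]); the 1×1 shortcut convolution again matches what _make_layer constructs below.

# Dashed Bottleneck: [56, 56, 256] -> [28, 28, 512]
downsample = nn.Sequential(
    nn.Conv2d(256, 128 * Bottleneck.expansion, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128 * Bottleneck.expansion),
)
block = Bottleneck(in_channel=256, out_channel=128, stride=2, downsample=downsample)
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)  # torch.Size([1, 512, 28, 28])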

ResNet

class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])             # conv2_x
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)  # conv3_x
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)  # conv4_x
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)  # conv5_x
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # the first block of a stage may need a dashed (projection) shortcut
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        # the remaining blocks of the stage use solid (identity) shortcuts
        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x
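With the building blocks in place, the standard depths are obtained by choosing the block type and the number of blocks per stage. A minimal sketch follows; the helper names resnet34 and resnet50 are just convenient labels introduced here, while the stage configuration [3, 4, 6, 3] matches the original paper.

def resnet34(num_classes=1000, include_top=True):
    # ResNet-34: BasicBlock, stages of 3, 4, 6, 3 blocks
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # ResNet-50: Bottleneck, stages of 3, 4, 6, 3 blocks
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


# Quick sanity check on a standard 224x224 input
model = resnet50(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 5])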
