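VGG: Convolutional Neural Networks Using Blocks

Introduction to VGG

VGG (Visual Geometry Group) is a convolutional neural network architecture proposed by researchers from the Visual Geometry Group at the University of Oxford and Google DeepMind. It achieved strong results in the 2014 ImageNet challenge (ILSVRC 2014), placing first in localization and second in classification. VGG is known for its simple yet effective structure: it builds a deep network by stacking many small convolution kernels (such as 3x3), which needs fewer parameters than larger kernels covering the same receptive field while keeping the network deep and accurate.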

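  • Every convolution uses a 3x3 kernel with stride 1 and padding 1, so it preserves the spatial size of its input
  • Every max-pooling layer uses a 2x2 window with stride 2, so it halves the height and width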
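
To make the effect of these two settings concrete, here is a minimal sketch that applies the standard output-size formula to one convolution followed by one pooling layer, repeated for five stages. The helper names `conv_out_size` and `pool_out_size` and the 224x224 input resolution are assumptions chosen for illustration; they are not part of any library.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output height/width of a convolution, using floor division."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out_size(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output height/width of an unpadded max-pooling layer."""
    return (size - kernel) // stride + 1


size = 224  # assumed input resolution
sizes = [size]
for _ in range(5):  # five pooling stages, one per "M" in the VGG configurations
    size = conv_out_size(size)  # 3x3, stride 1, padding 1: size unchanged
    size = pool_out_size(size)  # 2x2, stride 2: size halved
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]
```

Because the convolutions never change the spatial size, only the five pooling stages shrink the feature map, which is why a 224x224 input ends up as 7x7.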

VGG16

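VGG16 contains 13 convolutional layers and 3 fully connected layers, hence the name "VGG16". Only layers with learnable weights are counted: the convolutional and fully connected layers have weights, whereas the pooling layers do not, so they are excluded from the total.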
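
The count can be checked with a few lines of Python. This is a small sketch: the list mirrors the 'vgg16' entry of the `cfgs` dictionary in the model code later in this article, where a number is the output channel count of a convolution and "M" marks a max-pooling layer. The variable names are chosen here for illustration.

```python
# Layout copied from the 'vgg16' configuration used in this article.
vgg16_cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

num_conv = sum(1 for v in vgg16_cfg if v != "M")  # layers with weights
num_pool = sum(1 for v in vgg16_cfg if v == "M")  # layers without weights
num_fc = 3  # the three fully connected layers of the classifier

print(num_conv, num_pool, num_conv + num_fc)  # 13 5 16
```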

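Advantages of VGG

  • Stacks of 3x3 convolution kernels replace large kernels, which reduces the number of parameters

The paper notes that two stacked 3x3 convolutions can replace one 5x5 convolution, and three stacked 3x3 convolutions can replace one 7x7 convolution, because the stacks have the same receptive field.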

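Receptive Field

In a convolutional neural network, the receptive field of an element in a layer's output is the size of the region in the input layer that determines its value. Put simply, it is the size of the input region that one unit of the output feature map corresponds to.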
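
The receptive field of a stack of layers can be computed with a simple recurrence: each layer widens the field by (kernel - 1) times the current step between neighboring outputs, and that step is then multiplied by the layer's stride. The sketch below assumes this standard recurrence; the function name `receptive_field` is chosen for illustration.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    layers: sequence of (kernel_size, stride) pairs, ordered from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)] * 2))  # 5, same as a single 5x5 convolution
print(receptive_field([(3, 1)] * 3))  # 7, same as a single 7x7 convolution
```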

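Parameter counts also favor the stacked design. Assuming both the input and the output have C channels (and ignoring biases), a single 7x7 convolution and a stack of three 3x3 convolutions require:

  • one 7x7 convolution: 7 × 7 × C × C = 49C^2
  • three 3x3 convolutions: 3 × (3 × 3 × C × C) = 27C^2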
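
As a quick numeric check of this comparison, the sketch below counts convolution weights (biases ignored) for an assumed channel width of C = 512. The helper `conv_weights` is a name chosen for illustration.

```python
def conv_weights(kernel: int, in_ch: int, out_ch: int) -> int:
    """Number of weights in a kernel x kernel convolution, ignoring biases."""
    return kernel * kernel * in_ch * out_ch


C = 512  # assumed channel count, used for both input and output

single_7x7 = conv_weights(7, C, C)     # 49 * C^2
three_3x3 = 3 * conv_weights(3, C, C)  # 27 * C^2
single_5x5 = conv_weights(5, C, C)     # 25 * C^2
two_3x3 = 2 * conv_weights(3, C, C)    # 18 * C^2

print(single_7x7, three_3x3)  # 12845056 7077888
print(single_5x5, two_3x3)    # 6553600 4718592
```

The ratio does not depend on C: the three-layer stack always needs 27/49 of the weights of the 7x7 convolution, and the two-layer stack needs 18/25 of the weights of the 5x5 convolution.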

Building the VGG Model

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ********************************************************************************************************************
# Created: 2024/07/30
# Filename: GoogLeNet_model.py
# Email: 72110902110jq@gmail.com
# Create By: coderfjq
# LastModify: 2024/07/30
# ********************************************************************************************************************
# This code sucks, you know it and I know it.
# Move on and call me an idiot later.
import torch.nn as nn
import torch

# official pretrain weights
model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth'
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=False):
        super(VGG, self).__init__()
        self.features = features
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes)
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.features(x)
        # N x 512 x 7 x 7
        x = torch.flatten(x, start_dim=1)
        # N x 512*7*7
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                # nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_features(cfg: list):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(True)]
            in_channels = v
    return nn.Sequential(*layers)


cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def vgg(model_name="vgg16", **kwargs):
    assert model_name in cfgs, "Warning: model number {} not in cfgs dict!".format(model_name)
    cfg = cfgs[model_name]

    model = VGG(make_features(cfg), **kwargs)
    return model
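
The shape comments in `forward` (N x 3 x 224 x 224 in, N x 512 x 7 x 7 after the feature extractor, then N x 512*7*7 after flattening) can be checked with a dummy input. The sketch below is self-contained, so it restates the feature-building loop in a compact helper named `build_features` (a name chosen here, not part of the code above) and runs only the convolutional part of VGG16 on a random tensor. The 224x224 input size is an assumption taken from those comments.

```python
import torch
import torch.nn as nn

# Same layout as the 'vgg16' entry of cfgs: numbers are conv output channels, "M" is max pooling.
vgg16_cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def build_features(cfg):
    """Compact restatement of the feature-extractor loop, for illustration only."""
    layers, in_channels = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = v
    return nn.Sequential(*layers)


features = build_features(vgg16_cfg).eval()
with torch.no_grad():  # inference only, no gradients needed
    x = torch.randn(1, 3, 224, 224)          # one random RGB "image"
    feat = features(x)                        # convolutional feature map
    flat = torch.flatten(feat, start_dim=1)   # input to the first Linear layer

print(tuple(feat.shape), tuple(flat.shape))  # (1, 512, 7, 7) (1, 25088)
```

The flattened width of 25088 equals 512*7*7, which is why the first `nn.Linear` in the classifier is fixed to that input size; an input resolution other than 224x224 would not match it without further changes.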

Author: coderfjq
Published: July 25, 2024
Source: https://fu-jingqi.github.io/2024/07/25/VGG:使用块的卷积神经网络/