PyTorch Deep Learning (25): The ConvNeXt Network Architecture

ConvNeXt

Paper: https://arxiv.org/abs/2201.03545

I. Improvements

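As the field has advanced, new architectures and optimization strategies have steadily improved the results of Transformers.
ConvNeXt trains a convolutional neural network with those same strategies.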

ResNet-50 is used as the baseline.

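1. Macro design

(1) Stage compute ratio: Swin-T uses 1:1:3:1, and Swin-L uses 1:1:9:1.
The number of stacked blocks per stage is changed from (3, 4, 6, 3) to (3, 3, 9, 3).
(2) The initial downsampling module is called the stem. In ResNet, for example, the stem consists of a 7×7 convolution followed by a 3×3 max pooling layer, each with stride 2.
The ResNet stem is replaced with a convolutional layer with kernel size 4 and stride 4 (following Swin Transformer).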
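
To make the stem change concrete, here is a minimal sketch comparing the two stems on a 224×224 input. The variable names are hypothetical, and the 96 output channels of the new stem are an assumption borrowed from the Swin-T width; both stems downsample the input by a factor of 4.

```python
import torch
import torch.nn as nn

# ResNet-style stem: 7x7 conv (stride 2) followed by 3x3 max pooling (stride 2).
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# "Patchify" stem: one non-overlapping 4x4 conv with stride 4.
# The 96 output channels are an assumption (Swin-T first-stage width).
patchify_stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)

x = torch.randn(1, 3, 224, 224)
resnet_out = resnet_stem(x)
patchify_out = patchify_stem(x)

print(resnet_out.shape)    # torch.Size([1, 64, 56, 56])
print(patchify_out.shape)  # torch.Size([1, 96, 56, 56])
```

Because the kernel size equals the stride, each output position of the new stem sees exactly one 4×4 patch of the image, which is the same idea as the patch embedding in a vision Transformer.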

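2. ResNeXt

(1) Use group convolution.
Depthwise convolution is the special case of group convolution in which the number of groups equals the number of input channels.
In a depthwise convolution, each input channel is convolved with its own kernel.

(2) Increase the channel width of the feature maps: the channel count of each stage is set to match that of Swin Transformer.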
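
The relationship between group convolution and depthwise convolution can be sketched with the `groups` argument of `nn.Conv2d`. The width of 96 is an assumption (the Swin-T first-stage width), and `count_params` is a hypothetical helper defined only for this example.

```python
import torch
import torch.nn as nn

channels = 96  # assumed width, matching the first stage of Swin-T


def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Regular conv: every output channel looks at every input channel.
regular = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
# Depthwise conv: groups == in_channels, so each channel has its own 3x3 kernel.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels, bias=False)

regular_params = count_params(regular)      # 96 * 96 * 3 * 3 = 82944
depthwise_params = count_params(depthwise)  # 96 * 1 * 3 * 3 = 864
print(regular_params, depthwise_params)

# Channels do not mix: changing input channel 0 leaves output channels 1.. unchanged.
x = torch.randn(1, channels, 8, 8)
x_shifted = x.clone()
x_shifted[:, 0] += 1.0
y = depthwise(x)
y_shifted = depthwise(x_shifted)
print(torch.allclose(y[:, 1:], y_shifted[:, 1:]))  # True
```

The depthwise layer is 96 times cheaper in parameters here, which is what leaves room to widen the network.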

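3. Inverted bottleneck

ResNet introduced the bottleneck structure (wide at both ends, narrow in the middle); MobileNetV2 introduced the inverted bottleneck (narrow at both ends, wide in the middle).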
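
The two shapes can be compared by looking only at channel widths. This is a simplified sketch: normalization and activation layers are omitted, and the widths (256/64 for the bottleneck, 96 with an expansion factor of 4 for the inverted version) are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# ResNet-style bottleneck: wide -> narrow -> wide (256 -> 64 -> 256).
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 256, kernel_size=1),
)

# Inverted bottleneck: narrow -> wide -> narrow (96 -> 384 -> 96),
# with a depthwise 3x3 conv in the wide middle.
dim, expansion = 96, 4  # assumed values for illustration
hidden = dim * expansion
inverted = nn.Sequential(
    nn.Conv2d(dim, hidden, kernel_size=1),
    nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),
    nn.Conv2d(hidden, dim, kernel_size=1),
)

bottleneck_widths = [m.out_channels for m in bottleneck]
inverted_widths = [m.out_channels for m in inverted]
print(bottleneck_widths)  # [64, 64, 256]
print(inverted_widths)    # [384, 384, 96]

out_a = bottleneck(torch.randn(1, 256, 14, 14))
out_b = inverted(torch.randn(1, dim, 14, 14))
print(out_a.shape, out_b.shape)
```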

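4. Large kernel size

(1) Moving up the depthwise conv layer: the depthwise conv module is moved to the top of the block.
Before: 1×1 conv → depthwise conv → 1×1 conv
After: depthwise conv → 1×1 conv → 1×1 conv
Reason: the depthwise conv layer plays a role similar to multi-head attention, which comes before the MLP in a Transformer block.
(2) Increasing the kernel size: the depthwise conv kernel is enlarged from 3×3 to 7×7 (7 matches the Swin Transformer window size).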
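
One practical effect of the reordering is that the depthwise conv now runs on the narrow width instead of the expanded one, which keeps a large kernel cheap. The sketch below uses a 7×7 kernel in both orderings purely to compare cost; the widths 96 and 384 are assumptions, `depthwise_weights` is a hypothetical helper, and normalization and activation layers are left out.

```python
import torch
import torch.nn as nn

dim, hidden = 96, 384  # assumed narrow and expanded widths

# Before: 1x1 conv -> depthwise conv -> 1x1 conv (depthwise on the wide tensor).
before = nn.Sequential(
    nn.Conv2d(dim, hidden, kernel_size=1),
    nn.Conv2d(hidden, hidden, kernel_size=7, padding=3, groups=hidden),
    nn.Conv2d(hidden, dim, kernel_size=1),
)

# After: depthwise conv -> 1x1 conv -> 1x1 conv (depthwise on the narrow tensor).
after = nn.Sequential(
    nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim),
    nn.Conv2d(dim, hidden, kernel_size=1),
    nn.Conv2d(hidden, dim, kernel_size=1),
)


def depthwise_weights(block: nn.Sequential) -> int:
    """Number of kernel weights in the grouped (depthwise) conv layers of a block."""
    return sum(m.weight.numel() for m in block if m.groups > 1)


before_dw = depthwise_weights(before)  # 384 * 7 * 7 = 18816
after_dw = depthwise_weights(after)    # 96 * 7 * 7 = 4704
print(before_dw, after_dw)

x = torch.randn(1, dim, 56, 56)
out_before = before(x)
out_after = after(x)
print(out_before.shape, out_after.shape)  # padding=3 keeps the 56x56 resolution
```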

5. Various layer-wise micro designs

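Replacing ReLU with GELU: accuracy is unchanged.
Fewer activation functions: a Swin Transformer block has a single GELU, in its MLP, so ConvNeXt likewise keeps only one GELU, between the two 1×1 convolutions.
Fewer normalization layers: a single normalization layer is kept per block.
Substituting BN with LN: Transformers use LN.
Separate downsampling layers: downsampling is done by a dedicated layer between stages.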
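
Putting the micro designs together, a block might look like the sketch below. It is an illustration under stated assumptions, not the reference implementation: the class names `LayerNorm2d` and `Block` and the helper `downsample` are hypothetical, the expansion factor of 4 and the 2×2 stride-2 downsampling conv are assumptions, and training-time extras such as layer scale and stochastic depth are omitted.

```python
import torch
import torch.nn as nn


class LayerNorm2d(nn.Module):
    """LayerNorm over the channel dimension of an (N, C, H, W) tensor."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 2, 3, 1)     # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)              # normalize over C
        return x.permute(0, 3, 1, 2)  # back to (N, C, H, W)


class Block(nn.Module):
    """Depthwise 7x7 conv -> LN -> 1x1 conv -> GELU -> 1x1 conv, plus a residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = LayerNorm2d(dim)                       # the only norm layer
        self.pwconv1 = nn.Conv2d(dim, expansion * dim, kernel_size=1)
        self.act = nn.GELU()                               # the only activation
        self.pwconv2 = nn.Conv2d(expansion * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        return shortcut + x


def downsample(in_dim: int, out_dim: int) -> nn.Sequential:
    """Separate downsampling layer placed between stages (assumed: LN + 2x2 conv, stride 2)."""
    return nn.Sequential(
        LayerNorm2d(in_dim),
        nn.Conv2d(in_dim, out_dim, kernel_size=2, stride=2),
    )


block = Block(96)
x = torch.randn(2, 96, 56, 56)
y = block(x)
z = downsample(96, 192)(y)
print(y.shape)  # torch.Size([2, 96, 56, 56])
print(z.shape)  # torch.Size([2, 192, 28, 28])
```

The block keeps the resolution and width unchanged, so it can be stacked (3, 3, 9, 3) times per stage, with one downsampling layer halving the resolution between consecutive stages.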