I used this kernel in the Kannada MNIST Competition, getting a final Private Score of 0.99040 and a final Public Score of 0.98920, approximately 60th out of 1213 (top 5%) on the leaderboard. Here is how the kernel is implemented.

The CNN architecture is based on a kernel by FWiktor. Many thanks to him.

CNN Architecture

First of all, here is the architecture of FWiktor’s network:

Based on this summary, I implemented the network and reached an accuracy of 85% on the validation set (Dig-MNIST.csv). That result is fairly good, but I wanted an even higher accuracy, something like 95% or even 99%.

To achieve that, I adjusted some layers of the network structure and added a few layers as well (described below). Here is the summary of my network:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 28, 28]             640
       BatchNorm2d-2           [-1, 64, 28, 28]             128
         LeakyReLU-3           [-1, 64, 28, 28]               0
            Conv2d-4           [-1, 64, 28, 28]          36,928
       BatchNorm2d-5           [-1, 64, 28, 28]             128
         LeakyReLU-6           [-1, 64, 28, 28]               0
            Conv2d-7           [-1, 64, 28, 28]          36,928
       BatchNorm2d-8           [-1, 64, 28, 28]             128
         LeakyReLU-9           [-1, 64, 28, 28]               0
        MaxPool2d-10           [-1, 64, 14, 14]               0
        Dropout2d-11           [-1, 64, 14, 14]               0
           Conv2d-12          [-1, 128, 14, 14]          73,856
      BatchNorm2d-13          [-1, 128, 14, 14]             256
        LeakyReLU-14          [-1, 128, 14, 14]               0
           Conv2d-15          [-1, 128, 14, 14]         147,584
      BatchNorm2d-16          [-1, 128, 14, 14]             256
        LeakyReLU-17          [-1, 128, 14, 14]               0
           Conv2d-18          [-1, 128, 14, 14]         147,584
      BatchNorm2d-19          [-1, 128, 14, 14]             256
        LeakyReLU-20          [-1, 128, 14, 14]               0
        MaxPool2d-21            [-1, 128, 7, 7]               0
        Dropout2d-22            [-1, 128, 7, 7]               0
           Conv2d-23            [-1, 256, 7, 7]         295,168
      BatchNorm2d-24            [-1, 256, 7, 7]             512
        LeakyReLU-25            [-1, 256, 7, 7]               0
           Conv2d-26            [-1, 256, 7, 7]         590,080
      BatchNorm2d-27            [-1, 256, 7, 7]             512
        LeakyReLU-28            [-1, 256, 7, 7]               0
    GlobalAvgPool-29                  [-1, 256]               0
           Linear-30                   [-1, 32]           8,224
             ReLU-31                   [-1, 32]               0
           Linear-32                  [-1, 256]           8,448
          Sigmoid-33                  [-1, 256]               0
      Sq_Ex_Block-34            [-1, 256, 7, 7]               0
        MaxPool2d-35            [-1, 256, 3, 3]               0
        Dropout2d-36            [-1, 256, 3, 3]               0
           Linear-37                  [-1, 256]         590,080
        LeakyReLU-38                  [-1, 256]               0
      BatchNorm1d-39                  [-1, 256]             512
           Linear-40                   [-1, 10]           2,570
================================================================
Total params: 1,940,778
Trainable params: 1,940,778
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 6.17
Params size (MB): 7.40
Estimated Total Size (MB): 13.58
----------------------------------------------------------------
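
A summary in this format can be generated with the torchsummary package. The call below is only a sketch, not part of the original kernel, and it assumes the KannadaNet class defined later in this post:

from torchsummary import summary  # pip install torchsummary

# Print a layer-by-layer summary for a single-channel 28 x 28 input.
model = KannadaNet()
summary(model, input_size=(1, 28, 28), device="cpu")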

Differences Between FWiktor’s Network And Mine

Two More Conv2d Layers

When I trained FWiktor’s network, the validation accuracy improved drastically from 10% to about 80% in a very short time, roughly 5 epochs, then crept up to about 85% over the next 40 or so epochs, and finally plateaued no matter how much longer it was trained. I believe this is because the network is not deep enough, so I wanted a deeper network than FWiktor’s. Considering that the MNIST-style data is fairly simple and doesn’t really need extra Conv2d layers to detect high-level features late in the network, I placed the additional Conv2d layers right before the first and the second MaxPool2d layers (they appear as layer2_1 and layer5_1 in the code below).

Modify Layer Parameters

In the original network, the Dropout2d() probability is 0.5, so in each forward call every channel has an equal 50% chance of being zeroed out, which maximizes randomness and is generally a good thing. However, in a reasonably deep network the front layers should only rarely be zeroed out, so I lowered the probability to 0.4, and it turned out well.
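
To make the behavior concrete, here is a tiny demo (not from the kernel) showing that Dropout2d drops whole channels rather than individual activations:

# Dropout2d zeroes entire channels with probability p and scales the
# surviving channels by 1 / (1 - p) during training.
m = nn.Dropout2d(p=0.4)
x = torch.ones(1, 4, 2, 2)   # dummy feature map with 4 channels
print(m(x))                  # some channels are all zeros, the rest are scaled to 1 / 0.6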

Add A Squeeze-and-Excitation Network (SE Net)

Several References About SE Net:

https://arxiv.org/abs/1709.01507

https://towardsdatascience.com/review-senet-squeeze-and-excitation-network-winner-of-ilsvrc-2017-image-classification-a887b98b2883

https://medium.com/@konpat/squeeze-and-excitation-networks-hu-et-al-2017-48e691d3fe5e

https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65939 (a simple SE Block implementation)

A Squeeze-and-Excitation (SE) block dynamically “excites” the feature maps that help classification and suppresses the feature maps that don’t, based on the pattern of each feature map’s global average.

Implementation Of The Network

Implemented in PyTorch

Import Packages

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

import torchvision
from torchvision import transforms, datasets

from PIL import Image
import matplotlib.pyplot as plt

Implementing SE Block

class Sq_Ex_Block(nn.Module):
    def __init__(self, in_ch, r):
        super(Sq_Ex_Block, self).__init__()
        # Squeeze: global average pool to one value per channel.
        # Excitation: two Linear layers with a bottleneck of in_ch // r,
        # ending in a Sigmoid that outputs a weight in (0, 1) per channel.
        self.se = nn.Sequential(
            GlobalAvgPool(),
            nn.Linear(in_ch, in_ch // r),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // r, in_ch),
            nn.Sigmoid()
        )

    def forward(self, x):
        # Reshape the per-channel weights to (N, C, 1, 1) and rescale the input.
        se_weight = self.se(x).unsqueeze(-1).unsqueeze(-1)
        x = x.mul(se_weight)
        return x


class GlobalAvgPool(nn.Module):
    def __init__(self):
        super(GlobalAvgPool, self).__init__()

    def forward(self, x):
        # (N, C, H, W) -> (N, C): average over the spatial dimensions.
        return x.view(*(x.shape[:-2]), -1).mean(-1)
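
As a quick sanity check (not part of the original kernel), the block can be run on a dummy feature map to confirm that it only rescales channels and keeps the shape unchanged:

# The SE block returns a tensor of the same shape as its input.
se = Sq_Ex_Block(in_ch=256, r=8)
dummy = torch.randn(4, 256, 7, 7)
out = se(dummy)
print(out.shape)   # torch.Size([4, 256, 7, 7])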

Implementing Main Network

class KannadaNet(nn.Module):
    def __init__(self):
        super(KannadaNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=1, padding=1),  # 28 x 28 x 1 => 28 x 28 x 64
            nn.BatchNorm2d(64, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(64, 64, 3, stride=1, padding=1),  # 28 x 28 x 64 => 28 x 28 x 64
            nn.BatchNorm2d(64, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer2_1 = nn.Sequential(
            nn.Conv2d(64, 64, 3, stride=1, padding=1),  # 28 x 28 x 64 => 28 x 28 x 64
            nn.BatchNorm2d(64, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer3 = nn.Sequential(
            nn.MaxPool2d(2, stride=2),  # 28 x 28 x 64 => 14 x 14 x 64
            nn.Dropout2d(0.4)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=1, padding=1),  # 14 x 14 x 64 => 14 x 14 x 128
            nn.BatchNorm2d(128, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(128, 128, 3, stride=1, padding=1),  # 14 x 14 x 128 => 14 x 14 x 128
            nn.BatchNorm2d(128, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer5_1 = nn.Sequential(
            nn.Conv2d(128, 128, 3, stride=1, padding=1),  # 14 x 14 x 128 => 14 x 14 x 128
            nn.BatchNorm2d(128, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer6 = nn.Sequential(
            nn.MaxPool2d(2, stride=2),  # 14 x 14 x 128 => 7 x 7 x 128
            nn.Dropout2d(0.4)
        )
        self.layer7 = nn.Sequential(
            nn.Conv2d(128, 256, 3, stride=1, padding=1),  # 7 x 7 x 128 => 7 x 7 x 256
            nn.BatchNorm2d(256, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer8 = nn.Sequential(
            nn.Conv2d(256, 256, 3, stride=1, padding=1),  # 7 x 7 x 256 => 7 x 7 x 256
            nn.BatchNorm2d(256, 1e-3, 1e-2),
            nn.LeakyReLU(0.1, True)
        )
        self.layer9 = nn.Sequential(
            Sq_Ex_Block(in_ch=256, r=8),
            nn.MaxPool2d(2, stride=2),  # 7 x 7 x 256 => 3 x 3 x 256
            nn.Dropout2d(0.4)
        )
        self.dense = nn.Sequential(
            nn.Linear(2304, 256),
            nn.LeakyReLU(0.1, True),
            nn.BatchNorm1d(256, 1e-3, 1e-2)
        )
        self.fc = nn.Linear(256, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer2_1(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer5_1(x)
        x = self.layer6(x)
        x = self.layer7(x)
        x = self.layer8(x)
        x = self.layer9(x)
        x = x.view(-1, 3 * 3 * 256)
        x = self.dense(x)
        x = self.fc(x)
        return x
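
A quick forward-pass check (a sketch, not from the kernel) confirms the flattened size of 3 x 3 x 256 = 2304 and the 10-class output:

# Dummy batch of eight 28 x 28 grayscale images.
net = KannadaNet()
net.eval()                       # eval mode so BatchNorm uses running statistics
with torch.no_grad():
    logits = net(torch.randn(8, 1, 28, 28))
print(logits.shape)              # torch.Size([8, 10])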

Some Other Tips On Kannada MNIST

Data Augmentation

Data augmentation was performed with these parameters:

train_transform = transforms.Compose([
    # RandomAffine(degrees=10, translate=(0.25, 0.25), scale=(0.8, 1.2), shear=5)
    transforms.RandomAffine(10, (0.25, 0.25), (0.8, 1.2), 5),
    transforms.ToTensor()
])
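
The competition data comes as CSV files where each row is a label followed by 784 pixel values, so the transform has to be applied to a PIL image rebuilt from each row. Below is a minimal Dataset sketch under that assumption; it is not the kernel’s exact code, and the file name and column layout follow the competition’s train.csv:

class KannadaDataset(Dataset):
    """Wraps a train.csv-style DataFrame: column 0 is the label,
    the remaining 784 columns are the pixels of a 28 x 28 image."""
    def __init__(self, df, transform=None):
        self.labels = df.iloc[:, 0].values
        self.images = df.iloc[:, 1:].values.reshape(-1, 28, 28).astype(np.uint8)
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img = Image.fromarray(self.images[idx], mode='L')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[idx]


train_df = pd.read_csv("train.csv")
train_set = KannadaDataset(train_df, transform=train_transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)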

I Use RMSProp Optimizer

optimizer = torch.optim.RMSprop(kannada_net.parameters(), lr=1e-3, alpha=0.9)

I Use ReduceLROnPlateau Scheduler

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, verbose=True)
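
Unlike most schedulers, ReduceLROnPlateau must be given the monitored metric in step(). Since the default mode is 'min', passing the validation loss once per epoch is the natural choice (variable name assumed):

# Called once per epoch; val_loss is the epoch's validation loss (assumed name).
# The learning rate is halved (factor=0.5) after `patience` epochs without improvement.
scheduler.step(val_loss)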

Save The Model That Has The Best Accuracy

import copy

max_acc = 0
best_model_dict = None

# Inside the training loop, after each epoch:
train()
acc = val()   # val() is assumed to return the validation accuracy

if acc > max_acc:
    max_acc = acc
    # Deep-copy, otherwise the saved dict keeps tracking the live weights.
    best_model_dict = copy.deepcopy(kannada_net.state_dict())

# Predicting
kannada_net.load_state_dict(best_model_dict)

Automatically Quit Training To Save Time

if optimizer.param_groups[0]['lr'] < 5e-5:
    print("Learning Rate is Smaller than 0.00005, Stopping Training")
    break

Source Code

Thanks for reading, and please leave an UPVOTE if you find it useful.


