Advanced Neural Net (2)

Assignment Description

  1. ์บ๊ธ€ Kannada MNIST๋ฅผ ์ด์šฉํ•œ ๋ฏธ๋‹ˆ๋Œ€ํšŒ

Why This Was Selected as an Outstanding Assignment

์กฐ์ƒ์—ฐ๋‹˜์€ ์™„๋ฒฝํ•œ ๋…ธํŠธ๋ถ์ด์—ˆ์Šต๋‹ˆ๋‹ค. keras sklearn wrapper๋ฅผ ์ด์šฉํ•ด์„œ ๊ทธ๋ฆฌ๋“œ ์„œ์น˜ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ์ง„ํ–‰ํ•˜์‹ ์ , hiplot์ด๋ผ๋Š” ๋ชจ๋“ˆ๋กœ ๊ฐ ๋ ˆ์ด์–ด๋ณ„ ์‹œ๊ฐํ™” ์ง„ํ–‰ํ•˜์‹œ๊ณ  ์ด๋ฅผ ํ†ตํ•ด ์„ฑ๋Šฅ์ด ๋†’์€ ๋ฐฐ์น˜์ •๊ทœํ™”, ๋“œ๋กญ์•„์›ƒ ๋“ฑ์„ ๋ถ„์„ํ•˜์‹ ์ , ๋˜ํ•œ SOPCNN์ด๋ผ๋Š” MNIST SOTA ๋ชจ๋ธ์„ ์ฐพ์•„ ๋ณด์‹ ์ , Autokeras ์‹คํ—˜๊นŒ์ง€ ๋ฐฐ์šธ๊ฒŒ ๋งŽ์€ ์ตœ๊ณ ์˜ ๋…ธํŠธ๋ถ์ด์—ˆ์Šต๋‹ˆ๋‹ค.

Week 7: Deep Learning Framework

Sangyeon Jo (13th cohort)

Assignment: Kannada MNIST (https://www.kaggle.com/c/tobigs13nn)

๋ฐ์ดํ„ฐ: Train: 42,000 rows, Test: 18,000 rows

Table of Contents

  1. ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

  2. Comparing Advanced Deep Learning Techniques (with Hiplot)

    • Activation

      • ReLU / Leaky ReLU / PReLU

      • Softmax

    • Batch Norm

    • Weight Init

    • Optimizer

      • RMSprop, Adam, RAdam

    • Regularization

      • Dropout, Spatial Dropout

      • Early Stopping

      • Data Augmentation

  3. Training and Hyperparameter Tuning

0. Pre-requisite & Module Import

In [ ]:

!pip install tensorflow-gpu==2.1.0 # Needed for the Keras Data Augmentation tools; plain TF 2.0 throws an error
# !pip install autokeras
# !pip install hiplot
# !pip install kaggle

In [ ]:

from __future__ import absolute_import, division, print_function, unicode_literals

try:
    # Colab magic command that switches the runtime to TensorFlow 2.x
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import os
import tensorflow_datasets as tfds
TensorFlow 2.x selected.

In [ ]:

%load_ext tensorboard

In [8]:

import numpy as np 
import pandas as pd 

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Dense, Flatten, BatchNormalization, MaxPooling2D, LeakyReLU, ReLU, PReLU
from tensorflow.keras.optimizers import RMSprop, Nadam, Adadelta, Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.regularizers import l2

In [ ]:

!ls ~/.kaggle/

In [ ]:

# Kaggle API Key to Root
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [ ]:

import os

def save_and_submit(model, filename, description):
    """
    Save predictions to a CSV file, then submit it to Kaggle.
    """
    if "predict_classes" in dir(model):
        predict = model.predict_classes(X_test)
    else:
        # If the model lacks a predict_classes method, build predictions via argmax over predict()
        predict = np.argmax(model.predict(X_test), axis=1)

    sample_submission['Category'] = predict
    sample_submission.to_csv(filename, index=False)

    # Run the kaggle CLI through os.system; quote the message so it survives as one argument
    os.system(f'kaggle competitions submit -c tobigs13nn -f {filename} -m "{description}"')
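
Illustrative usage (a sketch; the filename and message below are made up for the example):

# Hypothetical call: after fitting a model on X/y, this writes the CSV and submits it
save_and_submit(model, "cnn_baseline.csv", "CNN baseline")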

In [3]:

import glob
glob.glob("*")

Out[3]:

['w7_pytorch.ipynb',
 'test_df.csv',
 'Untitled.ipynb',
 'FirePytorch.ipynb',
 'w7_DL_Framework.ipynb',
 'FireKeras.ipynb',
 'tutorial-for-everybody.ipynb',
 'grid_result.json',
 'train_df.csv',
 'sample_submission.csv']

In [ ]:

# Check that the GPU is set up
tf.test.gpu_device_name()

Out[ ]:

'/device:GPU:0'

1. Data Load & Preprocessing

In [ ]:

sample_submission = pd.read_csv("sample_submission.csv")
test = pd.read_csv("test_df.csv")
train = pd.read_csv("train_df.csv")

print(f"Train data shape {train.shape}")
print(f"Test data shape {test.shape}")

X = train.iloc[:,1:].values
y = train.iloc[:,0].values
X_test = test.iloc[:,1:].values


X = X / 255
X_test = X_test / 255

# Reshape the flat pixel vectors so they can be handled as 28x28x1 images
X = X.reshape(-1, 28, 28,1)
X_test = X_test.reshape(-1, 28, 28,1)

y = to_categorical(y)
Train data shape (42000, 785)
Test data shape (18000, 785)

In [ ]:

from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state=42)

In [ ]:

# ๋ฐ์ดํ„ฐ ํฌ๊ธฐ ํ™•์ธ
x_train.shape, x_val.shape, X_test.shape

Out[ ]:

((33600, 28, 28, 1), (8400, 28, 28, 1), (18000, 28, 28, 1))

2. Grid Search over Advanced Deep Learning Techniques (with Hiplot)

Summarize the effect of each technique through hyperparameter tuning across a range of methods.

param_grid = {
    'batch_size': [512, 1024, 2048],
    'epochs': [20],
    '_optimizer': ['RMSprop','Adam'], 
    '_lr': [1e-3, 2e-3, 1e-2], 
    '_batch_norm': [1, 0], 
    '_activation': ['relu'],
    '_dropout': [0.2, 0.4], 
}

By toggling this week's techniques on and off and varying their strengths, we analyze their effects and search for the optimal parameters. I wanted to give every parameter at least three to five options, but Colab's memory couldn't handle it and kept crashing, so I cut the grid down to a feasible size.
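
As a rough cost check (a sketch computed from the param_grid shown above), even this reduced grid multiplies out quickly, which is what kept exhausting Colab's memory:

from functools import reduce

# Number of candidate settings = product of the option counts per parameter;
# GridSearchCV then trains each candidate once per CV fold.
n_combos = reduce(lambda acc, opts: acc * len(opts), param_grid.values(), 1)
print(f"{n_combos} combinations -> {n_combos * 3} fits with cv=3")
# With the grid above: 3 * 1 * 2 * 3 * 2 * 1 * 2 = 72 combinations -> 216 fits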

In [ ]:

# Use scikit-learn's GridSearchCV (covered earlier) together with KerasClassifier
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [ ]:

# Function to create model, required for KerasClassifier
def create_model(_optimizer, _lr, _batch_norm, _activation, _dropout):
    """
    Build a 3-hidden-layer fully connected model for comparing techniques.
    """
    model = tf.keras.models.Sequential()
    model.add(Dense(64, input_shape=(784,)))
    if _batch_norm:  # toggle batch normalization
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))

    model.add(Dense(128, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))

    model.add(Dense(256, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))

    model.add(Dense(10, activation='softmax'))

    # Resolve the optimizer class by name and instantiate it with the given learning rate
    optimizer = getattr(tf.keras.optimizers, _optimizer)(learning_rate=_lr)

    model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

    return model

# Wrap the builder so it plugs into scikit-learn's API
model = KerasClassifier(build_fn=create_model, verbose=3)

In [ ]:

# Define the grid
param_grid = {
    'batch_size': [1024, 2048],
    'epochs': [15, 30],
    '_optimizer': ['RMSprop','Adam'], 
    '_lr': [1e-3, 2e-3, 1e-2], 
    '_batch_norm': [1, 0], 
    '_activation': ['relu'],
    '_dropout': [0.2, 0.4], 
}

In [ ]:

param_grid

Out[ ]:

{'batch_size': [2048],
 'epochs': [15, 30],
 '_optimizer': ['RMSprop', 'Adam'],
 '_lr': [0.001, 0.01],
 '_batch_norm': [True, False],
 '_activation': ['relu'],
 '_dropout': [0.2, 0.4]}

In [ ]:

grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X, y)  # run the grid search
grid_result

Out[ ]:

GridSearchCV(cv=3, error_score=nan,
             estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x7f61b1e6b6d8>,
             iid='deprecated', n_jobs=None,
             param_grid={'_activation': ['relu'], '_batch_norm': [True, False],
                         '_dropout': [0.2, 0.4], '_lr': [0.001, 0.01],
                         '_optimizer': ['RMSprop', 'Adam'],
                         'batch_size': [2048], 'epochs': [15, 30]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [ ]:

grid_result.cv_results_['rank_test_score']  # rank of each candidate by test score
grid_result.cv_results_['mean_test_score']  # mean test score of each candidate
grid_result.cv_results_['std_test_score']   # std of each candidate's test score (only this last expression is echoed below)

Out[ ]:

array([0.00191339, 0.00231553, 0.00210279, 0.0021028 , 0.00749656,
       0.00113289, 0.00187718, 0.00275614, 0.0021762 , 0.00204734,
       0.00289577, 0.00242509, 0.0027911 , 0.00478707, 0.00220003,
       0.00271803, 0.00134518, 0.00207868, 0.00301714, 0.00154302,
       0.00933254, 0.00683188, 0.00201384, 0.0019293 , 0.00253702,
       0.00236972, 0.00268085, 0.00193342, 0.00889858, 0.01590848,
       0.0020548 , 0.00267833])
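
Besides these per-candidate arrays, GridSearchCV also exposes the single best configuration directly (a quick sketch; both attributes are available because refit=True by default):

print(grid_result.best_score_)   # best mean CV accuracy
print(grid_result.best_params_)  # the hyperparameter dict that achieved it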

In [ ]:

# ์‹œ๊ฐํ™”๋ฅผ ์œ„ํ•œ ์ „์ฒ˜๋ฆฌ
grid_params = grid_result.cv_results_['params']
for i in range(len(grid_params)):
    for key in ['mean_test_score', 'std_test_score', 'mean_fit_time']:
        grid_params[i][key] = grid_result.cv_results_[key][i]

In [ ]:

import json

In [ ]:

# Save the grid_params results
with open('grid_result.json','w') as f:
    f.write(json.dumps(grid_params))

In [11]:

# Load the grid_params results
with open('grid_result2.json','r') as f:
    grid_params = json.loads(f.read())

In [28]:

# grid_params

2.2 Visualizing the Results

์‹œ๊ฐํ™”๋กœ Facebook Research์˜ Hiplot( https://github.com/facebookresearch/hiplot )์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค.

ํŠนํžˆ ๊ณ ์ฐจ์›์˜ ๋ฐ์ดํ„ฐ ๊ฐ„์˜ ํŒจํ„ด์„ ๋ณด๊ธฐ์— ์šฉ์ดํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.

In [12]:

import hiplot as hip
hip.Experiment.from_iterable(grid_params).display()
<IPython.core.display.Javascript object>

Out[12]:

<hiplot.ipython.IPythonExperimentDisplayed at 0x114de8bd0>

[Figure] Full hiplot graph

[Figure] Graph of the top models by mean_score / std_score

์ธ์‚ฌ์ดํŠธ

  1. The very top models almost all have BatchNorm applied.

  2. Dropout performed best at 0.2.

  3. A learning rate of 0.01 scored best, but since training ran for only 20 epochs, there is a good chance those runs were still mid-training.

  4. Smaller batch sizes performed better. It is hard to call the relationship strictly proportional, but given that the paper also uses 256, my guess is that too large a batch smears out the characteristics of each batch.

  5. Both Adam and RMSprop delivered decent performance as optimizers. (The sketch below reproduces the ranking these observations are based on.)
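
The ranking can be rebuilt straight from grid_params (a minimal sketch; the key names follow the grid defined above):

# Sort the merged grid-search records by mean CV accuracy, best first
top5 = sorted(grid_params, key=lambda p: p['mean_test_score'], reverse=True)[:5]
for p in top5:
    print(f"acc={p['mean_test_score']:.4f} +/- {p['std_test_score']:.4f} | "
          f"bn={p['_batch_norm']} dropout={p['_dropout']} lr={p['_lr']} "
          f"opt={p['_optimizer']} batch={p['batch_size']} epochs={p['epochs']}")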

3. Model Research & Selection

MNIST ๋ฐ์ดํ„ฐ์…‹์€ ๋Œ€ํ‘œ์ ์ธ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ๋งŽ์€ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์ด SOTA๊ธ‰ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ๋‹ค.

๊ทธ์ค‘์—์„œ ํŠนํžˆ CNN ๋ชจ๋ธ์ด ๋น ๋ฅธ ํ•™์Šต ์†๋„์™€ ์›”๋“ฑํ•œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์œผ๋ฉฐ ์ด๋ฒˆ ๊ณผ์ œ์˜ ์ „์‹ ์ธ Kannada MNIST์—์„œ๋„ ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์ด ํ•ด๋‹น ๋ชจ๋ธ์„ ํ†ตํ•ด ์ข‹์€ ์„ฑ์ ์„ ๊ฑฐ๋‘์—ˆ๋‹ค. ์ด๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ๋ชจ๋ธ์„ ์„ค๊ณ„ํ•˜๊ณ , Papers with code ์—์„œ SOTA๊ธ‰ ๋…ผ๋ฌธ ๋“ค์„ ์ฐธ๊ณ ํ•˜์—ฌ ์—ฌ๋Ÿฌ ๊ธฐ๋ฒ• ๋“ค์˜ ์žฅ๋‹จ์ ์„ ์ฐธ๊ณ ํ•˜์—ฌ ์ ์šฉํ•ด๋ณธ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ๋กœ ๊ฝค ์ข‹์€ ์ธ์‚ฌ์ดํŠธ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

3.1 SOPCNN

Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods

2020 MNIST SOTA

Reading the SOPCNN paper, the 2020 MNIST SOTA, it is clear considerable effort went into optimization techniques, and since much of it overlaps with this week's material, there was plenty worth borrowing and applying. The paper's aim: CNN models perform well but overfit severely as epochs grow, so it focuses on Data Augmentation and, above all, Dropout as the way to optimize them.

Architecture and Design

The base architecture follows SimpleNet. Taking the MNIST model as an example, there are four Conv2D layers in total, with a Max Pooling layer after every two, followed by two Fully Connected layers and a final Softmax activation layer. The learning rate was set to 0.01, and the Dropout position and FC layer size were decided through parameter tuning.

๊ฐ€์žฅ ์ธ์ƒ๊นŠ์—ˆ๋˜ ์ 

  1. Dropout์€ Softmax ์ง์ „์— ํ•˜๋‚˜๋งŒ ์žˆ๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ์ข‹๋‹ค :Maxpool ๋’ค์— ๋ฐฐ์น˜ํ•˜๊ธฐ๋„ ํ•˜๊ณ , Spatial Dropout๋„ ์ ์šฉํ•ด๋ณด์•˜์ง€๋งŒ ๊ทธ๋ƒฅ Regular Dropout์„ FC ๋‹ค์Œ์— ๋ฐฐ์น˜ํ•œ ๊ฒƒ์ด ๊ฐ€์žฅ ์„ฑ๋Šฅ์ด ์ข‹์•˜๋‹ค๊ณ  ํ•œ๋‹ค.

  2. FC Layer 2048, Drop rate 0.8์ด ๊ฐ€์žฅ ์ข‹๋‹ค: ์™œ ์ด ๋…ผ๋ฌธ ์ œ๋ชฉ์—์„œ Stochastic์ด๋ž€ ๋ง์„ ์ผ๋Š”์ง€ ์•Œ ์ˆ˜ ์žˆ๋Š” ๋Œ€๋ชฉ์ด๋‹ค. ๋ฏฟ๊ธฐ์ง€ ์•Š๋Š” ๋“œ๋กญ์œจ์ด๋ผ 5๋ฒˆ์˜ ๋ฐ˜๋ณต ์‹คํ—˜์„ ํ†ตํ•ด ํ‰๊ท  0.18%์˜ ์—๋Ÿฌ์œจ์ด ๋‚˜์˜จ๋‹ค๋Š” ๊ฒƒ์„ ์ž…์ฆํ•˜์˜€๋‹ค.

๊ทธ ์™ธ์—๋„ Data Augmentation ์…‹ํŒ… ๋“ฑ์„ ์†Œ๊ฐœํ•˜๊ณ  ์žˆ์–ด ํ•™์Šต์— ์ฐธ๊ณ ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.

3.1.1 Implementation

In [9]:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    tf.keras.layers.Conv2D(64,  (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    tf.keras.layers.MaxPooling2D(2, 2),
    
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2048),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dense(2048),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Dropout(0.8),
    tf.keras.layers.Dense(10, activation='softmax')
])
WARNING:tensorflow:Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.

In [10]:

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
batch_normalization (BatchNo (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 28, 28, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 64)        36928     
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 28, 28, 64)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 128)       73856     
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 14, 14, 128)       147584    
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 14, 14, 128)       0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0         
_________________________________________________________________
dense (Dense)                (None, 2048)              12847104  
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 2048)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2048)              4196352   
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 2048)              0         
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                20490     
=================================================================
Total params: 17,324,490
Trainable params: 17,323,722
Non-trainable params: 768
_________________________________________________________________

TensorFlow kindly asks whether, seeing a drop rate above 0.5, we might perhaps be visiting from the past (when the argument meant keep_prob), but we ignore it and proceed with training.

In [ ]:

optimizer = Adam(learning_rate=0.01) # 0.01, as specified in the paper

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

3.1.2 Training and Results

Running it on Colab, training went quite poorly. Val_Acc couldn't even approach 0.98, and since the model is not at all "Simple", training took a long time too.

๊ทธ ์ดํ›„๋กœ FC์˜ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ 2048๋ฅผ 1024๋กœ, Drop Rate๋ฅผ 0.6, 0.4๋กœ ๊ฐ๊ธฐ ์‹คํ—˜ํ•ด๋ณด์•˜์„ ๋•Œ Val Acc 99.6๋กœ ๋น„์Šทํ•˜๊ฒŒ ๋†’์€ ์„ฑ๋Šฅ์ด ๋‚˜์™”๋‹ค.

์—ฌ๊ธฐ์„œ ๋ฐ์ดํ„ฐ ํฌ๊ธฐ๋‚˜ ๋ถ„๋ฅ˜ํ•  ๊ฐฏ์ˆ˜๋กœ FC ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ข€ ๋” ์ž‘๊ฒŒ ์กฐ์ •ํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉฐ, ๋…ผ๋ฌธ์—์„  Epoch์„ 2000๊นŒ์ง€ ์ง„ํ–‰ํ•˜๋Š”๋ฐ ์ง€๊ธˆ Colab์—์„œ ๊ทธ๊ฑด ํž˜๋“ค๊ธฐ ๋•Œ๋ฌธ์— Drop rate์—์„œ ์–ด๋Š์ •๋„ ํƒ€ํ˜‘์„ ๋ณด์•„ ๋น ๋ฅธ ํ•™์Šต์„ ์ง„ํ–‰ํ•ด์•ผ ๊ฒ ๋‹ค๋Š” ๋ฐฉํ–ฅ์„ฑ์„ ์„ธ์šธ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

3.2 CNN (VGG + Data Augmentation)

https://www.kaggle.com/benanakca/kannada-mnist-cnn-tutorial-with-app-top-2

At https://www.kaggle.com/c/Kannada-MNIST I found a model that placed in the top 2%, with a friendly walkthrough of several techniques.

๊ทธ ์ค‘์—์„œ ํŠนํžˆ ImageDataGenerator์™€ ReduceLROnPlateau์ด ์ธ์ƒ์ ์ด์˜€๋Š”๋ฐ ์ „์ž๋Š” ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ๋žœ๋ค์œผ๋กœ ๋ณ€ํ™”์‹œ์ผœ ๊ธฐ์กด ๋ฐ์ดํ„ฐ์— Overfitting๋˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ Data Augmentationํˆด๋กœ tf.keras์—์„œ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. ํ›„์ž๋Š” ๊ทธ ๋œป๋Œ€๋กœ ์•ˆ์ •๋˜๋ฉด ํ•™์Šต๋ฅ ์„ ๋‚ฎ์ถฐ์ฃผ๋Š” ์ฝœ๋ฐฑํ•จ์ˆ˜๋กœ ํ•™์Šต ์ค‘์— ์ง€ํ‘œ๋ฅผ ๊ณ„์† ๋ชจ๋‹ˆํ„ฐ๋ง ํ•˜์—ฌ ์ผ์ • ์ˆ˜์ค€ ์ด์ƒ ์•ˆ์ •์ด ๋˜๋ฉด factor * lr๋กœ ํ˜„์žฌ์˜ ํ•™์Šต๋ฅ ์„ ์ˆœ์ฐจ์ ์œผ๋กœ ๋‚ฎ์ถ”์–ด min_lr์— ๊ทผ์ ‘ํ•˜๋„๋ก ํ•œ๋‹ค.

์ฃผ์˜ํ•  ์ ์€ ImageDataGenerator์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ˆ™์ง€ํ•˜์—ฌ ํ˜น์‹œ ๋ชจ๋ฅผ ์‹ค์ˆ˜๋ฅผ ๋ฐฉ์ง€ํ•ด์•ผํ•˜๋Š” ๋ฐ, ํŠนํžˆ Mnist์˜ ๊ฒฝ์šฐ flip์ด ์ผ์–ด๋‚˜์„  ์•ˆ๋˜๋ฉฐ cutout๋„ ์ง€์–‘ํ•œ๋‹ค.

The table below lists the Data Augmentation used in SOPCNN.

Technique            | Use
---------------------|----------------------
rotation             | Only used with MNIST
shearing             | Yes
Shifting up and down | Yes
Zooming              | Yes
rescale              | Yes
cutout               | No
flipping             | No

In [ ]:

datagen_train = ImageDataGenerator(rotation_range = 10, 
                                   # Degrees out of 360; 10 -> rotate within +/-10 degrees
                                   width_shift_range = 0.25, 
                                   # Horizontal shift as a fraction of width (0.25); values > 1 are treated as pixel counts
                                   height_shift_range = 0.25, 
                                   # Same as above, vertically
                                   shear_range = 0.1,  
                                   # Shear intensity
                                   zoom_range = 0.4,
                                   # Zoom range; here it means [min: 0.6, max: 1.4]. A [0.7, 1] style list also works
                                   horizontal_flip = False) 
# Prevent horizontal flips with False; the default is already False though, so this isn't strictly necessary.

datagen_val = ImageDataGenerator() 

learning_rate_reduction = tf.keras.callbacks.ReduceLROnPlateau( 
    monitor='loss',    
    # Quantity to be monitored.
    factor=0.25,       
    # Factor by which the learning rate will be reduced. new_lr = lr * factor
    patience=2,        
    # The number of epochs with no improvement after which learning rate will be reduced.
    verbose=1,         
    # 0: quiet - 1: update messages.
    mode="auto",       
    # {auto, min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; 
    # in the max mode it will be reduced when the quantity monitored has stopped increasing; 
    # in auto mode, the direction is automatically inferred from the name of the monitored quantity.
    min_delta=0.0001,  
    # threshold for measuring the new optimum, to only focus on significant changes.
    cooldown=0,        
    # number of epochs to wait before resuming normal operation after learning rate (lr) has been reduced.
    min_lr=0.00001     
    # lower bound on the learning rate.
    )

3.2.1 ๋ชจ๋ธ๋ง

The model resembles VGG; its distinctive choices are dropping the Conv2D(512) block and placing only FC(256) after the flatten.

All hidden layers use LeakyReLU.

In [ ]:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(64,  (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(64,  (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.25),
    
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Dropout(0.25),    
    
    tf.keras.layers.Conv2D(256, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(256, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Dropout(0.25),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256),
    tf.keras.layers.LeakyReLU(alpha=0.1),
 
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax')
])

optimizer = RMSprop(learning_rate=0.002,
    rho=0.9,
    momentum=0.1,
    epsilon=1e-07,
    centered=True,
    name='RMSprop')

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

model.summary()

In [ ]:

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=300, restore_best_weights=True)

batch_size = 1024  # assumed value; the original cell did not define it
epochs = 40        # matches the Colab run length reported below

history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=epochs,
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)

Colab์—์„œ Epoch 40๋งŒํผ ํ•™์Šต์‹œํ‚จ ๊ฒฐ๊ณผ Acc 99.74๋ž€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์™”๋‹ค. ๋งค์šฐ ์ข‹์€ ๊ฒฐ๊ณผ๋ผ ์ด ๋ชจ๋ธ๊ณผ ์—ฌ๋Ÿฌ ๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์—ฌ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์„ ์ง„ํ–‰ํ•˜์˜€๋‹ค.

4. Training and Hyperparameter Tuning

VGG ๋ชจ๋ธ์— ๋ฐ”ํƒ•์„ ๋‘๊ณ  ์—ฌ๋Ÿฌ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์˜€๋‹ค. Conv๋ ˆ์ด์–ด ์ถ”๊ฐ€, Relu๋กœ ๋ฐ”๊พธ๊ธฐ, FC Layer ์กฐ์ •, Dropout ์กฐ์ • ๋“ฑ์„ ํ•ด๋ณด์•˜๋‹ค.

Pytorch์—์„  Transformer ๋ชจ๋ธ๋„ ์‹คํ—˜ํ•ด๋ณด์•˜์ง€๋งŒ ์„ฑ๋Šฅ์ด ๋ณ„๋กœ ์ข‹์ง€ ์•Š์•˜๋‹ค. (0.96)

๊ฒฐ๊ณผ์ ์œผ๋กœ SOPCNN์™€ VGG๋ฅผ ์ ์ ˆํžˆ ํ˜ผํ•ฉํ•œ ๋ชจ๋ธ์ด ๊ฐ€์žฅ ์„ฑ๋Šฅ์ด ์ข‹์•˜๋‹ค.

  1. ๋งˆ์ง€๋ง‰์—๋งŒ Dropout: ์ด๋•Œ Drop-rate 0.2 ~ 0.6 ๊นŒ์ง€ ๋‹ค์–‘ํ•˜๊ฒŒ ์ฃผ์—ˆ์„ ๋•Œ 0.25๊ฐ€ ๊ฐ€์žฅ ๊ดœ์ฐฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.

  2. Conv2D(512) ํ•˜๋‚˜ ์ถ”๊ฐ€: ๊ธฐ์กด VGG์—์„œ 4๊ฐœ๊ฐ€ ์Œ“์ด์ง€๋งŒ ๋ฐ์ดํ„ฐ ํฌ๊ธฐ์™€ ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ฅผ ๊ณ ๋ คํ–ˆ์„ ๋•Œ ํ•˜๋‚˜๋งŒ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.

  3. FC Layer 1024: 256 ~ 2048(sopcnn) ~ 4096(vgg) ๋ชจ๋‘ ํ•ด๋ณด์•˜์„ ๋•Œ 1024๊ฐ€ ๊ฐ€์žฅ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค.

  4. Epoch์„ ์ถฉ๋ถ„ํžˆ ์ฃผ๊ณ  Early Stopping์œผ๋กœ ์ตœ์„ ์˜ ๋ชจ๋ธ์„ ์ฐพ๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

In [27]:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(64,  (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(64,  (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),

    #  ๊ธฐ์กด์˜ Dropout ์‚ญ์ œ
    tf.keras.layers.MaxPooling2D(2, 2),

    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(128, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    
    #  ๊ธฐ์กด์˜ Dropout ์‚ญ์ œ
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Conv2D(256, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.Conv2D(256, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.MaxPooling2D(2,2),
    
    #  Added Conv2D(512)
    tf.keras.layers.Conv2D(512, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.MaxPooling2D(2,2),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024),
    tf.keras.layers.LeakyReLU(alpha=0.1),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.25),

    tf.keras.layers.Dense(10, activation='softmax')
])

In [25]:

model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_22 (Conv2D)           (None, 28, 28, 64)        640       
_________________________________________________________________
batch_normalization_24 (Batc (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu_26 (LeakyReLU)   (None, 28, 28, 64)        0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 28, 28, 64)        36928     
_________________________________________________________________
batch_normalization_25 (Batc (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu_27 (LeakyReLU)   (None, 28, 28, 64)        0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 28, 28, 64)        36928     
_________________________________________________________________
batch_normalization_26 (Batc (None, 28, 28, 64)        256       
_________________________________________________________________
leaky_re_lu_28 (LeakyReLU)   (None, 28, 28, 64)        0         
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 14, 14, 128)       73856     
_________________________________________________________________
batch_normalization_27 (Batc (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_29 (LeakyReLU)   (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 14, 14, 128)       147584    
_________________________________________________________________
batch_normalization_28 (Batc (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_30 (LeakyReLU)   (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 14, 14, 128)       147584    
_________________________________________________________________
batch_normalization_29 (Batc (None, 14, 14, 128)       512       
_________________________________________________________________
leaky_re_lu_31 (LeakyReLU)   (None, 14, 14, 128)       0         
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 7, 7, 128)         0         
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 7, 7, 256)         295168    
_________________________________________________________________
batch_normalization_30 (Batc (None, 7, 7, 256)         1024      
_________________________________________________________________
leaky_re_lu_32 (LeakyReLU)   (None, 7, 7, 256)         0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 7, 7, 256)         590080    
_________________________________________________________________
batch_normalization_31 (Batc (None, 7, 7, 256)         1024      
_________________________________________________________________
leaky_re_lu_33 (LeakyReLU)   (None, 7, 7, 256)         0         
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 3, 3, 256)         0         
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 3, 3, 512)         1180160   
_________________________________________________________________
batch_normalization_32 (Batc (None, 3, 3, 512)         2048      
_________________________________________________________________
leaky_re_lu_34 (LeakyReLU)   (None, 3, 3, 512)         0         
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 1, 1, 512)         0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 1024)              525312    
_________________________________________________________________
leaky_re_lu_35 (LeakyReLU)   (None, 1024)              0         
_________________________________________________________________
batch_normalization_33 (Batc (None, 1024)              4096      
_________________________________________________________________
dropout_3 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_8 (Dense)              (None, 10)                10250     
=================================================================
Total params: 3,054,986
Trainable params: 3,049,738
Non-trainable params: 5,248
_________________________________________________________________
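
The compile/fit cell for this final model was not preserved; below is a minimal sketch reusing the section 3.2 settings (RMSprop, the augmentation generator, ReduceLROnPlateau, early stopping). The batch size and epoch count here are assumptions, not recorded values:

optimizer = RMSprop(learning_rate=0.002, rho=0.9, momentum=0.1,
                    epsilon=1e-07, centered=True)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

batch_size = 1024  # assumed; not recorded in the original notebook
epochs = 40        # assumed, mirroring the section 3.2 run

history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=epochs,
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)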

Results and Takeaways

์•„์ง ๋ชจ๋ธ์ด ๊นŠ์ง€ ์•Š์•„ ๊ทธ๋Ÿฐ ๊ฒƒ ๊ฐ™์ง€๋งŒ ์ข‹์€ ๋ชจ๋ธ์€ ์ฒ˜์Œ๋ถ€ํ„ฐ loss ๋–จ์–ด์ง€๋Š” ๊ฒŒ ๋‹ค๋ฅด๋‹ค. ์ข‹์€ ๋ชจ๋ธ์ผ ์ˆ˜๋ก epoch 10 ์•ˆ์ชฝ์—์„œ ๋น ๋ฅด๊ฒŒ Val_Acc๊ฐ€ ์ข‹๊ฒŒ ๋‚˜์™€ ์ดํ›„๋ฅผ ๊ฐ€๋Š ํ•ด๋ณผ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋ฌผ๋ก  epoch์„ 2000์ •๋„๋กœ ๋‘๊ณ  ๊นŠ๊ฒŒ ํ•™์Šต์‹œํ‚จ๋‹ค๋ฉด ์ข‹๊ฒ ์ง€๋งŒ ํ•œ์ •๋œ ์ž์›์—์„œ ๊ทธ๋‚˜๋งˆ ๋‚˜์€ ๋ชจ๋ธ์„ ๊ณจ๋ผ๋‚ด๊ธฐ ์œ„ํ•œ ์ตœ์„ ์ด ์•„๋‹๊นŒ ์‹ถ๋‹ค. ์ด ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ epoch 8์—์„œ 0.99์˜ val_acc๋ฅผ ๋ณด์ด๊ณ  ์ค‘๊ฐ„ ์ค‘๊ฐ„ 0.998๋ฅผ ์ƒํšŒํ•˜๊ธฐ๋„ ํ•˜์˜€๋‹ค.

๊ทธ๋ฆฌ๊ณ  ์•Œ๊ฒŒ๋œ ๊ฒƒ์ด keras layer์˜ ๊ธฐ๋ณธ kernel weight initializer๊ฐ€ xavier๋ผ๋Š” ๊ฒƒ์„ ์•Œ๊ฒŒ๋˜์—ˆ๋‹ค. ์ข€ ๋” GPU ์ž์›์ด ํ—ˆ๋ฝํ–ˆ๋‹ค๋ฉด weight initializer๋„ ๋ฐ”๊ฟ”๋ณด๊ณ  batch_size๋„ ๋” ๋‹ค์–‘ํ™”์‹œํ‚ฌ ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ๋ž€ ์•„์‰ฌ์›€์ด ๋‚จ๋Š”๋‹ค.

Aside) AutoKeras

In an era when AutoML writes the model for you, how good is the performance, really?

In [ ]:

import autokeras as ak

In [ ]:

print(x_train.shape)  
print(y_train.shape)
(33600, 28, 28, 1)
(33600, 10)

In [ ]:

# Initialize the ImageClassifier.
clf = ak.ImageClassifier(max_trials=3)

# Search for the best model.
clf.fit(x_train, y_train,
        validation_data=(x_val, y_val), # validation set
        epochs=10)
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 9s 9ms/step - loss: 0.1285 - accuracy: 0.9601 - val_loss: 0.0466 - val_accuracy: 0.9875
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0476 - accuracy: 0.9849 - val_loss: 0.0360 - val_accuracy: 0.9902
Epoch 3/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0365 - accuracy: 0.9892 - val_loss: 0.0389 - val_accuracy: 0.9894
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0292 - accuracy: 0.9907 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0287 - accuracy: 0.9907 - val_loss: 0.0326 - val_accuracy: 0.9919
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0249 - accuracy: 0.9922 - val_loss: 0.0316 - val_accuracy: 0.9895
Epoch 7/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0224 - accuracy: 0.9924 - val_loss: 0.0344 - val_accuracy: 0.9919
Epoch 8/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0341 - val_accuracy: 0.9923
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0179 - accuracy: 0.9935 - val_loss: 0.0312 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0164 - accuracy: 0.9940 - val_loss: 0.0356 - val_accuracy: 0.9919

Trial complete

Trial summary

|-Trial ID: 123c8d89e202f81d1fd46a1f9201f3fe
|-Score: 0.03119376050014196
|-Best step: 0

Hyperparameters:

|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam

Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 77s 73ms/step - loss: 0.2367 - accuracy: 0.9337 - val_loss: 0.0599 - val_accuracy: 0.9837
Epoch 2/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0999 - accuracy: 0.9753 - val_loss: 0.0854 - val_accuracy: 0.9829
Epoch 3/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0586 - accuracy: 0.9848 - val_loss: 1.5777 - val_accuracy: 0.8914
Epoch 4/10
1050/1050 [==============================] - 71s 68ms/step - loss: 0.0444 - accuracy: 0.9887 - val_loss: 0.0594 - val_accuracy: 0.9870
Epoch 5/10
1050/1050 [==============================] - 71s 67ms/step - loss: 0.0477 - accuracy: 0.9876 - val_loss: 0.0479 - val_accuracy: 0.9887
Epoch 6/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0505 - accuracy: 0.9888 - val_loss: 10.7493 - val_accuracy: 0.6519
Epoch 7/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0536 - accuracy: 0.9868 - val_loss: 0.0613 - val_accuracy: 0.9869
Epoch 8/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0280 - accuracy: 0.9922 - val_loss: 0.0520 - val_accuracy: 0.9874
Epoch 9/10
1050/1050 [==============================] - 73s 70ms/step - loss: 0.0305 - accuracy: 0.9926 - val_loss: 0.0414 - val_accuracy: 0.9904
Epoch 10/10
1050/1050 [==============================] - 72s 68ms/step - loss: 0.0369 - accuracy: 0.9911 - val_loss: 0.0493 - val_accuracy: 0.9902

Trial complete

Trial summary

|-Trial ID: 03db11dd05b1734a2cf3413c1ac7e197
|-Score: 0.04135792684210588
|-Best step: 0

Hyperparameters:

|-classification_head_1/dropout_rate: 0
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 2
|-dense_block_1/units_0: 32
|-dense_block_1/units_1: 32
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: True
|-image_block_1/block_type: resnet
|-image_block_1/normalize: True
|-image_block_1/res_net_block_1/conv3_depth: 4
|-image_block_1/res_net_block_1/conv4_depth: 6
|-image_block_1/res_net_block_1/pooling: avg
|-image_block_1/res_net_block_1/version: v2
|-optimizer: adam

Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1271 - accuracy: 0.9611 - val_loss: 0.0507 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0479 - accuracy: 0.9854 - val_loss: 0.0386 - val_accuracy: 0.9898
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0371 - accuracy: 0.9886 - val_loss: 0.0337 - val_accuracy: 0.9911
Epoch 4/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0294 - accuracy: 0.9906 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0257 - accuracy: 0.9915 - val_loss: 0.0327 - val_accuracy: 0.9914
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0239 - accuracy: 0.9920 - val_loss: 0.0331 - val_accuracy: 0.9910
Epoch 7/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.0215 - accuracy: 0.9928 - val_loss: 0.0311 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9934 - val_loss: 0.0313 - val_accuracy: 0.9918
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0169 - accuracy: 0.9942 - val_loss: 0.0300 - val_accuracy: 0.9924
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0166 - accuracy: 0.9945 - val_loss: 0.0324 - val_accuracy: 0.9918

Trial complete

Trial summary

|-Trial ID: 2ce4926fd3ec015466417c00c29b3ca4
|-Score: 0.029994319529753708
|-Best step: 0

Hyperparameters:

|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam

INFO:tensorflow:Oracle triggered exit
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1359 - accuracy: 0.9585 - val_loss: 0.0513 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0482 - accuracy: 0.9852 - val_loss: 0.0372 - val_accuracy: 0.9899
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0360 - accuracy: 0.9888 - val_loss: 0.0364 - val_accuracy: 0.9906
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0303 - accuracy: 0.9905 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0261 - accuracy: 0.9918 - val_loss: 0.0327 - val_accuracy: 0.9917
Epoch 6/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0234 - accuracy: 0.9923 - val_loss: 0.0310 - val_accuracy: 0.9924
Epoch 7/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0212 - accuracy: 0.9934 - val_loss: 0.0334 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9937 - val_loss: 0.0354 - val_accuracy: 0.9919
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9947 - val_loss: 0.0331 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9945 - val_loss: 0.0416 - val_accuracy: 0.9912

์ƒ์„ฑ๋œ ๋ชจ๋ธ Summary ๋ฐ Sumbit

In [ ]:

model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
normalization (Normalization (None, 28, 28, 1)         3         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                92170     
_________________________________________________________________
classification_head_1 (Softm (None, 10)                0         
=================================================================
Total params: 110,989
Trainable params: 110,986
Non-trainable params: 3
_________________________________________________________________

์‹ ๊ธฐํ•˜๊ฒŒ๋„ ๊ฝค ๋น„์Šทํ•œ ๋ชจ๋ธ์„ ์งฐ๋‹ค. ์›”์”ฌ ๋‹จ์ˆœํ•˜๊ณ  params๊ฐ€ 11๋งŒ ๋ฐ–์— ์•ˆ๋˜์ง€๋งŒ ์„ฑ๋Šฅ์€ Public Dashboard ๊ธฐ์ค€ 0.9911์ด ๋‚˜์™”๋‹ค.In [ ]:

# Evaluate on the testing data.
print('Accuracy: {accuracy}'.format(
    accuracy=clf.evaluate(x_val, y_val)))
263/263 [==============================] - 1s 4ms/step - loss: 0.0416 - accuracy: 0.9912
Accuracy: [0.04157625054905844, 0.9911905]

In [ ]:

model = clf.export_model() # export the model built by AutoKeras
save_and_submit(model, "autokeras.csv", "AutoKeras_iter3")

Training on the Validation Set

In [ ]:

model.fit(x_val, y_val, epochs=5, batch_size=1024, verbose=3)
save_and_submit(model, "submit.csv", "AK+val")

Training on the validation set as well improved the score to 0.993.
