Advanced Neural Net (2)
Assignment Description
A mini-competition using the Kaggle Kannada MNIST dataset
Why This Was Selected as an Outstanding Assignment
This was a flawless notebook by Cho Seong-yeon. There is a great deal to learn here: grid-search hyperparameter tuning with the Keras scikit-learn wrapper, visualization of the grid-search results with the hiplot module (used to analyze which high-performing runs used batch normalization, dropout, and so on), tracking down SOPCNN, a MNIST SOTA model, and even an AutoKeras experiment. A superb notebook.
Week 7: Deep Learning Framework
Cho Seong-yeon (13th cohort)
Assignment: Kannada MNIST (https://www.kaggle.com/c/tobigs13nn)
Data: Train 42,000 rows, Test 18,000 rows
Table of Contents
Data Preprocessing
Comparing Advanced Deep Learning Techniques (with Hiplot)
Activation
Relu / Leaky Relu / PRelu
Softmax
Batch Norm
Weight Init
Optimizer
Rmsprop, Adam, Radam
Regularization
Dropout, Spatial Dropout
Early Stopping
Data Augmentation
Model Research and Selection
Training and Hyperparameter Tuning
0. Pre-requisite & Module Import
In [ ]:
!pip install tensorflow-gpu==2.1.0 # required for the Keras data-augmentation tools; plain TF 2.0 throws an error
# !pip install autokeras
# !pip install hiplot
# !pip install kaggle
In [ ]:
from __future__ import absolute_import, division, print_function, unicode_literals
try:
    %tensorflow_version 2.x  # Colab magic command for selecting TensorFlow 2.x
except Exception:
    pass
import tensorflow as tf
import os
import tensorflow_datasets as tfds
TensorFlow 2.x selected.
In [ ]:
%load_ext tensorboard
In [8]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Dense, Flatten, BatchNormalization, MaxPooling2D, LeakyReLU, ReLU, PReLU
from tensorflow.keras.optimizers import RMSprop, Nadam, Adadelta, Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.regularizers import l2
In [ ]:
!ls ~/.kaggle/
In [ ]:
# Kaggle API Key to Root
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
In [ ]:
import os
def save_and_submit(model, filename, description):
    """
    Save predictions to a CSV file, then submit it to Kaggle.
    """
    if "predict_classes" in dir(model):
        predict = model.predict_classes(X_test)
    else:
        # If the model has no predict_classes method, use predict + argmax instead
        predict = np.argmax(model.predict(X_test), axis=1)
    sample_submission['Category'] = predict
    sample_submission.to_csv(filename, index=False)
    # Run the kaggle CLI through os.system (message quoted so spaces survive)
    os.system(f'kaggle competitions submit -c tobigs13nn -f {filename} -m "{description}"')
In [3]:
import glob
glob.glob("*")
Out[3]:
['w7_pytorch.ipynb',
'test_df.csv',
'Untitled.ipynb',
'FirePytorch.ipynb',
'w7_DL_Framework.ipynb',
'FireKeras.ipynb',
'tutorial-for-everybody.ipynb',
'grid_result.json',
'train_df.csv',
'sample_submission.csv']
In [ ]:
# Check the GPU setup
tf.test.gpu_device_name()
Out[ ]:
'/device:GPU:0'
1. Data Load & Preprocessing
In [ ]:
sample_submission = pd.read_csv("sample_submission.csv")
test = pd.read_csv("test_df.csv")
train = pd.read_csv("train_df.csv")
print(f"Train data shape {train.shape}")
print(f"Test data shape {test.shape}")
X = train.iloc[:,1:].values
y = train.iloc[:,0].values
X_test = test.iloc[:,1:].values
X = X / 255
X_test = X_test / 255
# Reshape so the data can be processed as images
X = X.reshape(-1, 28, 28,1)
X_test = X_test.reshape(-1, 28, 28,1)
y = to_categorical(y)
Train data shape (42000, 785)
Test data shape (18000, 785)
In [ ]:
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state=42)
In [ ]:
# Check data shapes
x_train.shape, x_val.shape, X_test.shape
Out[ ]:
((33600, 28, 28, 1), (8400, 28, 28, 1), (18000, 28, 28, 1))
2. Grid Search over Advanced Deep Learning Techniques (with Hiplot)
Summarize the effect of each technique through tuning across a variety of techniques and hyperparameters.
param_grid = {
'batch_size': [512, 1024, 2048],
'epochs': [20],
'_optimizer': ['RMSprop','Adam'],
'_lr': [1e-3, 2e-3, 1e-2],
'_batch_norm': [1, 0],
'_activation': ['relu'],
'_dropout': [0.2, 0.4],
}
By toggling the various techniques we learned this week on and off and adjusting their magnitudes, I analyze their effects and search for the optimal parameters. I wanted to give each parameter at least 3 to 5 options, but Colab's memory could not handle it and kept crashing, so I pared the grid down to a feasible size.
2.1 Grid Search Setup and Model Creation
In [ ]:
# Use scikit-learn's GridSearchCV, covered earlier, together with KerasClassifier
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
In [ ]:
# Function to create model, required for KerasClassifier
def create_model(_optimizer, _lr, _batch_norm, _activation, _dropout):
    """
    Build the model used to compare techniques (3 FC layers).
    """
    model = tf.keras.models.Sequential()
    model.add(Flatten(input_shape=(28, 28, 1)))  # grid.fit receives X as (N, 28, 28, 1), so flatten first
    model.add(Dense(64))
    if _batch_norm:  # toggle batch normalization on/off
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(128, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(256, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(10, activation='softmax'))
    # Resolve the optimizer class by name, e.g. 'Adam' -> tf.keras.optimizers.Adam
    optimizer = getattr(tf.keras.optimizers, _optimizer)(learning_rate=_lr)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
# Set up the model builder through KerasClassifier
model = KerasClassifier(build_fn=create_model, verbose=3)
In [ ]:
# Grid settings
param_grid = {
'batch_size': [1024, 2048],
'epochs': [15, 30],
'_optimizer': ['RMSprop','Adam'],
'_lr': [1e-3, 2e-3, 1e-2],
'_batch_norm': [1, 0],
'_activation': ['relu'],
'_dropout': [0.2, 0.4],
}
In [ ]:
param_grid
Out[ ]:
{'batch_size': [2048],
'epochs': [15, 30],
'_optimizer': ['RMSprop', 'Adam'],
'_lr': [0.001, 0.01],
'_batch_norm': [True, False],
'_activation': ['relu'],
'_dropout': [0.2, 0.4]}
In [ ]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X, y)  # fit the grid search
grid_result
Out[ ]:
GridSearchCV(cv=3, error_score=nan,
estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x7f61b1e6b6d8>,
iid='deprecated', n_jobs=None,
param_grid={'_activation': ['relu'], '_batch_norm': [True, False],
'_dropout': [0.2, 0.4], '_lr': [0.001, 0.01],
'_optimizer': ['RMSprop', 'Adam'],
'batch_size': [2048], 'epochs': [15, 30]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
In [ ]:
grid_result.cv_results_['rank_test_score']  # test-score rank
grid_result.cv_results_['mean_test_score']  # mean test score
grid_result.cv_results_['std_test_score']   # test-score standard deviation
Out[ ]:
array([0.00191339, 0.00231553, 0.00210279, 0.0021028 , 0.00749656,
0.00113289, 0.00187718, 0.00275614, 0.0021762 , 0.00204734,
0.00289577, 0.00242509, 0.0027911 , 0.00478707, 0.00220003,
0.00271803, 0.00134518, 0.00207868, 0.00301714, 0.00154302,
0.00933254, 0.00683188, 0.00201384, 0.0019293 , 0.00253702,
0.00236972, 0.00268085, 0.00193342, 0.00889858, 0.01590848,
0.0020548 , 0.00267833])
In [ ]:
# Preprocessing for visualization: merge the scores into each params dict
grid_params = grid_result.cv_results_['params']
for i in range(len(grid_params)):
    for key in ['mean_test_score', 'std_test_score', 'mean_fit_time']:
        grid_params[i][key] = grid_result.cv_results_[key][i]
In [ ]:
import json
In [ ]:
# Save the grid_params results
with open('grid_result.json', 'w') as f:
    f.write(json.dumps(grid_params))
In [11]:
# Load the grid_params results back
with open('grid_result2.json', 'r') as f:
    grid_params = json.loads(f.read())
In [28]:
# grid_params
2.2 Result Visualization
Facebook Research's Hiplot ( https://github.com/facebookresearch/hiplot ) was used for visualization.
It is said to be especially good for spotting patterns in high-dimensional data.
In [12]:
import hiplot as hip
hip.Experiment.from_iterable(grid_params).display()
<IPython.core.display.Javascript object>
Out[12]:
<hiplot.ipython.IPythonExperimentDisplayed at 0x114de8bd0>
[Figure: full Hiplot parallel-coordinates view of all runs]
[Figure: Hiplot view filtered to the top models by mean_test_score / std_test_score]
Insights
The models at the very top almost all have BatchNorm applied.
Dropout showed good performance at 0.2.
A learning rate of 0.01 came out ahead, but the runs were only 20 epochs long, so those models were most likely still converging.
Smaller batch sizes gave better performance. It is hard to call the relationship strictly proportional, but given that the paper also settles on 256, my guess is that overly large batches blur the characteristics of each batch.
Both Adam and RMSprop delivered decent performance as optimizers.
(These readings are cross-checked numerically in the snippet below.)
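As a quick numerical cross-check of the Hiplot reading, the grid results can be ranked directly. A minimal sketch, assuming the grid_params list built in 2.2 (each dict holds the parameters plus the merged scores):
In [ ]:
# Rank configurations by mean CV accuracy to confirm the visual insights
top = sorted(grid_params, key=lambda p: p['mean_test_score'], reverse=True)[:5]
for p in top:
    print(f"acc={p['mean_test_score']:.4f} (+/-{p['std_test_score']:.4f}) "
          f"bn={p['_batch_norm']} drop={p['_dropout']} lr={p['_lr']} "
          f"opt={p['_optimizer']} bs={p['batch_size']}")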
3. Model Research & Selection
MNIST is the canonical image-classification dataset, and many deep learning models post SOTA-level performance on it.
Among them, CNN models stand out for fast training and excellent accuracy, and on Kannada MNIST, the competition this assignment is based on, many participants earned strong scores with them. With this in mind I designed a model while consulting SOTA-level papers on Papers with Code, weighing the pros and cons of the techniques they introduce before applying them. This process actually produced some quite good insights.
3.1 SOPCNN
Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods
2020 MNIST SOTA
Reading SOPCNN, the 2020 MNIST SOTA paper, it is clear the authors put considerable effort into optimization techniques, and since much of it overlaps with this week's material, there was plenty worth borrowing. The paper's premise is that plain CNN models perform well but suffer badly from overfitting as epochs increase, so it concentrates on how to optimize against that, mainly through data augmentation and, above all, dropout.
Architecture and Design
The base architecture follows SimpleNet. In the MNIST model, for example, there are four Conv2D layers in total, with a max-pooling layer after every second one. Two fully connected layers follow, and a final softmax activation layer completes the model. The learning rate is set to 0.01, while the dropout rate and the FC layer size were decided through hyperparameter tuning.
What impressed me most
A single Dropout right before the softmax works best: the authors also tried placing it after max-pooling and experimented with Spatial Dropout, but plain regular Dropout after the FC layers performed best.
FC layers of 2048 with a drop rate of 0.8 work best: this is where the "Stochastic" in the paper's title earns its keep. The drop rate is hard to believe, so they ran the experiment five times and demonstrated an average error rate of 0.18%.
The paper also describes its data augmentation choices, which I could consult for training. The two dropout placements it compares are sketched below.
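For illustration only, here is a minimal sketch (a toy stack, not the SOPCNN model below) contrasting the two placements the paper compares: SpatialDropout2D, which zeroes whole feature maps after a conv layer, versus the winning choice of a single regular Dropout after the FC layers, right before the softmax:
In [ ]:
spatial_variant = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.SpatialDropout2D(0.2),  # drops entire channels at once
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
fc_variant = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.8),  # single dropout just before softmax, as SOPCNN prefers
    tf.keras.layers.Dense(10, activation='softmax'),
])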
3.1.1 Implementation
In [9]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(2048),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Dense(2048),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Dropout(0.8),
tf.keras.layers.Dense(10, activation='softmax')
])
WARNING:tensorflow:Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
In [10]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 64) 640
_________________________________________________________________
batch_normalization (BatchNo (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 14, 14, 128) 73856
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense (Dense) (None, 2048) 12847104
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 2048) 0
_________________________________________________________________
dense_1 (Dense) (None, 2048) 4196352
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU) (None, 2048) 0
_________________________________________________________________
dropout (Dropout) (None, 2048) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 20490
=================================================================
Total params: 17,324,490
Trainable params: 17,323,722
Non-trainable params: 768
_________________________________________________________________
TensorFlow kindly warns that the drop rate exceeds 0.5 and asks whether this might be a leftover keep_prob from the TF 1.x days, but we ignore it and proceed with training.
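A tiny sanity check (my addition) confirms the semantics the warning is about: with rate 0.8, roughly 80% of activations are zeroed at training time, i.e. it really is a drop rate, not a keep probability:
In [ ]:
drop = tf.keras.layers.Dropout(0.8)
x = tf.ones((1, 1000))
# Fraction of zeroed activations; should land near 0.8 (varies per call)
print(float(tf.reduce_mean(tf.cast(drop(x, training=True) == 0.0, tf.float32))))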
In [ ]:
optimizer = Adam(learning_rate=0.01)  # 0.01, as specified in the paper
model.compile(loss='categorical_crossentropy',
optimizer=optimizer,
metrics=['accuracy'])
3.1.2 Training and Results
Running this on Colab, training went very poorly. Val_Acc could not even approach 0.98, and since the model is not remotely "simple", training also took a long time.
After that, experimenting with the FC size lowered from 2048 to 1024 and drop rates of 0.6 and 0.4, Val Acc came out similarly high at 99.6.
From this I learned that the FC size needs to be scaled down to match the data size and the number of classes. And since the paper trains for up to 2000 epochs, which is infeasible on Colab right now, I set a direction of compromising somewhat on the drop rate in exchange for faster training.
3.2 CNN (VGG + Data Augmentation)
https://www.kaggle.com/benanakca/kannada-mnist-cnn-tutorial-with-app-top-2
At https://www.kaggle.com/c/Kannada-MNIST I found a model that placed in the top 2%, and its kernel kindly walks through several techniques.
ImageDataGenerator and ReduceLROnPlateau stood out in particular. The former, loadable from tf.keras, is a data augmentation tool that randomly transforms the image data to keep the model from overfitting to the original samples. The latter, true to its name, is a callback that lowers the learning rate when training stalls: it keeps monitoring a metric during training, and once improvement plateaus beyond a threshold it progressively scales the current learning rate by factor * lr, approaching min_lr.
One thing to be careful about is knowing ImageDataGenerator's parameters well enough to avoid silent mistakes: for MNIST-style digits in particular, flipping must never be applied, and note that cutout is not among ImageDataGenerator's built-in options. (A preview sketch follows the generator setup below.)
The table below lists the data augmentation used in SOPCNN.
| Technique            | Use                  |
| -------------------- | -------------------- |
| rotation             | Only used with MNIST |
| shearing             | Yes                  |
| shifting up and down | Yes                  |
| zooming              | Yes                  |
| rescale              | Yes                  |
| cutout               | No                   |
| flipping             | No                   |
In [ ]:
datagen_train = ImageDataGenerator(
    rotation_range=10,        # degrees out of 360; 10 -> rotate within +/-10 degrees
    width_shift_range=0.25,   # horizontal shift as a fraction of width; values > 1 are treated as pixels
    height_shift_range=0.25,  # same as above, vertically
    shear_range=0.1,          # shear intensity
    zoom_range=0.4,           # zoom range; here it means [min: 0.6, max: 1.4]; [0.7, 1] style also works
    horizontal_flip=False)    # block left-right flips; the default is already False, so strictly optional
datagen_val = ImageDataGenerator()
learning_rate_reduction = tf.keras.callbacks.ReduceLROnPlateau(
monitor='loss',
# Quantity to be monitored.
factor=0.25,
# Factor by which the learning rate will be reduced. new_lr = lr * factor
patience=2,
# The number of epochs with no improvement after which learning rate will be reduced.
verbose=1,
# 0: quiet - 1: update messages.
mode="auto",
# {auto, min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing;
# in the max mode it will be reduced when the quantity monitored has stopped increasing;
# in auto mode, the direction is automatically inferred from the name of the monitored quantity.
min_delta=0.0001,
# threshold for measuring the new optimum, to only focus on significant changes.
cooldown=0,
# number of epochs to wait before resuming normal operation after learning rate (lr) has been reduced.
min_lr=0.00001
# lower bound on the learning rate.
)
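As a quick visual sanity check (not in the original notebook), a few augmented samples can be drawn to confirm the digits stay readable and unflipped; a minimal sketch assuming matplotlib is available:
In [ ]:
import matplotlib.pyplot as plt
batch_x, _ = next(datagen_train.flow(x_train, y_train, batch_size=9))
fig, axes = plt.subplots(3, 3, figsize=(4, 4))
for ax, img in zip(axes.flat, batch_x):
    ax.imshow(img.squeeze(), cmap='gray')  # one augmented 28x28 digit
    ax.axis('off')
plt.show()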
3.2.1 Modeling
The architecture is VGG-like; its distinguishing choices are removing the Conv2D(512) layers and, after flatten, using only a single FC(256).
All hidden layers use LeakyReLU.
In [ ]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(256),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dense(10, activation='softmax')
])
optimizer = RMSprop(learning_rate=0.002,
rho=0.9,
momentum=0.1,
epsilon=1e-07,
centered=True,
name='RMSprop')
model.compile(loss='categorical_crossentropy',
optimizer=optimizer,
metrics=['accuracy'])
model.summary()
In [ ]:
epochs = 40        # per the training run described below
batch_size = 1024  # not stated in the original notebook; an assumed placeholder value
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=300, restore_best_weights=True)
history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train)//batch_size,
                    epochs=epochs,
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)
Training for 40 epochs on Colab yielded Acc 99.74. This was good enough that I carried this model forward, applying the various techniques and tuning hyperparameters on top of it.
4. Training and Hyperparameter Tuning
With the VGG model as the base, I applied several optimization techniques: adding Conv layers, switching to ReLU, adjusting the FC layers, adjusting Dropout, and so on.
I also experimented with a Transformer model in PyTorch, but its performance was mediocre (0.96).
In the end, a model that blends SOPCNN and VGG appropriately performed best.
Dropout only at the very end: sweeping drop rates from 0.2 to 0.6, 0.25 performed best.
One additional Conv2D(512): the original VGG has four of them, but given the data size and image size, adding just one gave the highest performance.
FC layer of 1024: trying everything from 256 through 2048 (SOPCNN) up to 4096 (VGG), 1024 performed best.
It is best to allow plenty of epochs and let early stopping find the optimal model. (A sketch of the compile/fit step follows the model definition below.)
In [27]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
# original Dropout removed here
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
# original Dropout removed here
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
# added Conv2D(512)
tf.keras.layers.Conv2D(512, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1024),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(10, activation='softmax')
])
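The compile/fit cell for this final model is not shown in the original; a sketch assuming it mirrors the section 3.2 setup (same RMSprop settings, augmentation generator, and callbacks):
In [ ]:
# Assumed training setup, reusing the objects defined in section 3.2
optimizer = RMSprop(learning_rate=0.002, rho=0.9, momentum=0.1,
                    epsilon=1e-07, centered=True)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=epochs,
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)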
In [25]:
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_22 (Conv2D) (None, 28, 28, 64) 640
_________________________________________________________________
batch_normalization_24 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_26 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_23 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_25 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_27 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_24 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_26 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_28 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_25 (Conv2D) (None, 14, 14, 128) 73856
_________________________________________________________________
batch_normalization_27 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_29 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_28 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_30 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_27 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_29 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_31 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 7, 7, 128) 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 7, 7, 256) 295168
_________________________________________________________________
batch_normalization_30 (Batc (None, 7, 7, 256) 1024
_________________________________________________________________
leaky_re_lu_32 (LeakyReLU) (None, 7, 7, 256) 0
_________________________________________________________________
conv2d_29 (Conv2D) (None, 7, 7, 256) 590080
_________________________________________________________________
batch_normalization_31 (Batc (None, 7, 7, 256) 1024
_________________________________________________________________
leaky_re_lu_33 (LeakyReLU) (None, 7, 7, 256) 0
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 3, 3, 256) 0
_________________________________________________________________
conv2d_30 (Conv2D) (None, 3, 3, 512) 1180160
_________________________________________________________________
batch_normalization_32 (Batc (None, 3, 3, 512) 2048
_________________________________________________________________
leaky_re_lu_34 (LeakyReLU) (None, 3, 3, 512) 0
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 1, 1, 512) 0
_________________________________________________________________
flatten_3 (Flatten) (None, 512) 0
_________________________________________________________________
dense_7 (Dense) (None, 1024) 525312
_________________________________________________________________
leaky_re_lu_35 (LeakyReLU) (None, 1024) 0
_________________________________________________________________
batch_normalization_33 (Batc (None, 1024) 4096
_________________________________________________________________
dropout_3 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_8 (Dense) (None, 10) 10250
=================================================================
Total params: 3,054,986
Trainable params: 3,049,738
Non-trainable params: 5,248
_________________________________________________________________
Results and Takeaways
It may just be that these models are not very deep yet, but a good model announces itself from the way the loss falls at the very start. The better the model, the sooner Val_Acc looks strong, within the first 10 epochs or so, which lets you gauge the rest of the run early. Training deeply with something like 2000 epochs would of course be nice, but within limited resources this seems like the best way to sift out the stronger models. This model reached a val_acc of 0.99 by epoch 8 and at times rose above 0.998 along the way.
I also learned that the default kernel weight initializer of Keras layers is Xavier (glorot_uniform). One regret is that with more GPU resources I could have tried other weight initializers and a wider range of batch sizes; a one-argument sketch of the former follows below.
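Swapping the initializer is a one-argument change; a hypothetical variant (untested here) using He initialization, which often pairs well with ReLU-family activations:
In [ ]:
tf.keras.layers.Conv2D(64, (3, 3), padding='same',
                       kernel_initializer='he_normal',  # default is 'glorot_uniform' (Xavier)
                       input_shape=(28, 28, 1))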
Extra) AutoKeras
In an era when AutoML writes models for us, just how good is its performance?
In [ ]:
import autokeras as ak
In [ ]:
print(x_train.shape)
print(y_train.shape)
(33600, 28, 28, 1)
(33600, 10)
In [ ]:
# Initialize the ImageClassifier.
clf = ak.ImageClassifier(max_trials=3)
# Search for the best model.
clf.fit(x_train, y_train,
validation_data=(x_val, y_val), # validation set
epochs=10)
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 9s 9ms/step - loss: 0.1285 - accuracy: 0.9601 - val_loss: 0.0466 - val_accuracy: 0.9875
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0476 - accuracy: 0.9849 - val_loss: 0.0360 - val_accuracy: 0.9902
Epoch 3/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0365 - accuracy: 0.9892 - val_loss: 0.0389 - val_accuracy: 0.9894
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0292 - accuracy: 0.9907 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0287 - accuracy: 0.9907 - val_loss: 0.0326 - val_accuracy: 0.9919
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0249 - accuracy: 0.9922 - val_loss: 0.0316 - val_accuracy: 0.9895
Epoch 7/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0224 - accuracy: 0.9924 - val_loss: 0.0344 - val_accuracy: 0.9919
Epoch 8/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0341 - val_accuracy: 0.9923
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0179 - accuracy: 0.9935 - val_loss: 0.0312 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0164 - accuracy: 0.9940 - val_loss: 0.0356 - val_accuracy: 0.9919
Trial complete
Trial summary
|-Trial ID: 123c8d89e202f81d1fd46a1f9201f3fe
|-Score: 0.03119376050014196
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 77s 73ms/step - loss: 0.2367 - accuracy: 0.9337 - val_loss: 0.0599 - val_accuracy: 0.9837
Epoch 2/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0999 - accuracy: 0.9753 - val_loss: 0.0854 - val_accuracy: 0.9829
Epoch 3/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0586 - accuracy: 0.9848 - val_loss: 1.5777 - val_accuracy: 0.8914
Epoch 4/10
1050/1050 [==============================] - 71s 68ms/step - loss: 0.0444 - accuracy: 0.9887 - val_loss: 0.0594 - val_accuracy: 0.9870
Epoch 5/10
1050/1050 [==============================] - 71s 67ms/step - loss: 0.0477 - accuracy: 0.9876 - val_loss: 0.0479 - val_accuracy: 0.9887
Epoch 6/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0505 - accuracy: 0.9888 - val_loss: 10.7493 - val_accuracy: 0.6519
Epoch 7/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0536 - accuracy: 0.9868 - val_loss: 0.0613 - val_accuracy: 0.9869
Epoch 8/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0280 - accuracy: 0.9922 - val_loss: 0.0520 - val_accuracy: 0.9874
Epoch 9/10
1050/1050 [==============================] - 73s 70ms/step - loss: 0.0305 - accuracy: 0.9926 - val_loss: 0.0414 - val_accuracy: 0.9904
Epoch 10/10
1050/1050 [==============================] - 72s 68ms/step - loss: 0.0369 - accuracy: 0.9911 - val_loss: 0.0493 - val_accuracy: 0.9902
Trial complete
Trial summary
|-Trial ID: 03db11dd05b1734a2cf3413c1ac7e197
|-Score: 0.04135792684210588
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 2
|-dense_block_1/units_0: 32
|-dense_block_1/units_1: 32
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: True
|-image_block_1/block_type: resnet
|-image_block_1/normalize: True
|-image_block_1/res_net_block_1/conv3_depth: 4
|-image_block_1/res_net_block_1/conv4_depth: 6
|-image_block_1/res_net_block_1/pooling: avg
|-image_block_1/res_net_block_1/version: v2
|-optimizer: adam
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1271 - accuracy: 0.9611 - val_loss: 0.0507 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0479 - accuracy: 0.9854 - val_loss: 0.0386 - val_accuracy: 0.9898
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0371 - accuracy: 0.9886 - val_loss: 0.0337 - val_accuracy: 0.9911
Epoch 4/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0294 - accuracy: 0.9906 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0257 - accuracy: 0.9915 - val_loss: 0.0327 - val_accuracy: 0.9914
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0239 - accuracy: 0.9920 - val_loss: 0.0331 - val_accuracy: 0.9910
Epoch 7/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.0215 - accuracy: 0.9928 - val_loss: 0.0311 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9934 - val_loss: 0.0313 - val_accuracy: 0.9918
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0169 - accuracy: 0.9942 - val_loss: 0.0300 - val_accuracy: 0.9924
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0166 - accuracy: 0.9945 - val_loss: 0.0324 - val_accuracy: 0.9918
Trial complete
Trial summary
|-Trial ID: 2ce4926fd3ec015466417c00c29b3ca4
|-Score: 0.029994319529753708
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam
INFO:tensorflow:Oracle triggered exit
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1359 - accuracy: 0.9585 - val_loss: 0.0513 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0482 - accuracy: 0.9852 - val_loss: 0.0372 - val_accuracy: 0.9899
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0360 - accuracy: 0.9888 - val_loss: 0.0364 - val_accuracy: 0.9906
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0303 - accuracy: 0.9905 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0261 - accuracy: 0.9918 - val_loss: 0.0327 - val_accuracy: 0.9917
Epoch 6/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0234 - accuracy: 0.9923 - val_loss: 0.0310 - val_accuracy: 0.9924
Epoch 7/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0212 - accuracy: 0.9934 - val_loss: 0.0334 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9937 - val_loss: 0.0354 - val_accuracy: 0.9919
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9947 - val_loss: 0.0331 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9945 - val_loss: 0.0416 - val_accuracy: 0.9912
Summary and Submission of the Generated Model
In [ ]:
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
normalization (Normalization (None, 28, 28, 1) 3
_________________________________________________________________
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 10) 92170
_________________________________________________________________
classification_head_1 (Softm (None, 10) 0
=================================================================
Total params: 110,989
Trainable params: 110,986
Non-trainable params: 3
_________________________________________________________________
Remarkably, it built quite a similar model. It is far simpler, with only about 110k parameters, yet it scored 0.9911 on the public leaderboard.
In [ ]:
# Evaluate on the testing data.
print('Accuracy: {accuracy}'.format(
accuracy=clf.evaluate(x_val, y_val)))
263/263 [==============================] - 1s 4ms/step - loss: 0.0416 - accuracy: 0.9912
Accuracy: [0.04157625054905844, 0.9911905]
In [ ]:
model = clf.export_model()  # export the model built by AutoKeras
save_and_submit(model, "autokeras.csv", "AutoKeras_iter3")
Training on the Validation Set
In [ ]:
model.fit(x_val, y_val, epochs=5, batch_size=1024, verbose=3)
save_and_submit(model, "submit.csv", "AK+val")
Training on the validation set as well improved the score to 0.993.