Assignment Description
A mini-competition using the Kaggle Kannada MNIST dataset
Why This Submission Was Selected as Outstanding
This was an impeccable notebook by Sangyeon Jo, with a great deal to learn: grid-search hyperparameter tuning using the Keras scikit-learn wrapper; visualization of the runs with the hiplot module, used to analyze which settings (batch normalization, dropout, and so on) drove high performance; tracking down SOPCNN, an MNIST SOTA model; and even an AutoKeras experiment.
Week 7: Deep Learning Framework
13th cohort, Sangyeon Jo
Data: Train: 42,000 rows, Test: 18,000 rows
Table of Contents
Comparing advanced deep learning techniques (with Hiplot)
Activation
ReLU / Leaky ReLU / PReLU
Model research and selection
Training and hyperparameter tuning
0. Pre-requisite & Module Import
In [ ]:
!pip install tensorflow-gpu==2.1.0 # needed for the Keras data augmentation tools; plain TF 2.0 errors out
# !pip install autokeras
# !pip install hiplot
# !pip install kaggle
In [ ]:
from __future__ import absolute_import, division, print_function, unicode_literals
try:
    %tensorflow_version 2.x  # magic command that selects TensorFlow 2.x on Colab
except Exception:
    pass
import tensorflow as tf
import os
import tensorflow_datasets as tfds
TensorFlow 2.x selected.
In [ ]:
%load_ext tensorboard
In [8]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Dense, Flatten, BatchNormalization, MaxPooling2D, LeakyReLU, ReLU, PReLU
from tensorflow.keras.optimizers import RMSprop, Nadam, Adadelta, Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.regularizers import l2
In [ ]:
# Kaggle API key to ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
In [ ]:
import os

def save_and_submit(model, filename, description):
    """
    Save the predictions as a CSV file, then submit them to Kaggle.
    """
    if "predict_classes" in dir(model):
        predict = model.predict_classes(X_test)
    else:
        # If the model has no predict_classes method, build the
        # predictions with predict followed by argmax
        predict = np.argmax(model.predict(X_test), axis=1)
    sample_submission['Category'] = predict
    sample_submission.to_csv(filename, index=False)
    # Run the kaggle CLI through os.system (quote the message so spaces survive)
    os.system(f'kaggle competitions submit -c tobigs13nn -f {filename} -m "{description}"')
In [3]:
import glob
glob.glob("*")
Out[3]:
['w7_pytorch.ipynb',
'test_df.csv',
'Untitled.ipynb',
'FirePytorch.ipynb',
'w7_DL_Framework.ipynb',
'FireKeras.ipynb',
'tutorial-for-everybody.ipynb',
'grid_result.json',
'train_df.csv',
'sample_submission.csv']
In [ ]:
# Check the GPU setup
tf.test.gpu_device_name()
Out[ ]:
1. Data Load & Preprocessing
In [ ]:
sample_submission = pd.read_csv("sample_submission.csv")
test = pd.read_csv("test_df.csv")
train = pd.read_csv("train_df.csv")
print(f"Train data shape {train.shape}")
print(f"Test data shape {test.shape}")
X = train.iloc[:,1:].values
y = train.iloc[:,0].values
X_test = test.iloc[:,1:].values
X = X / 255
X_test = X_test / 255
# reshape so the data can be handled as images
X = X.reshape(-1, 28, 28,1)
X_test = X_test.reshape(-1, 28, 28,1)
y = to_categorical(y)
Train data shape (42000, 785)
Test data shape (18000, 785)
In [ ]:
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state=42)
In [ ]:
# check the data shapes
x_train.shape, x_val.shape, X_test.shape
Out[ ]:
((33600, 28, 28, 1), (8400, 28, 28, 1), (18000, 28, 28, 1))
2. Grid Search over Advanced Deep Learning Techniques (with Hiplot)
Summarizing the effect of each technique through hyperparameter tuning
param_grid = {
'batch_size': [512, 1024, 2048],
'epochs': [20],
'_optimizer': ['RMSprop','Adam'],
'_lr': [1e-3, 2e-3, 1e-2],
'_batch_norm': [1, 0],
'_activation': ['relu'],
'_dropout': [0.2, 0.4],
}
By toggling the techniques covered this week on and off and adjusting their sizes, we analyze their effects and search for the optimal parameters.
I wanted to give each parameter three to five options, but Colab memory could not cope and kept crashing, so the grid was cut down to a feasible size, as the count below shows.
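To make the memory pressure concrete, here is a quick count of the fits implied by the grid sketched above (a small sketch; the factor of 3 matches the cv=3 fold count used in section 2.1):

from functools import reduce

# Number of hyperparameter combinations in the grid above
n_combos = reduce(lambda a, b: a * b, (len(v) for v in param_grid.values()))
print(n_combos, n_combos * 3)  # 72 combinations -> 216 model fits under 3-fold CV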
2.1 Grid Search Setup and Model Builder
In [ ]:
# use scikit-learn's GridSearchCV (covered previously) together with KerasClassifier
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
In [ ]:
# Function to create the model, required by KerasClassifier
def create_model(_optimizer, _lr, _batch_norm, _activation, _dropout):
    """
    Build a model for comparing techniques (3-layer fully connected net).
    """
    model = tf.keras.models.Sequential()
    model.add(Dense(64, input_shape=(784,)))
    if _batch_norm:  # toggle batch normalization on/off
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(128, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(256, activation=_activation))
    if _batch_norm:
        model.add(BatchNormalization())
    model.add(Dropout(_dropout))
    model.add(Dense(10, activation='softmax'))
    # Look up the optimizer class by name and instantiate it with the given learning rate
    optimizer = getattr(tf.keras.optimizers, _optimizer)(learning_rate=_lr)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model

# Wrap the builder in a KerasClassifier so it plugs into GridSearchCV
model = KerasClassifier(build_fn=create_model, verbose=3)
In [ ]:
# grid configuration
param_grid = {
'batch_size': [1024, 2048],
'epochs': [15, 30],
'_optimizer': ['RMSprop','Adam'],
'_lr': [1e-3, 2e-3, 1e-2],
'_batch_norm': [1, 0],
'_activation': ['relu'],
'_dropout': [0.2, 0.4],
}
In [ ]:
Out[ ]:
{'batch_size': [2048],
'epochs': [15, 30],
'_optimizer': ['RMSprop', 'Adam'],
'_lr': [0.001, 0.01],
'_batch_norm': [True, False],
'_activation': ['relu'],
'_dropout': [0.2, 0.4]}
In [ ]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X.reshape(len(X), -1), y)  # run the grid search; the FC model expects flat 784-dim input
grid_result
Out[ ]:
GridSearchCV(cv=3, error_score=nan,
estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x7f61b1e6b6d8>,
iid='deprecated', n_jobs=None,
param_grid={'_activation': ['relu'], '_batch_norm': [True, False],
'_dropout': [0.2, 0.4], '_lr': [0.001, 0.01],
'_optimizer': ['RMSprop', 'Adam'],
'batch_size': [2048], 'epochs': [15, 30]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
In [ ]:
grid_result.cv_results_['rank_test_score']  # test score rank
grid_result.cv_results_['mean_test_score']  # test score mean
grid_result.cv_results_['std_test_score']   # test score standard deviation (only this last expression is displayed below)
Out[ ]:
array([0.00191339, 0.00231553, 0.00210279, 0.0021028 , 0.00749656,
0.00113289, 0.00187718, 0.00275614, 0.0021762 , 0.00204734,
0.00289577, 0.00242509, 0.0027911 , 0.00478707, 0.00220003,
0.00271803, 0.00134518, 0.00207868, 0.00301714, 0.00154302,
0.00933254, 0.00683188, 0.00201384, 0.0019293 , 0.00253702,
0.00236972, 0.00268085, 0.00193342, 0.00889858, 0.01590848,
0.0020548 , 0.00267833])
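Before moving to visualization, the single best combination can be read directly off the fitted search object via the standard scikit-learn attributes (the exact numbers vary by run):

# Best mean CV accuracy and the hyperparameters that achieved it
print(grid_result.best_score_)
print(grid_result.best_params_)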
In [ ]:
# Preprocessing for visualization: attach each run's scores to its param dict
grid_params = grid_result.cv_results_['params']
for i in range(len(grid_params)):
    for key in ['mean_test_score', 'std_test_score', 'mean_fit_time']:
        grid_params[i][key] = grid_result.cv_results_[key][i]
In [ ]:
import json

# save the grid_params results
with open('grid_result.json', 'w') as f:
    f.write(json.dumps(grid_params))
In [11]:
# load the grid_params results back
with open('grid_result2.json', 'r') as f:
    grid_params = json.loads(f.read())
2.2 Visualizing the Results
For visualization I used Hiplot from Facebook Research ( https://github.com/facebookresearch/hiplot ).
It is said to be especially good at revealing patterns in high-dimensional data.
In [12]:
import hiplot as hip
hip.Experiment.from_iterable(grid_params).display()
<IPython.core.display.Javascript object>
Out[12]:
<hiplot.ipython.IPythonExperimentDisplayed at 0x114de8bd0>
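If the interactive widget fails to render, the same experiment can also be written out as a standalone HTML page through hiplot's Experiment.to_html (a sketch):

# Save the parallel-coordinates view to a self-contained HTML file
hip.Experiment.from_iterable(grid_params).to_html('grid_result.html')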
Full plot of all runs
Plot of the top models by mean_score / std_score
Insights
The top-ranked models almost all have BatchNorm applied.
Dropout performed better at 0.2.
A learning rate of 0.01 scored best, but with only 20 epochs, training was most likely still in progress.
Smaller batch sizes performed better. It is hard to call the relationship strictly proportional, but given that the paper used 256, my guess is that an overly large batch smears out the characteristics of each batch.
Both Adam and RMSprop delivered decent performance as optimizers.
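As a quick cross-check of the insights above, the runs can also be ranked in a plain table, assuming grid_params as assembled in section 2.1:

import pandas as pd

# Rank runs by mean CV accuracy and inspect the top settings
df = pd.DataFrame(grid_params)
cols = ['_batch_norm', '_dropout', '_lr', '_optimizer', 'batch_size', 'mean_test_score']
print(df.sort_values('mean_test_score', ascending=False).head(10)[cols])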
3. Model Research & Selection
MNIST is the canonical image-classification dataset, and many deep learning models reach SOTA-level performance on it.
Among them, CNNs in particular combine fast training with outstanding accuracy, and on Kannada MNIST, the subject of this assignment, many people earned strong scores with them. With that as a starting point I designed my model, consulted SOTA-level papers on Papers with Code, and applied several techniques after weighing their pros and cons. This genuinely produced some good insights.
3.1 SOPCNN
Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods
2020 MNIST SOTA
Reading the SOPCNN paper, the 2020 MNIST SOTA, it is clear that considerable care went into the optimization techniques, and since it overlaps heavily with this week's material, there was plenty worth applying. The paper's aim: plain CNNs perform well but suffer badly from overfitting as epochs increase, and it focuses on how to optimize against that, mainly via data augmentation and, in particular, dropout.
Architecture and Design
The basic model follows the SimpleNet design. Taking the MNIST model as an example, there are four Conv2D layers in total, with a max-pooling layer after every two. Two fully connected layers follow, finished with a softmax activation layer. The learning rate was set to 0.01, and the dropout position and FC layer size were decided through parameter tuning.
The most striking points
A single Dropout right before the softmax works best: they also tried placing it after MaxPool and tried Spatial Dropout, but a regular Dropout after the FC layers gave the best performance.
FC layers of 2048 and a drop rate of 0.8 work best: this is where the 'Stochastic' in the paper's title shows itself. Since such a drop rate is hard to believe, they demonstrated an average error rate of 0.18% over five repeated runs.
Beyond that, the paper describes its data augmentation settings and more, which served as a reference for training.
3.1.1 Implementation
In [9]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(2048),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Dense(2048),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Dropout(0.8),
tf.keras.layers.Dense(10, activation='softmax')
])
WARNING:tensorflow:Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
In [10]:
Copy Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 64) 640
_________________________________________________________________
batch_normalization (BatchNo (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 14, 14, 128) 73856
_________________________________________________________________
batch_normalization_2 (Batch (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense (Dense) (None, 2048) 12847104
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 2048) 0
_________________________________________________________________
dense_1 (Dense) (None, 2048) 4196352
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU) (None, 2048) 0
_________________________________________________________________
dropout (Dropout) (None, 2048) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 20490
=================================================================
Total params: 17,324,490
Trainable params: 17,323,722
Non-trainable params: 768
_________________________________________________________________
TensorFlow kindly notices the drop rate above 0.5 and asks whether we might be visiting from the past (when the argument meant keep_prob), but we ignore it and proceed with training.
In [ ]:
optimizer = Adam(learning_rate=0.01)  # 0.01, as specified in the paper
model.compile(loss='categorical_crossentropy',
optimizer=optimizer,
metrics=['accuracy'])
3.1.2 Training and Results
Actually running it on Colab, training was far too slow. Validation accuracy could not even approach 0.98, and since the model is not exactly simple, training took a long time.
After that, experimenting with the FC size cut from 2048 to 1024 and drop rates of 0.6 and 0.4, validation accuracy came out similarly high at around 99.6.
From this I learned that the FC size needs to be scaled down to match the data size and the number of classes, and since the paper trains for up to 2000 epochs, which is infeasible on Colab right now, I set a direction of compromising somewhat on drop rate in exchange for faster training.
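For completeness, a minimal sketch of the kind of training call behind these experiments; the original cell is not shown, so the batch size and epoch count here are placeholders, not values from the text:

# Hypothetical training call for the SOPCNN-style model compiled above
history = model.fit(x_train, y_train,
                    batch_size=128,   # assumption
                    epochs=50,        # assumption; the paper itself goes far higher
                    validation_data=(x_val, y_val),
                    verbose=2)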
3.2 CNN (VGG + Data Augmentation)
https://www.kaggle.com/benanakca/kannada-mnist-cnn-tutorial-with-app-top-2
On https://www.kaggle.com/c/Kannada-MNIST I found a model that placed in the top 2%, with a friendly walkthrough of several techniques.
ImageDataGenerator and ReduceLROnPlateau stood out in particular. The former is a data augmentation tool, available from tf.keras, that randomly transforms the image data to keep the model from overfitting to the original samples. The latter, true to its name, is a callback that lowers the learning rate on plateaus: it keeps monitoring a metric during training, and once improvement stalls beyond a threshold it steps the current learning rate down by new_lr = factor * lr, approaching min_lr.
One caveat: the ImageDataGenerator parameters must be understood well to avoid unintended mistakes. For MNIST in particular, flips must never occur; cutout-style erasing can also be wired in (for example through a preprocessing function).
Below are the data augmentation settings carried out in SOPCNN.
In [ ]:
datagen_train = ImageDataGenerator(
    rotation_range=10,       # degrees out of 360; 10 -> rotate within ±10 degrees
    width_shift_range=0.25,  # horizontal shift as a fraction of width; values > 1 mean pixels
    height_shift_range=0.25, # vertical shift, same convention as above
    shear_range=0.1,         # shear intensity
    zoom_range=0.4,          # zoom range; here [min: 0.6, max: 1.4]; a [0.7, 1] pair also works
    horizontal_flip=False)   # no horizontal flips; already the default, so strictly unnecessary

datagen_val = ImageDataGenerator()

learning_rate_reduction = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='loss',    # quantity to be monitored
    factor=0.25,       # factor by which the learning rate will be reduced: new_lr = lr * factor
    patience=2,        # number of epochs with no improvement after which the LR is reduced
    verbose=1,         # 0: quiet - 1: update messages
    mode="auto",       # {auto, min, max}; in min mode, LR is reduced when the monitored quantity
                       # stops decreasing; in max mode, when it stops increasing; in auto mode,
                       # the direction is inferred from the name of the monitored quantity
    min_delta=0.0001,  # threshold for measuring the new optimum, to focus on significant changes
    cooldown=0,        # epochs to wait before resuming normal operation after an LR reduction
    min_lr=0.00001     # lower bound on the learning rate
)
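Before training, it can help to eyeball a few augmented digits to confirm the ranges above are sane (a sketch assuming matplotlib is available in the runtime):

import matplotlib.pyplot as plt

# Draw one augmented batch and plot the first 8 images
aug_x, _ = next(datagen_train.flow(x_train, y_train, batch_size=8))
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for ax, img in zip(axes, aug_x):
    ax.imshow(img.squeeze(), cmap='gray')
    ax.axis('off')
plt.show()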
3.2.1 Modeling
The model resembles VGG; its distinctive choices are removing the Conv2D(512) layers and, after flattening, using only FC(256).
All hidden layers use LeakyReLU.
In [ ]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(256),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dense(10, activation='softmax')
])
optimizer = RMSprop(learning_rate=0.002,
rho=0.9,
momentum=0.1,
epsilon=1e-07,
centered=True,
name='RMSprop')
model.compile(loss='categorical_crossentropy',
optimizer=optimizer,
metrics=['accuracy'])
model.summary()
In [ ]:
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=300, restore_best_weights=True)

batch_size = 1024  # not defined in the original cell; 1024 matches the batch sizes used above
epochs = 40        # the run described below trained for 40 epochs on Colab

history = model.fit(datagen_train.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train)//batch_size,
                    epochs=epochs,
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)
Training for 40 epochs on Colab produced an accuracy of 99.74. Given such a strong result, I took this model, applied several techniques, and proceeded with hyperparameter tuning.
4. Training and Hyperparameter Tuning
Taking the VGG model as the base, I applied several optimization techniques: adding Conv layers, switching to ReLU, adjusting the FC layers, adjusting dropout, and so on.
I also experimented with a Transformer model in PyTorch, but its performance was unremarkable (0.96).
In the end, a model that blends SOPCNN and VGG appropriately performed best.
Dropout only at the very end: across drop rates from 0.2 to 0.6, 0.25 performed best.
One extra Conv2D(512): the original VGG stacks four of them, but given the data size and image size, adding just one gave the best performance.
FC layer of 1024: trying everything from 256 to 2048 (SOPCNN) to 4096 (VGG), 1024 performed best.
It is best to allow plenty of epochs and find the optimal model with early stopping.
In [27]:
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3,3), padding='same', input_shape=(28, 28, 1)),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(64, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
# Dropout from the base model removed
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(128, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
# Dropout from the base model removed
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.Conv2D(256, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
# Conv2D(512) added
tf.keras.layers.Conv2D(512, (3,3), padding='same'),
tf.keras.layers.BatchNormalization(momentum=0.9, epsilon=1e-5, gamma_initializer="uniform"),##
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1024),
tf.keras.layers.LeakyReLU(alpha=0.1),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(10, activation='softmax')
])
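The compile and fit steps for this final model are not shown in the notebook; below is a sketch consistent with the section 3.2 cells (same RMSprop settings, augmentation, and callbacks), with the batch size and epoch count as placeholders:

optimizer = RMSprop(learning_rate=0.002, rho=0.9, momentum=0.1,
                    epsilon=1e-07, centered=True, name='RMSprop')
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
history = model.fit(datagen_train.flow(x_train, y_train, batch_size=1024),  # batch size assumed
                    steps_per_epoch=len(x_train)//1024,
                    epochs=100,  # placeholder; early stopping restores the best weights
                    validation_data=(x_val, y_val),
                    callbacks=[learning_rate_reduction, es],
                    verbose=2)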
In [25]:
Copy Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_22 (Conv2D) (None, 28, 28, 64) 640
_________________________________________________________________
batch_normalization_24 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_26 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_23 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_25 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_27 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
conv2d_24 (Conv2D) (None, 28, 28, 64) 36928
_________________________________________________________________
batch_normalization_26 (Batc (None, 28, 28, 64) 256
_________________________________________________________________
leaky_re_lu_28 (LeakyReLU) (None, 28, 28, 64) 0
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_25 (Conv2D) (None, 14, 14, 128) 73856
_________________________________________________________________
batch_normalization_27 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_29 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_28 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_30 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
conv2d_27 (Conv2D) (None, 14, 14, 128) 147584
_________________________________________________________________
batch_normalization_29 (Batc (None, 14, 14, 128) 512
_________________________________________________________________
leaky_re_lu_31 (LeakyReLU) (None, 14, 14, 128) 0
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 7, 7, 128) 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 7, 7, 256) 295168
_________________________________________________________________
batch_normalization_30 (Batc (None, 7, 7, 256) 1024
_________________________________________________________________
leaky_re_lu_32 (LeakyReLU) (None, 7, 7, 256) 0
_________________________________________________________________
conv2d_29 (Conv2D) (None, 7, 7, 256) 590080
_________________________________________________________________
batch_normalization_31 (Batc (None, 7, 7, 256) 1024
_________________________________________________________________
leaky_re_lu_33 (LeakyReLU) (None, 7, 7, 256) 0
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 3, 3, 256) 0
_________________________________________________________________
conv2d_30 (Conv2D) (None, 3, 3, 512) 1180160
_________________________________________________________________
batch_normalization_32 (Batc (None, 3, 3, 512) 2048
_________________________________________________________________
leaky_re_lu_34 (LeakyReLU) (None, 3, 3, 512) 0
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 1, 1, 512) 0
_________________________________________________________________
flatten_3 (Flatten) (None, 512) 0
_________________________________________________________________
dense_7 (Dense) (None, 1024) 525312
_________________________________________________________________
leaky_re_lu_35 (LeakyReLU) (None, 1024) 0
_________________________________________________________________
batch_normalization_33 (Batc (None, 1024) 4096
_________________________________________________________________
dropout_3 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_8 (Dense) (None, 10) 10250
=================================================================
Total params: 3,054,986
Trainable params: 3,049,738
Non-trainable params: 5,248
_________________________________________________________________
Results and Takeaways
The model may not be deep enough yet, but a good model differs in how the loss falls right from the start. The better the model, the faster validation accuracy climbs around epoch 10, which lets you project what follows. Of course, training deeply for some 2000 epochs would be ideal, but within limited resources this seems like the best way to pick out the stronger model. This model reached a val_acc of 0.99 at epoch 8 and at times exceeded 0.998 along the way.
I also learned that the default kernel weight initializer of Keras layers is Xavier (glorot_uniform). One regret: with more GPU resources to spare, I could have varied the weight initializer and tried a wider range of batch sizes.
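For reference, swapping the initializer is a one-argument change on any layer; the He-normal choice below is purely illustrative, not something tested in this notebook:

# Same FC block as above, but with He-normal weights instead of the
# Keras default glorot_uniform (Xavier)
tf.keras.layers.Dense(1024, kernel_initializer='he_normal')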
Extra) AutoKeras
In an era when AutoML assembles models automatically, just how good is the resulting performance?
In [ ]:
import autokeras as ak
In [ ]:
print(x_train.shape)
print(y_train.shape)
(33600, 28, 28, 1)
(33600, 10)
In [ ]:
# Initialize the ImageClassifier.
clf = ak.ImageClassifier(max_trials=3)
# Search for the best model.
clf.fit(x_train, y_train,
validation_data=(x_val, y_val), # validation set
epochs=10)
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 9s 9ms/step - loss: 0.1285 - accuracy: 0.9601 - val_loss: 0.0466 - val_accuracy: 0.9875
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0476 - accuracy: 0.9849 - val_loss: 0.0360 - val_accuracy: 0.9902
Epoch 3/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0365 - accuracy: 0.9892 - val_loss: 0.0389 - val_accuracy: 0.9894
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0292 - accuracy: 0.9907 - val_loss: 0.0374 - val_accuracy: 0.9907
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0287 - accuracy: 0.9907 - val_loss: 0.0326 - val_accuracy: 0.9919
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0249 - accuracy: 0.9922 - val_loss: 0.0316 - val_accuracy: 0.9895
Epoch 7/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0224 - accuracy: 0.9924 - val_loss: 0.0344 - val_accuracy: 0.9919
Epoch 8/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0176 - accuracy: 0.9941 - val_loss: 0.0341 - val_accuracy: 0.9923
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0179 - accuracy: 0.9935 - val_loss: 0.0312 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0164 - accuracy: 0.9940 - val_loss: 0.0356 - val_accuracy: 0.9919
Trial complete
Trial summary
|-Trial ID: 123c8d89e202f81d1fd46a1f9201f3fe
|-Score: 0.03119376050014196
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 77s 73ms/step - loss: 0.2367 - accuracy: 0.9337 - val_loss: 0.0599 - val_accuracy: 0.9837
Epoch 2/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0999 - accuracy: 0.9753 - val_loss: 0.0854 - val_accuracy: 0.9829
Epoch 3/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0586 - accuracy: 0.9848 - val_loss: 1.5777 - val_accuracy: 0.8914
Epoch 4/10
1050/1050 [==============================] - 71s 68ms/step - loss: 0.0444 - accuracy: 0.9887 - val_loss: 0.0594 - val_accuracy: 0.9870
Epoch 5/10
1050/1050 [==============================] - 71s 67ms/step - loss: 0.0477 - accuracy: 0.9876 - val_loss: 0.0479 - val_accuracy: 0.9887
Epoch 6/10
1050/1050 [==============================] - 69s 66ms/step - loss: 0.0505 - accuracy: 0.9888 - val_loss: 10.7493 - val_accuracy: 0.6519
Epoch 7/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0536 - accuracy: 0.9868 - val_loss: 0.0613 - val_accuracy: 0.9869
Epoch 8/10
1050/1050 [==============================] - 68s 65ms/step - loss: 0.0280 - accuracy: 0.9922 - val_loss: 0.0520 - val_accuracy: 0.9874
Epoch 9/10
1050/1050 [==============================] - 73s 70ms/step - loss: 0.0305 - accuracy: 0.9926 - val_loss: 0.0414 - val_accuracy: 0.9904
Epoch 10/10
1050/1050 [==============================] - 72s 68ms/step - loss: 0.0369 - accuracy: 0.9911 - val_loss: 0.0493 - val_accuracy: 0.9902
Trial complete
Trial summary
|-Trial ID: 03db11dd05b1734a2cf3413c1ac7e197
|-Score: 0.04135792684210588
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 2
|-dense_block_1/units_0: 32
|-dense_block_1/units_1: 32
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: True
|-image_block_1/block_type: resnet
|-image_block_1/normalize: True
|-image_block_1/res_net_block_1/conv3_depth: 4
|-image_block_1/res_net_block_1/conv4_depth: 6
|-image_block_1/res_net_block_1/pooling: avg
|-image_block_1/res_net_block_1/version: v2
|-optimizer: adam
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1271 - accuracy: 0.9611 - val_loss: 0.0507 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0479 - accuracy: 0.9854 - val_loss: 0.0386 - val_accuracy: 0.9898
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0371 - accuracy: 0.9886 - val_loss: 0.0337 - val_accuracy: 0.9911
Epoch 4/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0294 - accuracy: 0.9906 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0257 - accuracy: 0.9915 - val_loss: 0.0327 - val_accuracy: 0.9914
Epoch 6/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0239 - accuracy: 0.9920 - val_loss: 0.0331 - val_accuracy: 0.9910
Epoch 7/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.0215 - accuracy: 0.9928 - val_loss: 0.0311 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9934 - val_loss: 0.0313 - val_accuracy: 0.9918
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0169 - accuracy: 0.9942 - val_loss: 0.0300 - val_accuracy: 0.9924
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0166 - accuracy: 0.9945 - val_loss: 0.0324 - val_accuracy: 0.9918
Trial complete
Trial summary
|-Trial ID: 2ce4926fd3ec015466417c00c29b3ca4
|-Score: 0.029994319529753708
|-Best step: 0
Hyperparameters:
|-classification_head_1/dropout_rate: 0.5
|-classification_head_1/spatial_reduction_1/reduction_type: flatten
|-dense_block_1/dropout_rate: 0
|-dense_block_1/num_layers: 1
|-dense_block_1/units_0: 128
|-dense_block_1/use_batchnorm: False
|-image_block_1/augment: False
|-image_block_1/block_type: vanilla
|-image_block_1/conv_block_1/dropout_rate: 0.25
|-image_block_1/conv_block_1/filters_0_0: 32
|-image_block_1/conv_block_1/filters_0_1: 64
|-image_block_1/conv_block_1/kernel_size: 3
|-image_block_1/conv_block_1/max_pooling: True
|-image_block_1/conv_block_1/num_blocks: 1
|-image_block_1/conv_block_1/num_layers: 2
|-image_block_1/conv_block_1/separable: False
|-image_block_1/normalize: True
|-optimizer: adam
INFO:tensorflow:Oracle triggered exit
Train for 1050 steps, validate for 263 steps
Epoch 1/10
1050/1050 [==============================] - 7s 7ms/step - loss: 0.1359 - accuracy: 0.9585 - val_loss: 0.0513 - val_accuracy: 0.9870
Epoch 2/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0482 - accuracy: 0.9852 - val_loss: 0.0372 - val_accuracy: 0.9899
Epoch 3/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0360 - accuracy: 0.9888 - val_loss: 0.0364 - val_accuracy: 0.9906
Epoch 4/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0303 - accuracy: 0.9905 - val_loss: 0.0340 - val_accuracy: 0.9906
Epoch 5/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0261 - accuracy: 0.9918 - val_loss: 0.0327 - val_accuracy: 0.9917
Epoch 6/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0234 - accuracy: 0.9923 - val_loss: 0.0310 - val_accuracy: 0.9924
Epoch 7/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0212 - accuracy: 0.9934 - val_loss: 0.0334 - val_accuracy: 0.9920
Epoch 8/10
1050/1050 [==============================] - 7s 6ms/step - loss: 0.0192 - accuracy: 0.9937 - val_loss: 0.0354 - val_accuracy: 0.9919
Epoch 9/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9947 - val_loss: 0.0331 - val_accuracy: 0.9918
Epoch 10/10
1050/1050 [==============================] - 6s 6ms/step - loss: 0.0160 - accuracy: 0.9945 - val_loss: 0.0416 - val_accuracy: 0.9912
Summary and Submission of the Generated Model
In [ ]:
Copy Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
normalization (Normalization (None, 28, 28, 1) 3
_________________________________________________________________
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 10) 92170
_________________________________________________________________
classification_head_1 (Softm (None, 10) 0
=================================================================
Total params: 110,989
Trainable params: 110,986
Non-trainable params: 3
_________________________________________________________________
Remarkably, it built quite a similar model. It is far simpler, with only about 110K parameters, yet it scored 0.9911 on the public leaderboard.
In [ ]:
# Evaluate on the testing data.
print('Accuracy: {accuracy}'.format(
    accuracy=clf.evaluate(x_val, y_val)))
263/263 [==============================] - 1s 4ms/step - loss: 0.0416 - accuracy: 0.9912
Accuracy: [0.04157625054905844, 0.9911905]
In [ ]:
model = clf.export_model()  # export the model built by AutoKeras
save_and_submit(model, "autokeras.csv", "AutoKeras_iter3")
Training on the Validation Set
In [ ]:
model.fit(x_val, y_val, epochs=5, batch_size=1024, verbose=3)
save_and_submit(model, "submit.csv", "AK+val")
Training on the validation set as well improved the score to 0.993.