Multiclass Hinge Loss

Turtle0105 2023. 11. 17. 19:49

Intro

While going through the linear classifier chapter of Deep Learning for Computer Vision, I came across the multiclass hinge loss. It is the loss function used in the multiclass support vector machine, and it was new to me, so today I will implement it. I used the CIFAR data, where each image contains a single object. I chose it because a linear classifier has a hard time learning from images that contain several complex objects, and because the lecture also used CIFAR-10.

Code

Library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob

CIFAR-10 Data Set Loading

def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='latin1')  # named batch to avoid shadowing the built-in dict
    return batch

files = sorted(glob.glob('./data/cifar-10-batches-py/data_batch_*'))  # glob order is not guaranteed; sort for a reproducible split
train_xs = []
train_ys = []

for path in files:
    train_batch = unpickle(path)
    train_xs.append(train_batch['data'])
    train_ys.append(train_batch['labels'])

X_train = np.concatenate(train_xs)
Y_train = np.concatenate(train_ys)
    
file = './data/cifar-10-batches-py/test_batch'
test_batch = unpickle(file)
X_test = test_batch['data']
Y_test = test_batch['labels']

num_train = 40000 # num_val = 10000

X_val = X_train[num_train:]
Y_val = Y_train[num_train:]

X_train = X_train[:num_train]
Y_train = Y_train[:num_train]

print(f'# of train {X_train.shape[0]}, # of validation {X_val.shape[0]}, # of test {X_test.shape[0]}')

# of train 40000, # of validation 10000, # of test 10000

Image reshape: (3,32,32) -> (32,32,3)

Each CIFAR-10 image is stored as a flat row of 3072 values in channel-first (3,32,32) order :(

def CIFAR_img_reshape(img):
    return img.reshape(3,32,32).transpose(1,2,0)

def CIFAR_img_show(img, reshaped = False, plt_show = True):
    if reshaped:
        plt.imshow(img.astype('uint8'))
    else:
        plt.imshow(CIFAR_img_reshape(img).astype('uint8'))
    if plt_show:
        plt.show()
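To see why the reshape is followed by a transpose, here is a small check on a synthetic vector. It assumes the layout of the CIFAR-10 python batches, where a row holds 1024 red values, then 1024 green, then 1024 blue, each in row-major order. The variable names are illustrative only.

```python
import numpy as np

# Synthetic "image": the value at each position is its own flat index.
flat = np.arange(3 * 32 * 32)

# Same operation as CIFAR_img_reshape: split into channels, then move channels last.
img = flat.reshape(3, 32, 32).transpose(1, 2, 0)

# The pixel at row r, column c, channel ch should come from
# flat index ch*1024 + r*32 + c.
r, c, ch = 5, 7, 2
print(img.shape)                                  # (32, 32, 3)
print(img[r, c, ch] == ch * 1024 + r * 32 + c)    # True
```

Without the transpose, plt.imshow would receive a (3,32,32) array, which is not the (height, width, channel) layout it expects.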

Example images visualized

Only two images are shown here; change the number to see more!

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for vis_idx in range(2):
    print(classes[Y_train[vis_idx]])
    CIFAR_img_show(X_train[vis_idx])

frog

truck

Normalization

I tried per-channel standardization and min-max scaling as well, but in the end simply subtracting the mean pixel gave the most plausible results.

img_mean = np.mean(X_train, axis=0)
img_std = np.std(X_train, axis=0)

X_train = (X_train - img_mean)#/img_std
X_test = (X_test - img_mean)#/img_std
X_val = (X_val - img_mean)#/img_std

Score

Initialize

# Bias trick
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])

num_tot_pix = 3072
num_class = 10

np.random.seed(1)

W = np.random.randn(num_class, num_tot_pix + 1) # weight matrix, +1 for the bias vector
dW = np.zeros(W.shape) # gradient matrix
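The bias trick can be checked on a toy example: appending a column of ones to the inputs and a bias column to the weights should give the same scores as computing W x + b explicitly. The sizes and names below (W_only, b, x) are made up for illustration and are not part of the CIFAR-10 code.

```python
import numpy as np

rng = np.random.default_rng(0)
num_class_demo, num_feat_demo = 3, 4                      # tiny illustrative sizes

W_only = rng.normal(size=(num_class_demo, num_feat_demo)) # weights without bias
b = rng.normal(size=num_class_demo)                       # explicit bias vector
x = rng.normal(size=num_feat_demo)                        # one input vector

# Bias trick: b becomes the last column of W, and 1 the last entry of x.
W_aug = np.hstack([W_only, b[:, None]])
x_aug = np.append(x, 1.0)

scores_explicit = W_only.dot(x) + b
scores_trick = W_aug.dot(x_aug)
print(np.allclose(scores_explicit, scores_trick))         # True
```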

e.g. Get scores from the first image

score = W.dot(X_train[0,:])
score

array([ -4953.17018765,   2342.99701817,  -1089.8713787 ,  -2563.4406432 ,
       -10008.10726685,  -1805.66511407,   3983.61933401,     93.40001031,
          966.25345641,   7364.63880978])

Multi-class SVM

Loss function

$$ L_i=\sum_{j\neq y_i}\max(0,s_j-s_{y_i}+\Delta) $$

Where $i$ is the sample index, $y_i$ is the correct label, and $j$ is the category index.

from tqdm import tqdm

def L_i_vec(score_vec, correct_idx, delta = 1.0):
    loss_j = np.maximum(0, score_vec - score_vec[correct_idx] + delta)
    loss_j[correct_idx] = 0
    return loss_j

print("loss i vector", L_i_vec(score, Y_train[0], 1))
print("loss: ", sum(L_i_vec(score, Y_train[0], 1)))

loss i vector [   0.            0.            0.            0.            0.
    0.            0.            0.            0.         3382.01947577]
loss:  3382.0194757748986
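Because the CIFAR-10 scores are large, the formula is easier to verify by hand on a small example. The three scores below are hypothetical values chosen for illustration, with class 0 as the correct label and $\Delta = 1$.

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])   # hypothetical class scores
correct_idx = 0
delta = 1.0

# Class 1: max(0, 5.1 - 3.2 + 1) is about 2.9 (margin violated)
# Class 2: max(0, -1.7 - 3.2 + 1) = 0   (margin satisfied)
margins = np.maximum(0, scores - scores[correct_idx] + delta)
margins[correct_idx] = 0              # the correct class does not contribute
loss = margins.sum()
print(margins, loss)
```

Only the class whose score comes within $\Delta$ of the correct class score (or exceeds it) contributes to the loss.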

Gradients

Incorrect categories ($j\neq y_i$)

$$ \frac{\partial L_i}{\partial w_j} = I(s_j-s_{y_i}+\Delta>0)\times x_i$$

Correct category

$$ \frac{\partial L_i}{\partial w_{y_i}} = \sum_{j\neq y_i}I(s_j-s_{y_i}+\Delta>0)\times(-x_i)$$

Where $I()$ is an indicator function and $w_j$ is the row of $W$ for category $j$.
Chain rule: $\frac{\partial L_i}{\partial w} = \frac{\partial L_i}{\partial s} \times \frac{\partial s}{\partial w}$

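def update_dW(dW, X_i, loss_i_vec, correct_idx):
    # Accumulates the gradient of one sample into dW in place.
    cond = loss_i_vec > 0                  # categories that violate the margin
    dW[cond,] += X_i                       # each violating row gets +x_i
    dW[correct_idx, ] -= cond.sum()*X_i    # correct row gets -x_i per violation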

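# Demo of the mechanics only: reuses the scores of image 0 with the pixels of image 1, treating class 0 as correct
update_dW(dW, X_train[1], L_i_vec(score, 0, 1), 0)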
dW

array([[-186.9994  ,   32.769   ,  208.151   , ..., -225.4372  ,
        -237.4054  ,   -8.      ],
       [  23.374925,   -4.096125,  -26.018875, ...,   28.17965 ,
          29.675675,    1.      ],
       [  23.374925,   -4.096125,  -26.018875, ...,   28.17965 ,
          29.675675,    1.      ],
       ...,
       [  23.374925,   -4.096125,  -26.018875, ...,   28.17965 ,
          29.675675,    1.      ],
       [  23.374925,   -4.096125,  -26.018875, ...,   28.17965 ,
          29.675675,    1.      ],
       [  23.374925,   -4.096125,  -26.018875, ...,   28.17965 ,
          29.675675,    1.      ]])
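One way to gain confidence in the gradient formulas is a numerical gradient check on a tiny random problem. The sketch below is an assumption-laden illustration, not part of the original notebook: the helper names (hinge_loss_single, hinge_grad_single) and the sizes are made up, and the check can fail if a margin happens to sit exactly at the kink of the max.

```python
import numpy as np

def hinge_loss_single(W, x, y, delta=1.0):
    # Multiclass hinge loss for one sample (no regularization).
    s = W.dot(x)
    margins = np.maximum(0, s - s[y] + delta)
    margins[y] = 0
    return margins.sum()

def hinge_grad_single(W, x, y, delta=1.0):
    # Analytic gradient following the indicator formulas.
    s = W.dot(x)
    active = (s - s[y] + delta) > 0
    active[y] = False
    grad = np.zeros_like(W)
    grad[active] = x                    # incorrect categories that violate the margin
    grad[y] = -active.sum() * x         # correct category
    return grad

rng = np.random.default_rng(0)
W_small = rng.normal(size=(4, 6))       # 4 classes, 6 features (illustrative)
x_small = rng.normal(size=6)
y_small = 2

analytic = hinge_grad_single(W_small, x_small, y_small)

# Central finite differences, one weight at a time.
numeric = np.zeros_like(W_small)
h = 1e-5
for idx in np.ndindex(*W_small.shape):
    W_plus, W_minus = W_small.copy(), W_small.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    numeric[idx] = (hinge_loss_single(W_plus, x_small, y_small)
                    - hinge_loss_single(W_minus, x_small, y_small)) / (2 * h)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)                     # should be very close to zero
```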

Regularization term

L2
$$R(W) = \lambda\times\sum_j\sum_lw_{jl}^2$$

lambd = 0.001
R = lambd * np.sum(W**2)
dR = 2 * lambd * W

One iteration

Ls = []
dW = np.zeros(W.shape) # gradient matrix
for image, label in tqdm(zip(X_train, Y_train)):
    score = W.dot(image)
    L_i_temp = L_i_vec(score, label)
    
    # append loss for i
    Ls.append(sum(L_i_temp))
    update_dW(dW, image, L_i_temp, label)

L = np.average(Ls) + R
dW_avg = dW/len(X_train) + dR

40000it [00:05, 6670.47it/s]

def multiclass_hinge_loss_L2(X_train, Y_train, W, lambd, delta = 1):
    Ls = []
    dW = np.zeros(W.shape) # gradient matrix
    
    for image, label in zip(X_train, Y_train):
        score = W.dot(image)
        L_i_temp = L_i_vec(score, label, delta)

        # append loss for i
        Ls.append(sum(L_i_temp))
        update_dW(dW, image, L_i_temp, label)

    R = lambd * np.sum(W**2)
    dR = 2 * lambd * W

    L = np.average(Ls) + R
    dW_avg = dW/len(X_train) + dR
    
    return L, dW_avg

Update

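$$W(t+1)=W(t)-\eta\,\frac{\partial L}{\partial W}$$

Here $L$ is the total loss (the average of $L_i$ plus the regularization term $R(W)$) and $\eta$ is the learning rate.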

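The validation set is normally used to find the best learning rate $\eta$ or the regularization strength $\lambda$, but I skipped that step. Someone online had already used

eta = 1e-7, lambd = 2.5e4

for CIFAR-10 with a multiclass SVM and L2 regularization, so I went straight to the results with these values without using the validation set.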

np.random.seed(1)

W = np.random.randn(num_class, num_tot_pix + 1) * 1e-3 # weight matrix, +1 for the bias vector

eta = 1e-7
lambd = 2.5e4
tr_loss_vec = []
val_loss_vec = []
for step in tqdm(range(1000)):  # full-batch gradient descent; 'step' avoids shadowing the built-in iter
    tr_loss, grad = multiclass_hinge_loss_L2(X_train, Y_train, W, lambd)
    tr_loss_vec.append(tr_loss)
    
    W -= eta * grad

100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [1:54:35<00:00,  6.88s/it]

plt.plot(tr_loss_vec)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.show()

Prediction

The performance is not great...

def prediction(img, W):
    score = W.dot(img)
    predicted = np.argmax(score)
    
    return predicted

predicted_vec = []
for test_img in X_test:
    predicted_vec.append(prediction(test_img, W))

from sklearn.metrics import accuracy_score

accuracy_score(Y_test, predicted_vec)

0.369

Templates

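As in the lecture, the horse template looks as if it has a head on both sides. This is a hands-on confirmation that a linear classifier can capture only one mode per class. The plane and ship templates are mostly blue, and the car template seems to have learned mostly red cars (maybe CIFAR-10 really does contain a lot of red cars...?), so there is room for several interpretations.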

templates = W[:, :-1]  # drop the bias column; each row is one class template

for c in range(10):
    plt.subplot(1, 10, (c+1))
    plt.tick_params(left = False, labelleft = False , labelbottom = False, bottom = False)
    temp = 255.0 * (templates[c] - templates[c].min()) / (templates[c].max() - templates[c].min())
    CIFAR_img_show(temp, plt_show = False)
    plt.title(classes[c])

plt.show()

Outro

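It was fun to implement something from scratch after a long time. Ideally I would have used the validation set to choose the learning rate and $\lambda$, but that would realistically have taken too long, so I borrowed the values from people who had already tried it... Other than that, everything was implemented from scratch.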

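Perhaps because of that, I ended up with a slow, inefficient algorithm. Friends I study with say that vectorization (something I still need to study) makes the computation much faster. I implemented the computation above for a single sample and then ran a for loop over the samples, which is probably why it took so long. Instead of looping over per-sample vectors, computing everything for all samples at once with matrix-level operations, i.e. NumPy operations, should be faster.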
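A minimal sketch of what such a vectorized version might look like, assuming the same conventions as the loop version (W of shape classes x features, samples as rows with the bias column already appended). The function name multiclass_hinge_loss_L2_vectorized and the demo data are hypothetical, and it has not been timed on the full CIFAR-10 set here.

```python
import numpy as np

def multiclass_hinge_loss_L2_vectorized(X, Y, W, lambd, delta=1.0):
    # X: (N, D) samples incl. bias column, Y: (N,) integer labels, W: (C, D)
    Y = np.asarray(Y)
    n = X.shape[0]
    rows = np.arange(n)

    scores = X.dot(W.T)                                    # (N, C) scores for all samples
    margins = np.maximum(0, scores - scores[rows, Y][:, None] + delta)
    margins[rows, Y] = 0                                   # skip j == y_i
    loss = margins.sum() / n + lambd * np.sum(W ** 2)

    coeff = (margins > 0).astype(float)                    # +1 for each violating category
    coeff[rows, Y] = -coeff.sum(axis=1)                    # -(number of violations) for the correct one
    grad = coeff.T.dot(X) / n + 2 * lambd * W              # (C, D), same shape as W
    return loss, grad

# Tiny synthetic usage example (random data, not CIFAR-10).
rng = np.random.default_rng(0)
X_demo = np.hstack([rng.normal(size=(8, 5)), np.ones((8, 1))])
Y_demo = rng.integers(0, 3, size=8)
W_demo = rng.normal(size=(3, 6)) * 1e-3
loss_demo, grad_demo = multiclass_hinge_loss_L2_vectorized(X_demo, Y_demo, W_demo, lambd=0.1)
print(loss_demo, grad_demo.shape)
```

The key step is coeff.T.dot(X): one matrix product replaces the per-sample calls to update_dW, since row c of the result sums every x_i weighted by that sample's coefficient for category c.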

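Also, I only checked for the binary hinge loss, with $y\in\{-1, 1\}$, that the second derivative is non-negative. For the multiclass version I simply assumed it is convex as well, without a separate calculation. I would like to work through that part too when I have time.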
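A sketch of why the multiclass version should also be convex, relying only on standard facts (an affine function is convex, the pointwise maximum of convex functions is convex, and a non-negative sum of convex functions is convex). It is an outline rather than a full proof, and it avoids second derivatives, which do not exist at the kinks of the max.

Each margin is affine in $W$:

$$ s_j - s_{y_i} + \Delta = (w_j - w_{y_i})^\top x_i + \Delta $$

So each hinge term is the maximum of two affine functions of $W$, which is convex:

$$ \max(0,\ (w_j - w_{y_i})^\top x_i + \Delta) $$

The total loss averages these terms and adds the L2 penalty, which is convex for $\lambda \ge 0$:

$$ L = \frac{1}{N}\sum_i\sum_{j\neq y_i}\max(0,s_j-s_{y_i}+\Delta) + \lambda\sum_j\sum_l w_{jl}^2 $$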

The code is below.

LinearClassifier_template.ipynb


Lecture source: https://www.youtube.com/watch?v=qcSEP17uKKY&list=PL5-TkQAfAZFbzxjBHtzdVCWE0Zbhomg7r&index=4
