
[Paper Review] Data Augmentation for Scene Text Recognition

by 뭅즤 2023. 3. 11.

I wondered whether there were augmentations designed specifically for text recognition, and while searching for papers I found a data augmentation paper for STR presented at ICCV 2021, so I am summarizing it here.

 

Abstract (excerpt)

Because Scene Text Recognition (STR) models are evaluated on real data, the mismatch between the training and test data distributions leads to degraded performance, mainly on text affected by noise, artifacts, geometry, structure, and so on. To address this, the paper introduces STRAug, which consists of 36 image augmentation functions. Each function mimics image properties that can be found in natural scenes, are caused by camera sensors, or arise during signal processing operations.

 

Data Augmentation for STR

Figure 1 of the paper compares the baseline predictions of several STR models with their predictions when STRAug is added. The examples are surely cherry-picked, but I'm still hopeful given that performance improved. Being able to boost performance with augmentation alone is really nice.

 

Figure 2 shows challenging text shapes found in text images. Curved text, shadows, rotation, unusual fonts... there are a great many cases.

 

It would be ideal to collect text data covering all of these cases, but that is not easy in practice, which is why a well-chosen augmentation pays off.

 

STRAug

๋ณธ ๋…ผ๋ฌธ์—์„œ ์‹ค์ œ๋กœ ์ ์šฉํ•œ augmentation์˜ ํฐ ์นดํ…Œ๊ณ ๋ฆฌ๋Š” warp, geometry, pattern, noise, blur, weather, camera, process ์ด 8 ๊ฐ€์ง€ ์ด๋‹ค. ํ•˜๋‚˜์”ฉ ์‚ฌ์ง„์„ ์‚ดํŽด๋ณด์ž. 

 

๊น”๋”ํ•œ ํ™˜๊ฒฝ์—์„œ ํ…์ŠคํŠธ ์ธ์‹ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์ž˜ ์•Œ ์ˆ˜ ์—†์ง€๋งŒ, ์‹ค์ œ๋กœ ํ™˜๊ฒฝ์—์„œ ์œ ์ž…๋˜๋Š” ์ด๋ฏธ์ง€์—์„œ ํ…์ŠคํŠธ ๊ฒ€์ถœ ํ›„ ํ…์ŠคํŠธ ์ธ์‹์„ ํ•˜๊ฒŒ ๋˜๋ฉด ๊ต‰์žฅํžˆ ๋‹ค์–‘ํ•œ ์ผ€์ด์Šค์˜ ํ…์ŠคํŠธ ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋‚˜๊ฒŒ ๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ต‰์žฅํžˆ ๋‹ค์–‘ํ•œ ์ผ€์ด์Šค๋ฅผ ๋‚˜๋ˆ„์–ด ํšจ๊ณผ์ ์ธaugmentation๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ํŠนํžˆ perspective, ๊ทธ๋ฆผ์ž, ๋ธ”๋Ÿฌ ๋“ฑ์€ ๊ต‰์žฅํžˆ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์ด์Šˆ์ด๋‹ค. 

 

One thing I find lacking: in a real environment, text recognition runs right after text detection, so the text box itself is often inaccurate. With that in mind, I wonder whether there should also be augmentations that shrink or add margin around the text. If there is a motivated graduate student out there, it would be a good experiment to try.
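To make the margin idea a bit more concrete, here is a minimal sketch of what I have in mind. It is not part of STRAug or the paper; `jitter_box` and its `max_ratio` parameter are names I made up for illustration. It only perturbs the box coordinates, so it uses nothing beyond the standard library.

```python
import random


def jitter_box(box, image_size, max_ratio=0.1, rng=random):
    """Randomly shrink or expand a text box to imitate an imperfect detector.

    box: (left, top, right, bottom) in pixels.
    image_size: (width, height) of the full image.
    max_ratio: each side moves by at most this fraction of the box width/height.
    (Hypothetical helper, not an STRAug function.)
    """
    left, top, right, bottom = box
    img_w, img_h = image_size
    box_w, box_h = right - left, bottom - top

    # Positive offsets add margin; negative offsets cut into the text.
    d_left = rng.uniform(-max_ratio, max_ratio) * box_w
    d_right = rng.uniform(-max_ratio, max_ratio) * box_w
    d_top = rng.uniform(-max_ratio, max_ratio) * box_h
    d_bottom = rng.uniform(-max_ratio, max_ratio) * box_h

    # Clamp to the image so the result is always a valid crop region.
    new_left = max(0, int(round(left - d_left)))
    new_top = max(0, int(round(top - d_top)))
    new_right = min(img_w, int(round(right + d_right)))
    new_bottom = min(img_h, int(round(bottom + d_bottom)))

    # Fall back to the original box if the jitter collapsed it.
    if new_right <= new_left or new_bottom <= new_top:
        return box
    return (new_left, new_top, new_right, new_bottom)
```

The returned tuple has the same (left, upper, right, lower) layout that PIL's `Image.crop` expects, so the jittered box could be used to crop the full image before the recognition model sees it. Whether this actually helps would need an experiment.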

 

 

Experimental Results

 

Table 4 shows the results of applying each of the 8 augmentation groups described above, one at a time, to a model called RARE across various datasets. As expected, augmentation is not always beneficial. Still, most of the augmentations are effective overall. The data augmentation is said to be applied with 50% probability.
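As a small illustration of what "applied with 50% probability" means, here is a sketch of such a probability gate. This is my own simplification, not the paper's training code; `apply_with_prob` is a hypothetical name and `op` stands for any callable that takes an image and returns an image.

```python
import random


def apply_with_prob(op, image, prob=0.5, rng=random):
    """Apply `op` to `image` with probability `prob`; otherwise return it unchanged.

    (Hypothetical helper for illustration only.)
    """
    # rng.random() is uniform in [0.0, 1.0), so prob=0 never fires and prob=1 always does.
    if rng.random() < prob:
        return op(image)
    return image
```

With `prob=0.5`, roughly half of the training samples pass through untouched, so the model still sees plenty of unaugmented images.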

 

Table 5 compares the performance gain of STRAug, which combines the 8 augmentation groups, against other augmentation methods. STRAug appears considerably more effective than the others.

 


However, these experiments were run on public datasets, and the augmentations have not been validated on Korean datasets. Also, in a real environment an accuracy gain of around 1 to 2% may not make a noticeable difference. Even so, it seems well worth considering as a data augmentation method for training text recognition models. Selecting the augmentation types to suit the purpose of the recognition model also seems like a reasonable approach.

 

The official GitHub repository is very easy to use, but its test code applies each STRAug augmentation one at a time and shows the result, so it is not suited to being dropped straight into model training.

So I added test code that picks the desired augmentation types, applies them at random, and shows the result, and I have opened a pull request with it. Since I don't know whether it will be merged, I'm attaching the code below.

 

import argparse
import os
import numpy as np
from PIL import Image
import cv2

from straug.warp import Curve, Distort, Stretch
from straug.geometry import Rotate, Perspective, Shrink, TranslateX, TranslateY
from straug.blur import GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur
from straug.camera import Contrast, Brightness, JpegCompression, Pixelate
from straug.noise import GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
from straug.pattern import VGrid, HGrid, Grid, RectGrid, EllipseGrid
from straug.process import Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
from straug.weather import Fog, Snow, Frost, Rain, Shadow

class Random_StrAug(object):
    """Pick one random op from each selected STRAug group and apply them in sequence."""

    def __init__(self, using_aug_types, prob_list=None):
        # One inner list per selected group. Groups are appended in the fixed order
        # below (not in the order of `using_aug_types`), so `prob_list` must follow
        # this order as well.
        self.aug_list = []
        if 'warp' in using_aug_types:
            self.aug_list.append([Curve(), Distort(), Stretch()])
        if 'geometry' in using_aug_types:
            self.aug_list.append([Rotate(), Perspective(), Shrink(), TranslateX(), TranslateY()])
        if 'blur' in using_aug_types:
            self.aug_list.append([GaussianBlur(), DefocusBlur(), MotionBlur(), GlassBlur(), ZoomBlur()])
        if 'noise' in using_aug_types:
            self.aug_list.append([GaussianNoise(), ShotNoise(), ImpulseNoise(), SpeckleNoise()])
        if 'camera' in using_aug_types:
            self.aug_list.append([Contrast(), Brightness(), JpegCompression(), Pixelate()])
        if 'pattern' in using_aug_types:
            self.aug_list.append([VGrid(), HGrid(), Grid(), RectGrid(), EllipseGrid()])
        if 'process' in using_aug_types:
            self.aug_list.append([Posterize(), Solarize(), Invert(), Equalize(), AutoContrast(), Sharpness(), Color()])
        if 'weather' in using_aug_types:
            self.aug_list.append([Fog(), Snow(), Frost(), Rain(), Shadow()])

        # The magnitude is drawn once here from {-1, 0, 1, 2} and reused for every call.
        self.mag_range = np.random.randint(-1, 3)
        if prob_list is None:
            # Default: every selected group is applied with probability 0.5.
            self.prob_list = [0.5] * len(self.aug_list)
        else:
            assert len(self.aug_list) == len(prob_list), "The length of 'prob_list' must be the same as the number of augmentations used."
            self.prob_list = prob_list

    def __call__(self, img):
        # For each group, choose one op at random and let it fire with the group's probability.
        for group, prob in zip(self.aug_list, self.prob_list):
            op = group[np.random.randint(0, len(group))]
            img = op(img, mag=self.mag_range, prob=prob)

        return img
    
if __name__ == '__main__':
    # prob_list entries follow the group order used inside Random_StrAug.
    random_StrAug_1 = Random_StrAug(using_aug_types=['warp', 'geometry', 'blur', 'noise', 'camera', 'pattern', 'process', 'weather'],
                                    prob_list=[0.5, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1])

    random_StrAug_2 = Random_StrAug(using_aug_types=['warp', 'pattern', 'process', 'weather'],
                                    prob_list=[0.5, 0.3, 0.2, 0.5])

    # Note: --gray and --seed are parsed but not used anywhere below.
    parser = argparse.ArgumentParser()
    parser.add_argument('--image', default="images/delivery.png", help='Load image file')
    parser.add_argument('--results', default="results", help='Folder for augmented image files')
    parser.add_argument('--gray', action='store_true', help='Convert to grayscale 1st')
    parser.add_argument('--width', default=200, type=int, help='Default image width')
    parser.add_argument('--height', default=64, type=int, help='Default image height')
    parser.add_argument('--seed', default=0, type=int, help='Random number generator seed')
    opt = parser.parse_args()
    os.makedirs(opt.results, exist_ok=True)

    # Force 3-channel RGB so the arrays can be stacked and converted to BGR below.
    img = Image.open(opt.image).convert('RGB')
    img = img.resize((opt.width, opt.height))

    augmented_img_1 = random_StrAug_1(img)
    augmented_img_2 = random_StrAug_2(img)

    # Save the original and both augmented images side by side for comparison.
    result = cv2.cvtColor(np.hstack((np.array(img), np.array(augmented_img_1), np.array(augmented_img_2))), cv2.COLOR_RGB2BGR)
    base_name = os.path.splitext(os.path.basename(opt.image))[0]
    cv2.imwrite(os.path.join(opt.results, base_name + '_random_strAug.jpg'), result)
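
The script above only saves a comparison image. For actual training, a callable like `Random_StrAug` would typically be invoked each time a sample is loaded. Below is a rough sketch of that idea, under my own assumptions: `AugmentedSamples` is a hypothetical wrapper, not something from the STRAug repository. It follows the `__len__` / `__getitem__` convention used by PyTorch-style map datasets but imports nothing from PyTorch.

```python
class AugmentedSamples:
    """Wrap (image, label) pairs and apply `transform` to the image on access.

    (Hypothetical wrapper for illustration; not part of STRAug.)
    """

    def __init__(self, samples, transform=None):
        self.samples = list(samples)
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label = self.samples[idx]
        # The transform runs on every access, so any randomness inside it
        # is re-drawn each time the same sample is requested.
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

Passing something like `transform=Random_StrAug(['warp', 'blur'])` would then augment each PIL image as it is fetched, while the labels stay untouched.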

 

๋ฐ˜์‘ํ˜•