
[Paper Review] Deep Learning for Large-Scale Traffic-Sign Detection and Recognition / Traffic Sign Detection

by 뭅즤 2022. 7. 8.
๋ฐ˜์‘ํ˜•

๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” Traffic sign detection (๊ตํ†ต ํ‘œ์ง€ํŒ ๊ฐ์ง€) ์— ๋Œ€ํ•œ ๋…ผ๋ฌธ 2๊ฐœ๋ฅผ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

 

  • Traffic-Sign Detection and Classification in the Wild / CVPR 2016 
  • Deep Learning for Large-Scale Traffic-Sign Detection and Recognition / IEEE T-ITS 2019

 

Traffic sign detection ์€ object detection์˜ ํ•˜์œ„ task๋กœ ๋ณผ ์ˆ˜ ์žˆ๊ณ , ์ž์œจ ์ฃผํ–‰ ๋ฐ ๋„๋กœ ์ •๋ณด๋ฅผ ์ƒ์„ฑํ•˜๋Š”๋ฐ ํ•„์ˆ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๊ต‰์žฅํžˆ ์ž‘์€ ๊ฐ์ฒด๋ฅผ ๊ฐ์ง€ํ•˜๋Š” ๋ฐฉ๋ฒ•๋“ค์ด ๊ถ๊ธˆํ–ˆ์—ˆ๋Š”๋ฐ, traffic sign detection ๋…ผ๋ฌธ๋“ค์ด ๋„์›€์ด ๋˜๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

 


"Traffic-Sign Detection and Classification in the Wild"

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” traffic sign detection benchmark ๋ฅผ ์œ„ํ•œ ์ƒˆ๋กœ์šด dataset์„ ๊ตฌ์ถ•ํ•˜๊ณ , ๋„๋กœ ํ‘œ์ง€ํŒ์„ ๊ฐ์ง€ ๋ฐ ๋ถ„๋ฅ˜ํ•˜๋Š” end-to-end ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. 

 

 

์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์–ธ๊ธ‰ํ•˜๋Š” traffic sign detection์˜ ์–ด๋ ค์šด ์ ์€ ํฌ๊ฒŒ 2๊ฐ€์ง€๋กœ, 1) ๋„๋กœ ํ‘œ์ง€ํŒ์ด ๋งค์šฐ ์ž‘์€ ๊ฒฝ์šฐ ๊ธฐ์กด detection ๋ชจ๋ธ๋กœ ์ธ์‹์ด ์ž˜ ์•ˆ๋˜๊ณ , 2) class์™€ ํ‘œ์ง€ํŒ ์‚ฌ์ด์ฆˆ์˜ imbalance ๊ฐ€ ์‹ฌํ•˜๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค.

 

๊ทธ ์™ธ์—๋„ ๋ˆˆ, ๋น„, ํ–‡๋น›์˜ ๋ฐ˜์‚ฌ ๋“ฑ ๊ธฐ์ƒ์˜ ์˜ํ–ฅ๊ณผ OCR task ์—์„œ๋„ ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” perspective distortion ๊ฐ™์€ ๋ฌธ์ œ๋„ ์ˆ˜๋ฐ˜ํ•ฉ๋‹ˆ๋‹ค.

 

For example, a sign seen from far away appears very small. Signs that appear only near mountains or the sea, rather than in cities, are very rare, which causes data imbalance.

 

 

 

 

์ด ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” ๋„คํŠธ์›Œํฌ๋Š” conv6 ์ดํ›„๋กœ 3๊ฐœ์˜ branch(bbox, pixel, label)์„ ๊ฐ€์ง€๋Š” ํ‰๋ฒ”ํ•ด ๋ณด์ด๋Š” fully convolutional network ์ž…๋‹ˆ๋‹ค. ์ €์ž๋Š” ์ œ์•ˆํ•˜๋Š” ๋„คํŠธ์›Œํฌ๊ฐ€ ์ž‘์€ ๊ฐ์ฒด(ํ‘œ์ง€ํŒ)์„ ์ž˜ ๊ฐ์ง€ ๋ฐ ๋ถ„๋ฅ˜ํ•œ๋‹ค๊ณ  ์ฃผ์žฅํ•˜๋Š”๋ฐ ์‹คํ—˜์ ์œผ๋กœ ์ฆ๋ช…ํ•˜๊ธด ํ•˜์ง€๋งŒ ์ •ํ™•ํ•œ ๊ทผ๊ฑฐ๋ฅผ ์ œ์‹œํ•˜์ง€๋Š” ์•Š๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

 

 

์œ„์˜ Fast R-CNN ๊ณผ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ ๋น„๊ต๋ฅผ ๋ณด๋ฉด ๊ฐ์ฒด ์‚ฌ์ด์ฆˆ๊ฐ€ ์ž‘์€ ๊ฒฝ์šฐ์—๋Š” Fast R-CNN์˜ ๊ฒฝ์šฐ ์„ฑ๋Šฅ์ด ๋งค์šฐ ์•ˆ ์ข‹์ง€๋งŒ, ์ œ์•ˆํ•˜๋Š” ๋„คํŠธ์›Œํฌ๋Š” ๋น„๊ต์  ๊ฐ์ฒด ์‚ฌ์ด์ฆˆ์— ๋ฌด๊ด€ํ•˜๊ฒŒ ์„ฑ๋Šฅ์ด ์ž˜ ๋‚˜์˜ค๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 


"Deep Learning for Large-Scale Traffic-Sign Detection and Recognition"

 

๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” traffic sign detection ๋ฐ recognition์„ ์œ„ํ•ด mask R-CNN ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜๊ณ  traffic sign detection ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ๋ช‡ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. 

 

As with the first paper, the model must work well under environmental variations such as lighting conditions, scale, camera angle, blur and occlusion. This paper proposes adaptations that increase recall for small traffic signs, along with data augmentation suited to traffic signs.

 

*Terms

TSD : Traffic Sign Detection

TSR : Traffic Sign Recognition

 

Mask R-CNN

Mask R-CNN์€ Faster R-CNN์˜ ํ™•์žฅ ๋ฒ„์ „์œผ๋กœ ๋ณผ ์ˆ˜ ์žˆ๊ณ , ๋‘ ๊ฐœ์˜ ๋ชจ๋“ˆ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ๋ชจ๋“ˆ์€ RPN(Region Proposal Network)๋กœ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ bounding box๋ฅผ ์ƒ์„ฑํ•˜๋Š” CNN ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ๋ชจ๋“ˆ์€ proposed region์„ ๋ถ„๋ฅ˜ํ•˜๋Š” region-based CNN ๊ตฌ์กฐ(Fast R-CNN)์ž…๋‹ˆ๋‹ค. ์ „์ฒด ์•„ํ‚คํ…์ฒ˜๋Š” RPN๊ณผ Fast R-CNN ๋ชจ๋“ˆ์ด convolutional feature๋ฅผ ๊ณต์œ ํ•˜๋Š” ๋‹จ์ผ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, attention ๋ชจ๋“ˆ๊ณผ FPN(Feature Pyramid Network) ์ด ์ ์šฉ๋˜์–ด ์žˆ๊ณ , backbone network๋Š” ResNet์ž…๋‹ˆ๋‹ค. (Faster R-CNN์˜ backbone์€ VGG์ž…๋‹ˆ๋‹ค.)

 

Adaptation to Traffic-Sign Detection

์ผ๋ฐ˜์ ์ธ ๊ฐ์ฒด์˜ ๊ฐ์ง€ ๋ฐ ์ธ์‹์„ ์œ„ํ•ด ๊ฐœ๋ฐœ๋œ Mask R-CNN์„ TSD์— ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด ๋ช‡๊ฐ€์ง€ ์˜์—ญ๋ณ„ ๊ฐœ์„ ์ ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.

 

1) Online Hard-Example Mining

Online Hard-Example Mining (OHEM) ์„ classification learning module(Fast R-CNN module)์— ํ†ตํ•ฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ROI๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•์„ classification loss ๊ฐ’์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. Region๋“ค์„ loss value๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•˜๊ณ  loss ๊ฐ€ ์ถฉ๋ถ„ํžˆ ๋†’์€ region๋งŒ classification module ๋กœ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๋„คํŠธ์›Œํฌ๊ฐ€ ๊ฐ€์žฅ ๋งŽ์ด ์‹ค์ˆ˜ํ•œ ์ƒ˜ํ”Œ, ์ฆ‰ ์–ด๋ ค์šด ์˜ˆ์ œ์— ๋Œ€ํ•œ ํ•™์Šต์„ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.

 

2) Distribution of Selected Training Samples

Mask R-CNN์€ ROI๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒํ•˜๊ณ  ์ „๊ฒฝ๊ณผ ๋ฐฐ๊ฒฝ์— ๋”ฐ๋กœ ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์ด๋ฏธ์ง€์— ๋งŽ์€ ํฌ๊ณ  ์ž‘์€ ๊ฐ์ฒด๋“ค์ด ๋™์‹œ์— ์กด์žฌํ•  ๋•Œ ๋ฌด์ž‘์œ„ ์„ ํƒ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋ฉด ํ•™์Šต ํ”„๋กœ์„ธ์Šค๊ฐ€ ๋ถˆ๊ท ํ˜•ํ•ด์ง‘๋‹ˆ๋‹ค. ์‚ฌ์ด์ฆˆ๊ฐ€ ํฐ ๊ฐ์ฒด์—๋Š” ๋งŽ์€ ROI๊ฐ€ ์ƒ๊ธฐ๊ณ , ์ž‘์€ ๊ฐ์ฒด์—๋Š” ์ ์€ ROI๊ฐ€ ์ƒ๊ธฐ๊ธฐ ๋•Œ๋ฌธ์— ์ด ๋ถ„ํฌ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒ˜ํ”Œ์„ ์„ ํƒํ•˜๋ฉด ํ•™์Šต ํ”„๋กœ์„ธ์Šค๊ฐ€ ์™œ๊ณก๋ฉ๋‹ˆ๋‹ค. ์‚ฌ์ด์ฆˆ๊ฐ€ ํฐ ๊ฐ์ฒด๊ฐ€ ๋” ๋งŽ์ด ๊ด€์ธก๋˜๊ณ  ์„ ํ˜ธ๋˜๊ธฐ(๋„คํŠธ์›Œํฌ ์ฐจ์›์—์„œ) ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. 

 

To mitigate this, the distribution of training samples is changed so that objects of all sizes are represented uniformly. That is, the same number of ROIs is selected for each object in the image.
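The per-object sampling above can be sketched as follows. This is a minimal sketch under stated assumptions:
- each ROI has already been matched to its best ground-truth object, with -1 meaning background;
- `rois_per_object` is a hypothetical parameter, since the exact count is not given here;
- background sampling is handled separately and omitted.

```python
import numpy as np

def sample_rois_per_object(gt_assignment, rois_per_object, rng):
    """Pick up to `rois_per_object` foreground ROIs for every ground-truth object.

    gt_assignment[i] is the index of the ground-truth object that ROI i is
    matched to (IoU above the foreground threshold), or -1 for background.
    """
    selected = []
    for obj_id in np.unique(gt_assignment[gt_assignment >= 0]):
        candidates = np.flatnonzero(gt_assignment == obj_id)
        k = min(rois_per_object, candidates.size)
        # Same budget per object, so large objects cannot dominate the batch
        selected.append(rng.choice(candidates, size=k, replace=False))
    if not selected:
        return np.empty(0, dtype=int)
    return np.sort(np.concatenate(selected))

rng = np.random.default_rng(0)
# Object 0 is large (8 matched ROIs), object 1 is small (only 2 ROIs).
assignment = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, -1, -1])
picked = sample_rois_per_object(assignment, rois_per_object=2, rng=rng)
print(np.bincount(assignment[picked]))  # [2 2]
```

With purely random sampling, the large object would contribute roughly four times as many ROIs as the small one. Here both contribute equally.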

 

3) Sample Weighting

Mask R-CNN์€ ํŠน์ • ๊ฒฝ์šฐ์— ๋ˆ„๋ฝ๋œ region proposal ๋กœ ์ธํ•ด 100% recall์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์—†๋Š”๋ฐ, training region์˜ weight๋ฅผ ๋‹ค๋ฅด๊ฒŒํ•˜์—ฌ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๋ ค ํ•ฉ๋‹ˆ๋‹ค. ํ•™์Šตํ•˜๋Š” ๋™์•ˆ ์ „๊ฒฝ๊ณผ ๋ฐฐ๊ฒฝ ์˜์—ญ์ด ๋ชจ๋‘ ์„ ํƒ๋ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋ฏธ์ง€์˜ ๋Œ€๋ถ€๋ถ„์˜ ๊ตํ†ต ํ‘œ์ง€ํŒ์€ ์ž‘๊ณ  ํ•ด๋‹น ํ‘œ์ง€ํŒ์— ๋Œ€ํ•ด ๋ช‡ ๊ฐœ์˜ region proposal ๋งŒ ์กด์žฌํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ฐฐ๊ฒฝ ๋ถ€๋ถ„์ด ๋งŽ์ด ์„ ํƒ๋ฉ๋‹ˆ๋‹ค. ๋•Œ๋ฌธ์— ๊ธฐ์กด ํ•™์Šต ๊ณผ์ •์—์„œ๋Š” ๋ฐฐ๊ฒฝ ๊ฐ์ฒด๋ฅผ ๋” ์ž์ฃผ ๊ด€์ฐฐํ•˜๊ณ  ๋ฐฐ๊ฒฝ ํ•™์Šต์— ์ง‘์ค‘ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋ฐฐ๊ฒฝ ์˜์—ญ์— ๋” ์ž‘์€ weight๋ฅผ ์ค˜์„œ ์ „๊ฒฝ ๊ฐ์ฒด๋ฅผ ๋จผ์ € ํ•™์Šตํ•˜๋„๋ก ํ•˜์—ฌ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ๋ฐฐ๊ฒฝ ์˜์—ญ์— RPN์—๋Š” 0.1, classification network์—๋Š” 0.01์˜ weight๋ฅผ ๋ถ€์—ฌํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„๋ฉ๋‹ˆ๋‹ค.  

 

4) Adjusting Region Pass-Through During Detection

Finally, the number of ROIs passed from the RPN to the classification network at detection time is changed. The traffic sign domain typically contains many small objects, so the number of regions allowed through must be adjusted.

NMS ์ด์ „์— ํ•˜๋‚˜์˜ FPN level ๋‹น region ์ˆ˜๋ฅผ 1000๊ฐœ์—์„œ 10000๊ฐœ๋กœ ๋Š˜๋ฆฌ๊ณ , ๋ชจ๋“  FPN level์˜ ROI๋ฅผ ๋ณ‘ํ•ฉํ•˜๊ณ  NMS 2000๊ฐœ์˜ region์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์€ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค.

 

Augmentation

 

Traffic sign ๋„๋ฉ”์ธ์˜ ํŠน์„ฑ์œผ๋กœ ์ธํ•ด ๊ธฐ์กด traffic sign ์ธ์Šคํ„ด์Šค์˜ ์ธ์œ„์ ์ธ distortion์„ ๊ฐ€ํ•ด ๋งŽ์€ ์ˆ˜์˜ ์ƒ˜ํ”Œ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ œ์•ˆ๋œ ๋ฐ์ดํ„ฐ์…‹์˜ ๊ตํ†ต ํ‘œ์ง€ํŒ์€ pixel-wise ๋กœ annotate ๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ ํ•™์Šต ์ด๋ฏธ์ง€์—์„œ ๊ตํ†ต ํ‘œ์ง€ํŒ๋งŒ ๋ถ„๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ถ„๋ฆฌ๋œ ๊ตํ†ต ํ‘œ์ง€ํŒ์€ 1) geometric/shape distortion (perspective change, changes in scale) ๋ฐ 2) appearance distortion (variations in brightness and contrast) ๋‘ ๊ฐ€์ง€์˜ augmentation์ด ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค. 

 

Geometric, appearance distortion์„ ์ ์šฉํ•˜๊ธฐ ์ „์— ๋จผ์ € ๊ฐ ๊ตํ†ต ํ‘œ์ง€ํŒ ์ธ์Šคํ„ด์Šค๋ฅผ normalize ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์—ฌ๋Ÿฌ ๊ฐ๋„์—์„œ ๋ฐ”๋ผ๋ณธ ๊ตํ†ต ํ‘œ์ง€ํŒ์„ projective transformation์„ ํ†ตํ•ด ์ •๋ฉด์—์„œ ๋ฐ”๋ผ๋ณธ ํ‘œ์ง€ํŒ์œผ๋กœ ๋ณ€๊ฒฝํ•˜๋Š” geometric normalize์™€ intensity channel ์˜ contrast๋ฅผ ์กฐ์ •ํ•˜๋Š” appearance normalize๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ๊ฐ ํ‘œ์ง€ํŒ์„ ์ •๋ฉด์—์„œ ๋ฐ”๋ผ๋ณธ contrast๊ฐ€ ์ •๊ทœํ™”๋œ ์ธ์Šคํƒ„์Šค๋กœ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. 

 

* Original Image → Geometric/Appearance Normalize → Synthetically generated distortions

 

Next, to generate synthetic training samples that are as realistic as possible, the distortions follow the geometry and appearance distributions of the training dataset. For geometric changes, the distribution of Euler rotation angles is estimated; for appearance changes, the distribution of mean intensity is estimated. The scale distribution is also estimated from the sizes of the geometry-normalized instances.

 

Synthetic distortion์„ ์ƒ์„ฑํ•  ๋•Œ ํ•ด๋‹น ๋ถ„ํฌ์—์„œ ๋ฌด์ž‘์œ„ ๊ฐ’์„ ์ƒ˜ํ”Œ๋งํ•ฉ๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ด€์ธก๋œ ๋ถ„ํฌ์˜ ๋ถ„์‚ฐ๋ณด๋‹ค ๋‘ ๋ฐฐ ํฐ ๋ถ„์‚ฐ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋” ํฐ distortion์„ ์ƒ์„ฑํ•˜๋„๋ก ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ ์„ค์ •์„ emulate ํ•˜๊ธฐ ์œ„ํ•ด ์ƒˆ๋กœ ์ƒ์„ฑํ•œ ๊ตํ†ต ํ‘œ์ง€ํŒ ์ธ์Šคํ„ด์Šค๋ฅผ ๊ฑฐ๋ฆฌ ํ™˜๊ฒฝ๊ณผ ๊ฐ™์€ ๋ฐฐ๊ฒฝ ์ด๋ฏธ์ง€์— ํ‰์†Œ ๋„๋กœ๋งŒ ๋ณด์ด๋Š” ํ•˜๋‹จ ์ค‘์•™๋ถ€๋ฅผ ํ”ผํ•ด ๋ฌด์ž‘์œ„ ์œ„์น˜์— ๋ฐฐ์น˜ํ•ฉ๋‹ˆ๋‹ค. 

 

์ด๋Ÿฌํ•œ augmentation์„ ์ ์šฉํ•˜๋Š” ์ด์œ ๋Š” ๊ตํ†ต ํ‘œ์ง€ํŒ์€ ํ•œ ํด๋ž˜์Šค์˜ ๋ชจ์–‘์€ ๊ฑฐ์˜ ๋™์ผํ•˜์ง€๋งŒ, ๊ด€์ธก๋˜๋Š” ์œ„์น˜์— ๋”ฐ๋ผ perspective transformation๊ณผ ์กฐ๋ช… ๋“ฑ์˜ ํ™˜๊ฒฝ์— ๋”ฐ๋ผ geometric, appearance์˜ ์ฐจ์ด๊ฐ€ ๋ฐœ์ƒํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

 

 

*Summary

๊ฒฐ๋ก ์ ์œผ๋กœ Traffic sign detection ์„ฑ๋Šฅ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด 1) OHEM์„ ์‚ฌ์šฉํ•˜์—ฌ ์–ด๋ ค์šด ์˜ˆ์ œ๋ฅผ ์ง‘์ค‘์ ์œผ๋กœ ํ•™์Šตํ•˜๊ณ , 2) ์ž‘์€ ์‚ฌ์ด์ฆˆ์˜ ROI๋„ ํฐ ์‚ฌ์ด์ฆˆ์˜ ROI์™€ ๋™์ผํ•œ ์ˆ˜์ค€์œผ๋กœ ์„ ํƒํ•˜๊ฒŒ ํ•˜๊ณ , 3) ๋ฐฐ๊ฒฝ ๋ณด๋‹ค ์ „๊ฒฝ์— ์ง‘์ค‘ํ•˜๋„๋ก ํ•˜๊ณ , 4) RPN์—์„œ ์ถ”์ถœํ•˜๋Š” ROI ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๊ณ , 5) Geomteric/Appearance distortion์„ ์ฃผ๋Š” augmentation์„ ์‚ฌ์šฉํ•˜๋„๋ก ํ–ˆ์Šต๋‹ˆ๋‹ค.

 

Experiments

์‹คํ—˜์€ Swedish traffic-sign dataset (STSD) ์™€ ์ƒˆ๋กœ ์ œ์•ˆํ•œ DFG traffic-sign dataset์œผ๋กœ ์ง„ํ–‰๋ฉ๋‹ˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ Caffe2 ๊ธฐ๋ฐ˜์˜ Detectron์œผ๋กœ Faster R-CNN ๊ณผ Mask R-CNN์„ ๊ตฌํ˜„ํ•˜์—ฌ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Backbone์€ ResNet50์ž…๋‹ˆ๋‹ค. GPU ๋‹น batch 2๋ฅผ (์ด๋ฏธ์ง€ 2๊ฐœ)๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ STSD์—์„œ๋Š” GPU 2๊ฐœ, DFG์—์„œ๋Š” GPU 4๊ฐœ๋กœ ์‹คํ—˜ํ•ฉ๋‹ˆ๋‹ค. (STSD : batch ๋‹น 4๊ฐœ ์ด๋ฏธ์ง€, DFG : batch ๋‹น 8๊ฐœ ์ด๋ฏธ์ง€)

 

๋ณธ ์‹คํ—˜์—์„œ๋Š” 1) PASCAL ์— ๊ธฐ๋ฐ˜ํ•œ mAP50์™€ 2) COCO ์— ๊ธฐ๋ฐ˜ํ•œ mAP50:95 ์„ metric ์œผ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. mAP50์€ ๊ณ ์ •๋œ IoU overlap์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด๊ณ , mAP50:95๋Š” overlap ๋ฒ”์œ„ [0.50, 0.95]์— ๋Œ€ํ•ด 0.05 ์ฆ๋ถ„์œผ๋กœ ํ‰๊ท ์„ ๋‚ธ ๊ฐ’์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ COCO ๊ธฐ๋ฐ˜์˜ mAP50:95๋Š” region overlap์˜ ํ’ˆ์งˆ์— ๋” ์ค‘์ ์„ ๋‘๋Š” metric ์ž…๋‹ˆ๋‹ค.

 

The false-positive rate (1 - Precision) shows what fraction of detections are false. Strictly speaking, 1 - precision is the false discovery rate. The miss rate (1 - Recall) shows what fraction of traffic signs were not detected.

 

* Precision์€ ํƒ์ง€ํ•œ ๊ฐ์ฒด ์ค‘ ์ •๋‹ต์„ ๋งž์ถ˜ ๊ฐ์ฒด์— ๋Œ€ํ•œ ์ง€ํ‘œ์ด๊ณ , Recall์€ ํƒ์ง€ํ•ด์•ผํ•˜๋Š” ๋ชจ๋“  ๊ฐ์ฒด ์ค‘ ์ •๋‹ต์„ ๋งž์ถ˜ ๊ฐ์ฒด์— ๋Œ€ํ•œ ์ง€ํ‘œ. ๋•Œ๋ฌธ์— ์ผ๋ฐ˜์ ์œผ๋กœ precision ๊ณผ recall์€ ๋ฐ˜๋น„๋ก€ ๊ด€๊ณ„.

 

 

 

 

 

๋ฐ˜์‘ํ˜•