๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๐Ÿ› Research/Detection & Segmentation

[๊ฐ„๋‹จ ์„ค๋ช…] Object Detection ๊ธฐ์ดˆ : RCNN, SPPNet, Fast RCNN, Faster RCNN, YOLO | ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ๊ฐ์ฒด ๊ฒ€์ถœ ๊ธฐ์ดˆ

by ๋ญ…์ฆค 2022. 4. 5.
๋ฐ˜์‘ํ˜•

1. RCNN

  1. Selective search ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์ด๋ฏธ์ง€์—์„œ ๊ฐ์ฒด๊ฐ€ ์žˆ์„ ๊ฒƒ ๊ฐ™์€ ์œ„์น˜์— box 2k๊ฐœ ์ถ”์ถœ
  2. Non-maximum Suppression ์œผ๋กœ ๊ฐ€์žฅ ์Šค์ฝ”์–ด๊ฐ€ ๋†’์€ box๋งŒ ๋‚จ๊น€(box๊ฐ€ ๊ฒน์น  ๋•Œ IoU >0.5 ์ด๋ฉด ์ ์šฉ)
  3. ๋ชจ๋“  box๋ฅผ 227x227 ๋กœ resize (๋น„์œจ ๊ณ ๋ ค x)
  4. Pre-train ๋œ ๋„คํŠธ์›Œํฌ์— box ์ด๋ฏธ์ง€๋ฅผ ํ†ต๊ณผ์‹œ์ผœ ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ feature ์ถ”์ถœ
  5. SVM classifier ํ•™์Šต & Bounding box regression

๋‹จ์ 

- Region proposal ๋กœ ์ถ”์ถœํ•œ ์ˆ˜๋งŽ์€ ๊ฐœ์ˆ˜์˜ ์˜์—ญ์„ ๋ชจ๋‘ CNN์— ํ†ต๊ณผ์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์— ์ƒ๋‹นํžˆ ์˜ค๋ž˜ ๊ฑธ๋ฆผ

- ๊ฐ์ฒด์˜ ๋น„์œจ ๊ณ ๋ คํ•˜์ง€ ์•Š๊ณ  ๋ชจ๋‘ ๊ฐ™์€ ํฌ๊ธฐ๋กœ resize 

 

 

 

2. SPPNet

RCNN์˜ ๋‹จ์ ์ธ ๊ณ ์ •๋œ ์ž…๋ ฅ ์ด๋ฏธ์ง€ ์‚ฌ์ด์ฆˆ, ์ค‘๋ณต๋˜๋Š” CNN ๊ณ„์‚ฐ์„ ๊ฐœ์„ ํ•œ ๋„คํŠธ์›Œํฌ

  1. ์ „์ฒด ์ด๋ฏธ์ง€๋ฅผ pre-trainํ•œ ๋„คํŠธ์›Œํฌ์— ์ฃผ์ž…
  2. Selective search๋ฅผ ํ†ตํ•ด ์ฐพ์€ ํฌ๊ธฐ์™€ ๋น„์œจ์ด ๋‹ค๋ฅธ RoI์— Spatial Pyramid Pooling(SPP) ์ ์šฉํ•˜์—ฌ ๊ณ ์ •๋œ ์‚ฌ์ด์ฆˆ์˜ feature ์ถ”์ถœ
  3. FC layer ํ†ต๊ณผ
  4. SVM classifier ํ•™์Šต & Bounding box regression

๋‹จ์  

์—ฌ์ „ํžˆ end-to-end ๋ฐฉ์‹์ด ์•„๋‹Œ ์—ฌ๋Ÿฌ ๋‹จ๊ณ„๊ฐ€ ํ•„์š” (e.g - fine-tuning, SVM training, Bounding Box Regression)

 

 

 

3. Fast RCNN

CNN fine tuning, boundnig box regression, classification์„ ๋ชจ๋‘ ํ•˜๋‚˜์˜ ๋„คํŠธ์›Œํฌ์—์„œ ํ•™์Šต์‹œํ‚ค๋Š” end-to-end ํ”„๋ ˆ์ž„์›Œํฌ ์ œ์•ˆ

  1. Pre-train๋œ ๋„คํŠธ์›Œํฌ์— ์ด๋ฏธ์ง€๋ฅผ ํ†ต๊ณผ์‹œ์ผœ feature map ์ถ”์ถœ
  2. Selective Search๋ฅผ ํ†ตํ•ด์„œ ์ฐพ์€ ๊ฐ๊ฐ์˜ RoI์— ๋Œ€ํ•˜์—ฌ RoI Poolingํ•˜์—ฌ ๊ณ ์ •๋œ ์‚ฌ์ด์ฆˆ์˜ feature ์ถ”์ถœ
  3. ์ถ”์ถœ๋œ feature vector๋Š” FC layer๋“ค์„ ํ†ต๊ณผํ•œ ๋’ค, softmax branch, bbox regressor branch๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค. 
  4. Softmax branch๋Š” softmax๋ฅผ ํ†ต๊ณผ์‹œ์ผœ ๊ฐ์ฒด์˜ class๋ฅผ ๋ถ„๋ฅ˜(SVM ์‚ฌ์šฉ x), bbox regressor branch๋Š” bounding box regression์„ ํ†ตํ•ด selective search๋กœ ์ฐพ์€ ๋ฐ•์Šค์˜ ์œ„์น˜ ์กฐ์ • 

๋‹จ์ 

์—ฌ์ „ํžˆ region proposal์„ selective search๋กœ ์ˆ˜ํ–‰(CPU ์—ฐ์‚ฐ๋งŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅ)

 

 

 

4. Faster RCNN

๊ธฐ์กด Fast RCNN ๋„คํŠธ์›Œํฌ์— selective search๋ฅผ Region Proposal Network (RPN) ์œผ๋กœ ๋Œ€์ฒดํ•˜์—ฌ GPU๋ฅผ ํ†ตํ•œ RoI ๊ณ„์‚ฐ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

RPN ๊ฐ„๋žต ์„ค๋ช…

์ถ”์ถœํ•œ feature map์—์„œ anchor box ์•ˆ์— object ๊ฐ€ ์žˆ๋Š”์ง€์— ์˜ˆ์ธกํ•˜๋Š” branch์™€ Bounding box ๋ฅผ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•œ branch๋กœ ๋‚˜๋‰˜์–ด ๊ณ„์‚ฐํ•˜๊ณ  ์–ป์–ด์ง„ ๊ฐ’๋“ค๋กœ RoI๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ด ํ›„ ๊ฐ์ฒด์ผ ํ™•๋ฅ ์ด ๋†’์€ K๊ฐœ์˜ anchor๋ฅผ ์ถ”๋ ค๋‚ด๊ณ  non-maximum-supression์œผ๋กœ ์ตœ์ข… RoI๋ฅผ ๊ตฌํ•ฉ๋‹ˆ๋‹ค.

๋‹จ์ 

์ „์ฒด ๋ชจ๋ธ์„ ํ•œ๋ฒˆ์— ํ•™์Šต์‹œํ‚ค๊ธฐ ์–ด๋ ค์›Œ์„œ(์ดˆ๊ธฐ์— RPN์ด RoI๋ฅผ ๊ณ„์‚ฐํ•˜์ง€ ๋ชปํ•˜๋‹ˆ๊นŒ) ์—ฌ๋Ÿฌ ๋‹จ๊ณ„์— ๊ฑธ์ณ ๋ชจ๋ธ์„ ๋ฒˆ๊ฐˆ์•„ ํ•™์Šต์‹œํ‚ค๋Š” Alternating training ๋ฐฉ๋ฒ• ์‚ฌ์šฉ

 

 

 

4.5. Mask RCNN

Instance Segmentation์— ์‚ฌ์šฉ๋˜๋Š” method๋กœ Faster R-CNN์˜ RPN์—์„œ ์–ป์€ RoI(Region of Interest)์— ๋Œ€ํ•˜์—ฌ ๊ฐ์ฒด์˜ class๋ฅผ ์˜ˆ์ธกํ•˜๋Š” classification branch, bbox regression์„ ์ˆ˜ํ–‰ํ•˜๋Š” bbox regression branch์™€ ํ‰ํ–‰์œผ๋กœ segmentation mask๋ฅผ ์˜ˆ์ธกํ•˜๋Š” mask branch๋ฅผ ์ถ”๊ฐ€ํ•œ ๊ตฌ์กฐ. mask branch๋Š” ๊ฐ๊ฐ์˜ RoI์— ์ž‘์€ ํฌ๊ธฐ์˜ FCN(Fully Convolutional Network)๊ฐ€ ์ถ”๊ฐ€๋œ ํ˜•ํƒœ์ž…๋‹ˆ๋‹ค. segmentation task๋ฅผ ๋ณด๋‹ค ํšจ๊ณผ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ์ฒด์˜ spatial location์„ ๋ณด์กดํ•˜๋Š” RoIAlign layer๋ฅผ ์ถ”๊ฐ€. 

 

 

 

5. YOLO

1 Stage Object Detection ์ œ์‹œ

Region Proposal ๊ณผ Classification ์„ ํ•œ๋ฒˆ์— ์ˆ˜ํ–‰. ์ตœ์ข… output feature map์€ bounding box ์œ„์น˜์™€ ํฌ๊ธฐ, box ๋‚ด๋ถ€์— ํด๋ž˜์Šค๊ฐ€ ์žˆ์„ ํ™•๋ฅ (์‹ ๋ขฐ๋„), ํŠน์ • ํด๋ž˜์Šค์ผ ํ™•๋ฅ  ๊ฐ’ ๋“ค์ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

*์ •๋ฆฌ

RCNN → SPPNet → Fast RCNN → Faster RCNN → YOLO ์ˆœ์œผ๋กœ object detection ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜๋ฉด์„œ ์•„๋ž˜์™€ ๊ฐ™์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์ด ๊ฐœ์„ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

 

  • ๊ณ ์ •๋œ ์ž…๋ ฅ ์ด๋ฏธ์ง€ ์‚ฌ์ด์ฆˆ, ๋น„์œจ → ์ž„์˜์˜ ์‚ฌ์ด์ฆˆ, ๋น„์œจ์˜ box ๋กœ classification ๊ฐ€๋Šฅ
  • ์ค‘๋ณต๋˜๋Š” ์˜์—ญ์„ CNN์œผ๋กœ ์—ฌ๋Ÿฌ๋ฒˆ ๊ณ„์‚ฐ → ์ „์ฒด ์ด๋ฏธ์ง€๋Š” ํ•œ ๋ฒˆ๋งŒ ๋„คํŠธ์›Œํฌ ํ†ต๊ณผ
  • end-to-end ๋ฐฉ์‹์ด ์•„๋‹˜ →  end-to-end ๋ฐฉ์‹ 
  • Selective search(CPU๋งŒ ์‚ฌ์šฉ) → RPN ์‚ฌ์šฉ(GPU)
  • ๋Š๋ฆฐ ์†๋„ → Region proposal & Classification ์„ ํ•œ ๋ฒˆ์— ์ˆ˜ํ–‰ํ•˜๋Š” 1 stage object detection ์ œ์•ˆ

 

๋ฐ˜์‘ํ˜•