๐Ÿ› Research/Image Classification

[Brief overview] Basic CNN architectures | VGGNet, ResNet, DenseNet

๋ญ…์ฆค 2022. 2. 2. 15:32
๋ฐ˜์‘ํ˜•
  • VGGNet - Very Deep Convolutional Networks for Large-Scale Image Recognition / arXiv 2014
  • ResNet - Deep Residual Learning for Image Recognition / CVPR 2016
  • DenseNet - Densely Connected Convolutional Networks / CVPR 2017

 

VGGNet

VGGNet์€ AlexNet๋ณด๋‹ค network์˜ layer๊ฐ€ 2๋ฐฐ์ด์ƒ ๊นŠ์–ด์ง€๋ฉฐ ๋”์šฑ ๋ณต์žกํ•œ task๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Network layer ๊ฐ€ ๊นŠ์–ด์ง€๊ณ  ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋  ์ˆ˜ ์žˆ์—ˆ๋˜ ์ด์œ ๋Š” VGGNet๋ถ€ํ„ฐ convolutional filter๋ฅผ 3x3 size๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋„คํŠธ์›Œํฌ๋ฅผ ๊นŠ๊ฒŒ ์Œ“๊ธฐ ์‹œ์ž‘ํ–ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

 

ํฐ ์‚ฌ์ด์ฆˆ์˜ conv. filter๋ฅผ ํ•˜๋‚˜ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์ž‘์€ ์‚ฌ์ด์ฆˆ์˜ filter๋ฅผ ์—ฌ๋Ÿฌ๊ฐœ ์‚ฌ์šฉํ•˜๋ฉด activation function์„ ๋” ๋งŽ์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์–ด์„œ, ๋„คํŠธ์›Œํฌ์˜ non-linearity(๋น„์„ ํ˜•์„ฑ)๋Š” ์ฆ๊ฐ€์‹œํ‚ค๊ณ , parameter ์ˆ˜๋Š” ๊ฐ์†Œ์‹œ์ผœ์ฃผ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. 

์ด์ „ ๋„คํŠธ์›Œํฌ์— ๋น„ํ•ด ์„ฑ๋Šฅ์ด ์ฆ๊ฐ€ํ•˜๊ธด ํ–ˆ์ง€๋งŒ, ๋งˆ์ง€๋ง‰ conv. layer์˜ output feature map์„ flatten ์‹œํ‚จ ํ›„ FC layer์— ๋„ฃ์–ด์ฃผ๋Š” ๋ฐฉ์‹์ด๊ธฐ ๋•Œ๋ฌธ์—, FC layer parameter ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ๊ณ  intput image ์‚ฌ์ด์ฆˆ๊ฐ€ ๊ณ ์ •๋˜์–ด์•ผํ•œ๋‹ค๋Š” ๊ฒƒ์ด ๋งค์šฐ ํฐ ๋‹จ์ ์ด๊ณ , GAP๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— input ์ด๋ฏธ์ง€์˜ ๊ฐ ํ”ฝ์…€๋“ค์˜ spatial order์— ๋ฏผ๊ฐ์— ์งˆ ์ˆ˜ ์žˆ๋Š” ๋„คํŠธ์›Œํฌ ์ž…๋‹ˆ๋‹ค.

 

Network์˜ nonlinearity๋Š” conv filter ๋’ค์— ์—ฐ๊ฒฐ๋˜๋Š” non-linear function(e.g. ReLU, Sigmoid,...)์ด ๋งŽ์•„์งˆ ์ˆ˜๋ก ์ฆ๊ฐ€ํ•˜๋ฉฐ, nonlinearity๊ฐ€ ์ฆ๊ฐ€ํ•  ์ˆ˜๋ก ๋”์šฑ ์„ฌ์„ธํ•˜๊ฒŒ decision boundary ๋ฅผ ์ •ํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ task๊ฐ€ ๋ณต์žกํ•ด์งˆ ์ˆ˜๋ก non-linearity๊ฐ€ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

 

* As of 2022, almost no studies use VGGNet as the backbone for evaluation.

 

 

ResNet

VGGNet์ด layer๋ฅผ ๊นŠ๊ฒŒ ๋งŒ๋“ค๋ฉด์„œ ์„ฑ๋Šฅ์„ ์ฆ๊ฐ€์‹œ์ผฐ์ง€๋งŒ, layer๊ฐ€ ๋” ๋งŽ์ด ๊นŠ์–ด์งˆ ๋•Œ๋Š” ์„ฑ๋Šฅ ์ฆ๊ฐ€๊ฐ€ ๋ฏธ๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์ด๋•Œ๊นŒ์ง€ ์‚ฌ์šฉ๋˜๋˜ deep neural network์˜ layer๊ฐ€ ๊นŠ์–ด์ง€๋ฉด gradient vanishing/exploding ๋“ฑ์˜ ์ด์œ ๋กœ training์ด ์ž˜ ๋˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

์œ„ ๊ทธ๋ฆผ์˜ ์™ผ์ชฝ ํ‘œ๋ฅผ ๋ณด๋ฉด plain-34 ๊ฐ€ ๋” ๊นŠ์€ network์ž„์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ์˜คํžˆ๋ ค test error ๊ฐ€ ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

ResNet์€ ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์œ„ ๊ทธ๋ฆผ์™€ ๊ฐ™์€ residual learning(skip connection์œผ๋กœ ๊ตฌํ˜„)์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Residual learning์€ ์šฐ๋ฆฌ๋ง๋กœ๋Š” '์ž”์ฐจ ํ•™์Šต' ์œผ๋กœ ์–ด๋–ค ๊ฐ’์˜ ์ฐจ์ด(์ž”์ฐจ)๋ฅผ ํ•™์Šตํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ ๋„คํŠธ์›Œํฌ์—์„œ F(x) = H(x)๋ฅผ ํ•™์Šตํ•  ๋•Œ(F๋Š” ํ˜„์žฌ layer์˜ embedding function์ž…๋‹ˆ๋‹ค), residual learning์œผ๋กœ F(x) = H(x) - x (์ž”์ฐจ)๋ฅผ ํ•™์Šตํ•˜๋ ค ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ residual learning์„ ์œ„ํ•ด์„œ H(x) = F(x) + x ๊ฐ€ ๋˜์–ด์•ผ ํ•˜๋ฏ€๋กœ skip connection(shortcut)์œผ๋กœ weight layer ์ด์ „์˜ ์ž…๋ ฅ(x)์™€ weight layer๋ฅผ ํ†ต๊ณผํ•œ F(x)๋ฅผ ๋”ํ•ด์ค๋‹ˆ๋‹ค.

 

In other words, training shifts from finding the optimal H(x) to finding the optimal F(x) = H(x) - x (the difference between output and input). This residual mapping mitigates gradient vanishing/exploding and optimizes better than conventional CNNs, so training remains effective even as the network's layers get deeper.

 

์กฐ๊ธˆ ๋” ์‚ดํŽด๋ณด๋ฉด, F(x)=H(x)-x๋ฅผ ์ตœ์†Œํ™”ํ•˜๋ฉด 0=H(x)-x๊ฐ€ ๋˜๊ณ (idealํ•œ ์ƒํ™ฉ) H(x)=x๊ฐ€ ๋˜๋ฏ€๋กœ, H(x)๋ฅผ x๋กœ mappingํ•˜๋Š” ๊ฒƒ์ด ํ•™์Šต์˜ ๋ชฉํ‘œ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ์–ด๋–ค ๊ฐ’์œผ๋กœ ์ตœ์ ํ™”ํ•ด์•ผํ• ์ง€ ๋ชจ๋ฅด๋Š” H(x) ๊ฐ€ H(x)=x ๋ผ๋Š” ์ตœ์ ํ™”์˜ ๋ชฉํ‘œ๊ฐ’์ด ์ œ๊ณต๋˜๊ธฐ ๋•Œ๋ฌธ์— identity mapping์ธ F(x)๊ฐ€ ํ•™์Šต์ด ๋” ์‰ฌ์›Œ์ง‘๋‹ˆ๋‹ค. 

 

๋˜ํ•œ, ๊ณฑ์…ˆ ์—ฐ์‚ฐ์—์„œ ๋ง์…ˆ ์—ฐ์‚ฐ์œผ๋กœ ๋ณ€ํ˜•๋˜๋Š”๋ฐ, ์•„๋ž˜์‹ (1), (2)๋Š” residual unit์„ ์ˆ˜์‹์œผ๋กœ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ์ž…๋‹ˆ๋‹ค. 

F = residual function

f = activation function

h = identity mapping function (the shortcut; h(x_l) = x_l)
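The referenced equations, which appeared as images in the original post, can be written out; the forms below follow the residual-unit formulation in He et al. and the definitions above (E denotes the loss):

```latex
\begin{align}
y_l &= h(x_l) + F(x_l, W_l) && (1)\\
x_{l+1} &= f(y_l) && (2)\\
x_{l+1} &= x_l + F(x_l, W_l) && (3)\\
x_L &= x_l + \sum_{i=l}^{L-1} F(x_i, W_i) && (4)\\
\frac{\partial E}{\partial x_l} &= \frac{\partial E}{\partial x_L}
\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right) && (5)
\end{align}
```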

 

If the activation function f is taken to be an identity mapping, then x_(l+1) = y_l, so equation (3) holds; substituting it recursively yields equation (4).

L์€ ์ „์ฒด layer์˜ ์ธ๋ฑ์Šค์ด๊ณ , l์€ layer ํ•˜๋‚˜ํ•˜๋‚˜๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. (4)๋ฒˆ ์‹์„ ๋ณด๋ฉด residual unit์„ ์‚ฌ์šฉํ•˜๋ฉด forward ์‹œ์— ์ „์ฒด ๋„คํŠธ์›Œํฌ ์—ฐ์‚ฐ์„ residual function์ธ F๋“ค์˜ ํ•ฉ์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๊ณ , ์–‘๋ณ€์„ ๋ฏธ๋ถ„ํ•˜๋ฉด (5)๋ฒˆ ์‹์ด ๋ฉ๋‹ˆ๋‹ค. ๊ด„ํ˜ธ ์•ˆ์˜ ์šฐ๋ณ€์ด ๋ฐฐ์น˜๋งˆ๋‹ค ํ•ญ์ƒ -1์ด ๋˜๋Š” ๊ฒฝ์šฐ๋Š” ๊ฑฐ์˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— vanishing์ด ๋ฐœ์ƒํ•  ํ™•๋ฅ ์ด ๋งค์šฐ ์ ์–ด์ง‘๋‹ˆ๋‹ค.

 

๊ฒฐ๋ก ์ ์œผ๋กœ, residual learning์„ ์‚ฌ์šฉํ•˜๋ฉด shortcut์„ ํ†ตํ•ด ๋„คํŠธ์›Œํฌ ์—ฐ์‚ฐ์ด ๊ณฑ์…ˆ์ด ์•„๋‹Œ ๋ง์…ˆ๋“ค๋กœ ์ด๋ฃจ์–ด์ง€๊ธฐ ๋•Œ๋ฌธ์— ์ •๋ณด์˜ ์ „๋‹ฌ์ด ์‰ฝ๊ณ  weight ์ด ์—ฐ์†๋œ ๊ณฑ์…ˆ๋“ค๋กœ ์ „๋‹ฌ๋˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๊ธฐ ๋•Œ๋ฌธ์— vanishing์ด ๋ฐœ์ƒํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋•Œ๋ฌธ์— residual learning์„ ์‚ฌ์šฉํ•˜๋ฉด ๋„คํŠธ์›Œํฌ๊ฐ€ ๊นŠ์ด์— ๋Œ€ํ•œ ํ•œ๊ณ„๋ฅผ ๋ฒ—์–ด๋‚  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

Also, since the parameter count keeps growing, models from ResNet-50 upward adopt a bottleneck structure, stacking bottleneck residual blocks.

 

*Bottleneck residual: uses a "1x1 conv (channel reduction) → 3x3 conv → 1x1 conv (channel expansion)" structure; the 3x3 conv operates on the reduced channels, and the channel count is restored before the shortcut addition.
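A minimal sketch of this bottleneck block (assuming the common expansion factor of 4, and that the input already has mid_ch * 4 channels so the shortcut needs no projection):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a ResNet bottleneck block: 1x1 reduce -> 3x3 -> 1x1 expand."""
    def __init__(self, in_ch, mid_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),             # reduce channels
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),  # conv on reduced channels
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch * 4, 1, bias=False),         # restore channels
            nn.BatchNorm2d(mid_ch * 4),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # shortcut addition (in_ch == mid_ch * 4)

block = Bottleneck(256, 64)  # channels: 256 -> 64 -> 64 -> 256
x = torch.rand(1, 256, 28, 28)
print(block(x).shape)  # torch.Size([1, 256, 28, 28])
```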

As the figure above shows, residual learning is simply a plain network with skip connections added, and with it, performance keeps improving as the layers get deeper.

 

 

The table above shows the ResNet-18 through ResNet-152 architectures; for a 224x224 input you can read off the spatial and channel size of each layer's output. PyTorch provides each size pre-trained on ImageNet, so they can be applied to many tasks.

 

ResNet์€ ๊ฑฐ์˜ ๋Œ€๋ถ€๋ถ„์˜ vision task(e.g. classification, segmentation, depth estimation, 3D vision, etc,...)์—์„œ backbone network๋กœ ํ™œ์šฉ๋˜๊ณ  ์žˆ์œผ๋ฏ€๋กœ, ๊ตฌ์กฐ๋ฅผ ๋ฉด๋ฐ€ํžˆ ์ดํ•ดํ•ด๋ณด๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ vanilla resnet ๋ณด๋‹ค ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๋Š” network์™€ learning method ๋“ค์ด ๋งŽ์ด ์žˆ์ง€๋งŒ, ๋Œ€๋ถ€๋ถ„์˜ ๋ถ„์•ผ์—์„œ ์ƒˆ๋กœ์šด method์˜ ์„ฑ๋Šฅ ๋น„๊ต๋ฅผ ์œ„ํ•ด ๋™์ผํ•œ ์กฐ๊ฑด(๋™์ผ dataset, ๋™์ผ backbone ๋˜๋Š” ์œ ์‚ฌํ•œ parameter ์ˆ˜/๊ณ„์‚ฐ๋Ÿ‰์˜ network)์—์„œ evalutation์„ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

 

For example, in the FPN (Feature Pyramid Network) structure above, used for object detection and segmentation, each encoder output on the left connects to a decoder input on the right; each encoder output is the output of one of ResNet's residual stages (conv2_x, conv3_x, conv4_x, conv5_x).

Because ResNet is so widely used as a backbone, new networks proposed in a given research area are often built as extensions of ResNet.

 

@ PyTorch code example


- pytorch ์—์„œ ์ œ๊ณต๋˜๋Š” ImageNet pre-train๋œ resnet์„ ๋ถˆ๋Ÿฌ์™€์„œ, forward ํ•จ์ˆ˜์—์„œ ๊ฐ residual block์˜ output์„ ๋”ฐ๋กœ ์ €์žฅํ•˜์—ฌ ์—ฌ๋Ÿฌ vision task์— ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

import torch
import torch.nn as nn
from torchvision import models


class Resnet_application(nn.Module):
    def __init__(self, nclass):
        super(Resnet_application, self).__init__()
        self.resnet = models.resnet50(pretrained=True)  # ImageNet pre-trained weights
        self.resnet.fc = nn.Linear(2048, nclass)  # replace the 1000-class ImageNet head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, x):
        conv_out = []  # collects each residual stage's output

        x = self.resnet.conv1(x)
        x = self.resnet.bn1(x)
        x = self.resnet.relu(x)

        x = self.resnet.maxpool(x)
        x = self.resnet.layer1(x); conv_out.append(x)  # conv2_x output
        x = self.resnet.layer2(x); conv_out.append(x)  # conv3_x output
        x = self.resnet.layer3(x); conv_out.append(x)  # conv4_x output
        x = self.resnet.layer4(x); conv_out.append(x)  # conv5_x output
        x = self.avgpool(x)
        x = x.reshape(x.size(0), x.size(1))
        x = self.resnet.fc(x); conv_out.append(x)  # classification logits

        return conv_out


if __name__ == "__main__":
    model = Resnet_application(100).cuda()
    x = torch.rand((1, 3, 224, 224)).cuda()
    y = model(x)
    print(y[0].shape, y[1].shape, y[2].shape, y[3].shape, y[4].shape)
    # torch.Size([1, 256, 56, 56]) torch.Size([1, 512, 28, 28]) torch.Size([1, 1024, 14, 14]) torch.Size([1, 2048, 7, 7]) torch.Size([1, 100])

 

 

DenseNet

Densenet์€ ๋ชจ๋“  layer์˜ feature map์„ concatenation ํ•˜๋Š” Dense Block์œผ๋กœ ๊ตฌ์„ฑ๋œ deep neual network ์ž…๋‹ˆ๋‹ค. Resnet๊ณผ ๋น„๊ตํ•˜๋ฉด, resnet์€ skip connection์—์„œ elemenet-wise ๋ง์…ˆ์„ ์ด์šฉํ•˜๊ณ  densenet ์€ channel ์ถ•์œผ๋กœ concat ํ•˜๋Š” ๋ฐฉ์‹์„ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

 

๋‹ค๋งŒ, ์ฑ„๋„์„ ๊ณ„์† concat ํ•˜๋ฉด ์ฑ„๋„ ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์•„์ง€๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ layer feature map์˜ ์ฑ„๋„ ๊ฐœ์ˆ˜๋ฅผ ์ž‘์€ ์ˆ˜์˜ growth rate(k)๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Figure 1์—์„œ k=4(hyper param.) ์ธ ๊ฒฝ์šฐ์ด๋ฉฐ, ๊ทธ๋ฆผ ๊ธฐ์ค€์œผ๋กœ 6 channel feature input์ด dense block 4๊ฐœ๋ฅผ ํ†ต๊ณผํ•˜๋ฉด 6 + 4 + 4 + 4 + 4 =22 ๊ฐœ์˜ channel์„ ๊ฐ€์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. 

 

Bottleneck์˜ ๊ฒฝ์šฐ, resnet์€ "1x1 conv → 3x3 conv  → 1x1 conv" ์˜ ๊ตฌ์กฐ์ด์ง€๋งŒ, densenet์—์„œ๋Š” "1x1 conv → 3x3 conv" ๋งŒ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ 1x1 conv ๋กœ ์ฑ„๋„ ๊ฐœ์ˆ˜๋ฅผ 4*k ๊ฐœ๋กœ ์ค„์ด๊ณ  3x3 conv๋กœ ๋‹ค์‹œ k๊ฐœ์˜ ์ฑ„๋„๋กœ ์ค„์ด๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. 

 

Dense Block ์‚ฌ์ด ์‚ฌ์ด์—๋Š” Transition layer๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ, ์ด๋Š” "1x1 conv + average pooling"์„ ์ˆ˜ํ–‰ํ•˜์—ฌ feature map์˜ spatial size์™€ chanel ์ˆ˜๋ฅผ ์ค„์ด๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋•Œ 1x1 conv ๋กœ channel ์ˆ˜๋ฅผ ์ค„์ด๋Š” ๋น„์œจ์€ theta=0.5 ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. 

 

Densenet์˜ ์ „์ฒด์ ์ธ ๊ตฌ์กฐ๋Š” ์œ„์™€ ๊ฐ™์œผ๋ฉฐ, Dense Block + Transition layer ์˜ ๋ฐ˜๋ณต์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

To summarize, DenseNet is a deep stack of dense blocks whose skip connections concatenate along the channel axis, while the remaining components keep the feature maps from growing too large (for efficient computation).


ResNet and DenseNet are widely used as backbone networks in many vision task studies, and the papers themselves build good intuition about image features, so they are well worth reading!

๋ฐ˜์‘ํ˜•