
Research (58)

[Paper Review] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows / An advanced form of ViT. This post explains the Vision Transformer (ViT), which applies the transformer architecture ('Attention Is All You Need', NIPS 2017) that drew attention in NLP to vision tasks, and the Swin Transformer, an improved architecture derived from ViT. * Papers A. AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE / ICLR 2021 B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows / ICCV 2021 1. Vision Transformer (ViT) In computer vision, existing self attent.. 2022. 1. 8.
[Paper Review] Non-local Neural Networks / The origin of the Vision Transformer. A summary of the non-local network... A CNN can be viewed as a local operator that extracts correlations over local regions of the spatial domain in shallow layers and over relatively global regions in deep layers. Even as layers get deeper, this still differs from a non-local operation, which extracts correlations over the entire region in a single operation. As a result, a CNN's structure makes it hard to extract correlations between features that are far apart in the spatial or temporal domain. This paper proposes a non-local operation to address this. In the figure below, the non-local block.. 2021. 12. 12.
[Paper Review] Pyramid Scene Parsing Network / PSPNet / Pyramid Pooling. This paper was published at CVPR 2017 and proposes PSPNet (1st place in the ImageNet scene parsing challenge 2016). Many better-performing methods have been introduced since, but I am writing this review to summarize the Pyramid Pooling Module, which is used to bring global contextual information into semantic segmentation. Motivation The paper points out three problems with existing segmentation algorithms (the figure above compares against FCN). 1) Mismatched Relationship: pixels classified in a way that does not fit their surroundings (contextual information). For example, a car near a lake, a boat on a road.. 2021. 12. 5.
[Paper Review] Unified Perceptual Parsing for Scene Understanding / UPerNet / Multi-task learning. This paper, published at ECCV 2018, proposes a new task called Unified Perceptual Parsing, which recognizes a variety of visual concepts (multi-task learning). Introduction In the figure above, a living room (scene) is made up of various objects such as a table, a painting, and a wall (object), while the table in turn consists of legs, a top, and an apron (part). The table is also made of wood (material), and the sofa surface is knitted (texture). These categories have each been handled independently in scene understanding and object/material/part/texture recognition tasks... 2021. 12. 4.
[Paper Review] SHAPE-TEXTURE DEBIASED NEURAL NETWORK TRAINING / The relationship between shape and texture in neural networks. Published at ICLR 2021, this paper examines how objects relate to shape and texture, and proposes a shape-texture debiased neural network that improves performance on vision tasks such as object recognition by training on both shape and texture information. Introduction Shape and texture are both important cues for recognizing an object. Earlier object recognition research has already shown that combining shape and texture appropriately can improve recognition performance. ‘IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES A.. 2021. 12. 4.
[Paper Review] Learning to Compare: Relation Network for Few-Shot Learning / meta-learning, few shot learning. This paper, published at CVPR 2018, is on the topic of few-shot learning. In deep learning the amount of data is directly tied to performance, but in real-world tasks data is almost always scarce. To address this limited-data problem, data augmentation is available at the data level, and un/semi-supervised learning, transfer learning, and meta learning are available at the network level. Few-shot learning is a methodology that uses meta learning to train a network with a small amount of data. Meta learning includes metric, model, optimization, GCN .. 2021. 10. 17.