[Paper Review] Learning Transferable Visual Models From Natural Language Supervision / CLIP / Multi-modal network
·
Research/Multi-modal
This post introduces the OpenAI paper (ICML 2021) that proposes Contrastive Language-Image Pre-training (CLIP). Introduction & Motivation Deep learning is used very successfully in nearly every area of computer vision, but current approaches have several problems. Existing vision models perform well on the task they were trained on, but applying them to a new task is cumbersome because they must be retrained, which in turn requires a new dataset and additional labeling. Some models that do well on benchmarks also show poor results on stress tests. As an alternative, training on pairs of raw text and images..
[Paper Review] Bag of Tricks for Image Classification with Convolutional Neural Networks / Image classification analysis paper
·
Research/Image Classification
Published at CVPR 2019, this paper collects and experimentally evaluates a range of training methods worth consulting in vision tasks such as image classification. Introduction To improve performance on an image classification task you can use a better, larger network, but beyond changing the network there are many other factors that strongly affect performance. Using ResNet50 as the baseline, the paper reports experiments with various tricks without significantly changing the network architecture. As a result, it shows that applying the tricks raises ImageNet Top-1 accuracy by about 4% compared with before (the Table above ..
[Paper Review] Deep Encoding Pooling Network (DEP), Texture-Encoded Angular Network (TEAN)
·
Research/Material & Texture Recognition
This post introduces the Deep Encoding Pooling Network (DEP-Net), an upgraded version of the Deep Texture Encoding Network (DeepTEN), and the Texture-Encoded Angular Network (TEAN), which fuses DEP-Net with the Differential Angular Imaging Network (DAIN) architecture. A. Deep Texture Manifold for Ground Terrain Recognition / CVPR 2018 B. Differential Viewpoints for Ground Terrain Material Recognition / TPAMI 2020 A. Deep Texture Manifold for Ground Terrain..
[Paper Review] Deep TEN: Texture Encoding Network
·
Research/Material & Texture Recognition
Published at CVPR 2017, this paper proposes DeepTEN, which integrates dictionary learning, a classic computer vision approach, with a CNN architecture to learn orderless representations of material and texture images end to end. Abstract The paper proposes the Deep Texture Encoding Network (Deep TEN), which has an encoding layer that ports the dictionary learning and encoding pipeline into a single model. In previous methods, SIFT descriptors or CNN features pre-trained for material recognition, together with ..
[Paper Review] Deep Structure-Revealed Network for Texture Recognition
·
Research/Material & Texture Recognition
A texture recognition paper published at CVPR 2020. It analyzes the inherent structural characteristics of texture and proposes a network that exploits them, achieving SOTA in texture recognition. Beyond the ablation and main experiments, it also includes application experiments such as fine-grained recognition and semantic segmentation. Abstract Texture recognition is a difficult task because varied primitives and arrangements can be perceived in the same texture image. Some recent CNN-based work aims for invariance to spatial arrangement through orderless aggregati..
[Paper Review] Material Recognition from Local Appearance in Global Context
·
Research/Material & Texture Recognition
Although this paper was posted on arXiv in 2016, I introduce it because it explicitly uses context information for material recognition. Motivation In the left figure, looking only at the surface of the cup, even a person finds it hard to tell whether the material is paper, plastic, or metal. However, once we have the object information 'cup', we can infer from the cup plus its color that the surface is 'plastic'. In this way, material is strongly correlated with context information such as object and scene. In the upper table of the right figure, metal is mostly observed for airplane, while ceramic, metal, and so on are mostly observed for sink. Looking at the lower table..