
[Paper Review] Differential Angular Imaging for Material Recognition

by 뭅즤 2021. 10. 11.
๋ฐ˜์‘ํ˜•

This is a material recognition paper published at CVPR 2017.

 

This paper proposes DAIN (Differential Angular Imaging Network), which uses small angular variations during image capture to improve material classification performance.

Introduction

Real-world scenes consist of surfaces made of diverse materials such as wood, marble, soil, metal, and ceramic, and these materials produce rich visual variation in images. Material recognition has been actively studied in recent years, with the goal of providing detailed material information to applications such as autonomous agents and human-machine systems.

Early work on material appearance modeling focused on lab-based reflectance measurement using representations such as the BRDF (bidirectional reflectance distribution function) and BTF (bidirectional texture function), which require precise measurement. These reflectance-based approaches have the advantage of enabling fine-grained material recognition by observing intrinsic, invariant properties of a surface. However, lab-based image capture comes with many constraints, so it has not been widely used outdoors. More recent image-based material recognition methods train a classifier on single-view images and can be applied to arbitrary images that carry no multi-view reflectance information. These methods, however, generally focus more on contextual information than on the intrinsic appearance of the material.

 

This paper captures in-scene appearance, but from controlled viewpoint angles (that is, it uses images of one scene taken while a robot moves and changes the camera angle by specific amounts). Such measurements provide a sampling of the reflectance function, which leads to the question of how multiple viewing angles can help material recognition. Prior work used differential camera motion or object motion for shape reconstruction; here the authors pose a new question: does a small change in viewing angle significantly improve recognition performance?

Earlier material recognition work demonstrated the power of angular filtering, but it either captured a slice of the BRDF with a mirror-based camera or obtained differential viewpoint variation with a light-field camera. This paper instead proposes capturing a surface from slightly different viewing angles with an ordinary camera and computing an approximation of the angular gradient. The proposed approach, differential angular imaging, uses image captures at a particular viewing angle v and at a differential viewpoint v + δ. This contrasts with earlier lab-based reflectance measurements, which used large angular steps such as 22.5°. The paper shows that differential angular imaging, which can be implemented with a compact stereo camera or a moving camera, provides key information about the reflectance properties of a material.

 

- Dataset 

The paper uses the GTOS (Ground Terrain in Outdoor Scenes) dataset. GTOS is a dataset of 40 ground-terrain material classes captured from multiple viewpoints, under multiple illumination conditions, and with differential angular imaging.

 

The multi-view images consist of 9 views spaced 10° apart (the black lines and boxes in the figure above). For the differential angle variation, each of these 9 views gets one additional view offset by 5° (the green lines and boxes), so each sample has 18 viewing directions in total.
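As a quick check of the view-count arithmetic, the sketch below enumerates the viewing directions for one sample. It assumes angles measured relative to the first view (absolute angles are not stated here); `view_angle_pairs` is a hypothetical helper.

```python
def view_angle_pairs(n_views=9, step_deg=10.0, delta_deg=5.0):
    """Return (v, v + delta) angle pairs for one sample.

    Angles are relative to the first view (an assumption; absolute angles are not given).
    Each base view v is paired with one extra view offset by delta_deg.
    """
    base_views = [k * step_deg for k in range(n_views)]  # 9 views, 10 degrees apart
    return [(v, v + delta_deg) for v in base_views]       # one differential view per base view

pairs = view_angle_pairs()
all_directions = [angle for pair in pairs for angle in pair]
print(len(pairs), len(all_directions))  # 9 18
print(pairs[:2])                        # [(0.0, 5.0), (10.0, 15.0)]
```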

 

Differential Angular Imaging

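Put simply, the angular gradient of the intensity, ∂I(v)/∂v, is approximated by the difference I(v + δ) − I(v) for a small δ. Because δ is fixed, this difference equals the finite-difference estimate (I(v + δ) − I(v))/δ scaled by the constant δ.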
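A minimal NumPy sketch of this finite difference, assuming the two captures are already pixel-aligned images of the same shape (how alignment is handled is not covered here). The function name `differential_angular_image` and its optional `delta` argument are hypothetical. Casting to float before subtracting matters: with 8-bit images the raw difference would wrap around instead of going negative.

```python
import numpy as np

def differential_angular_image(img_v, img_v_delta, delta=None):
    """Approximate the angular gradient dI(v)/dv from two views.

    img_v:       image captured at viewing angle v
    img_v_delta: image captured at v + delta, assumed pixel-aligned with img_v
    delta:       if given, divide by it to get a finite-difference gradient estimate;
                 if None, return the raw difference I(v + delta) - I(v).
    """
    a = np.asarray(img_v, dtype=np.float32)
    b = np.asarray(img_v_delta, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("both views must have the same shape")
    diff = b - a  # float arithmetic, so negative values survive
    return diff if delta is None else diff / delta

# Two tiny 8-bit "images" standing in for I(v) and I(v + delta).
I_v = np.array([[20, 100], [50, 0]], dtype=np.uint8)
I_vd = np.array([[10, 120], [50, 255]], dtype=np.uint8)
print(differential_angular_image(I_v, I_vd))
```

Subtracting the `uint8` arrays directly (`I_vd - I_v`) would turn the first pixel's −10 into 246, which is why the sketch converts to float first.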

์œ„ ์ด๋ฏธ์ง€๋“ค(2ํ–‰)์€ differential angular image๋“ค์˜ ์˜ˆ์‹œ์ด๋ฉฐ ์ด ์ด๋ฏธ์ง€๋“ค์€ reflectance ์™€ 3D relief texture์˜ angular gradient ์ •๋ณด๋ฅผ ๋‚ดํฌํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

5๋„์˜ viewpoint ์ฐจ์ด๋กœ ๋งŒ๋“ค์–ด์ง„ differential ์ด๋ฏธ์ง€๋“ค์ด๋ฉฐ, ์œก์•ˆ์œผ๋กœ ๋ด๋„ ๋ฐ˜์‚ฌ์ •๋„์™€ ํ‘œ๋ฉด ๊ฑฐ์น ๊ธฐ ๋“ฑ์ด color ์ด๋ฏธ์ง€๋ณด๋‹ค ์ž˜ ๋ณด์ด๋Š” ์ ์œผ๋กœ ๋ณด์•„ ํ•ด๋‹น ์ด๋ฏธ์ง€๋“ค๋กœ ๋ถ„๋ฅ˜ ์„ฑ๋Šฅ์„ ๋†’์ผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋ผ๋Š” ์ƒ๊ฐ์ด ๋“ญ๋‹ˆ๋‹ค.

These color and differential angular images are then used as inputs to DAIN to classify materials.

 

DAIN

color, diff ์ด๋ฏธ์ง€๋ฅผ ์ธ์ฝ”๋”ฉํ•œ ์ •๋ณด๋ฅผ ํ•ฉ์ณ ์žฌ์งˆ์„ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด 3๊ฐ€์ง€ ๋„คํŠธ์›Œํฌ๋กœ ์‹คํ—˜์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

 

1) Final layer combination

The color and diff images are fed into two separate CNNs, so each CNN can learn an encoding suited to its own input. However, because only the final outputs are combined to predict the class, a likely drawback is that low-level features can be ignored.

 

2) Intermediate combination

The features output by the lower layers of the color and diff streams are merged and fed into a single stack of higher layers for classification, so the color and diff information is likely to get smoothed together partway through the network. In addition, using only one higher-layer stack reduces the number of parameters, which may also cost some performance.

 

3) DAIN

To address the weaknesses of networks 1) and 2), DAIN fuses the low-level features but also passes the color image features through the higher layers on their own, so that they are not mixed with the diff image information; this improves performance.
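The three designs differ mainly in where the two streams meet. The NumPy toy below is a minimal sketch of that wiring, not the paper's network: each "layer" is a random linear map plus ReLU standing in for a group of CNN layers, inputs are flat vectors rather than images, and the fusion operators (element-wise sum for features, sum of class scores at the end) are assumptions. All function and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_LOW, D_HIGH, N_CLASSES = 64, 32, 16, 40  # toy sizes; 40 = number of GTOS classes

def layer(d_out, d_in):
    # Random weights standing in for a trained group of CNN layers.
    return rng.standard_normal((d_out, d_in)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def make_params():
    return {
        "low_c": layer(D_LOW, D_IN), "low_d": layer(D_LOW, D_IN),        # lower layers per stream
        "high_c": layer(D_HIGH, D_LOW), "high_d": layer(D_HIGH, D_LOW),  # per-stream higher layers
        "high_mix": layer(D_HIGH, D_LOW),                               # higher layers on fused features
        "fc_c": layer(N_CLASSES, D_HIGH), "fc_d": layer(N_CLASSES, D_HIGH),
        "fc_mix": layer(N_CLASSES, D_HIGH),
    }

def final_layer_combination(p, color, diff):
    # 1) Two independent streams; only the class scores are combined (summed here).
    s_c = p["fc_c"] @ relu(p["high_c"] @ relu(p["low_c"] @ color))
    s_d = p["fc_d"] @ relu(p["high_d"] @ relu(p["low_d"] @ diff))
    return s_c + s_d

def intermediate_combination(p, color, diff):
    # 2) Lower-layer features are fused (element-wise sum here) and share ONE higher stack.
    fused = relu(p["low_c"] @ color) + relu(p["low_d"] @ diff)
    return p["fc_mix"] @ relu(p["high_mix"] @ fused)

def dain_style(p, color, diff):
    # 3) Fuse lower-layer features as in 2), but also keep a color-only path through
    #    its own higher layers so color features are not mixed with diff features.
    low_c = relu(p["low_c"] @ color)
    fused = low_c + relu(p["low_d"] @ diff)
    s_mix = p["fc_mix"] @ relu(p["high_mix"] @ fused)
    s_c = p["fc_c"] @ relu(p["high_c"] @ low_c)
    return s_mix + s_c

USED = {  # which weight matrices each variant actually uses
    "final": ["low_c", "low_d", "high_c", "high_d", "fc_c", "fc_d"],
    "intermediate": ["low_c", "low_d", "high_mix", "fc_mix"],
    "dain": ["low_c", "low_d", "high_mix", "high_c", "fc_mix", "fc_c"],
}

def n_params(p, names):
    return sum(p[n].size for n in names)

p = make_params()
color, diff = rng.standard_normal(D_IN), rng.standard_normal(D_IN)  # stand-ins for the two inputs
for name, fn in [("final", final_layer_combination),
                 ("intermediate", intermediate_combination),
                 ("dain", dain_style)]:
    print(name, fn(p, color, diff).shape, n_params(p, USED[name]))
# final (40,) 6400
# intermediate (40,) 5248
# dain (40,) 6400
```

In this toy, the intermediate variant has fewer parameters because it has one higher stack instead of two, which is the concern raised in 2); the DAIN-style variant keeps the fused path but adds back a color-only higher stack.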

 

- Multiview DAIN

 

A single view can only exploit reflectance over the small 5° angular difference. Using multi-view images, by contrast, makes reflectance information over a wider range of angles available, which can be expected to help material classification. Each image is fed into weight-shared lower layers; the extracted features then pass through 3D pooling and a trainable 3D filter that reduce the feature dimension, and finally through the higher layers that classify the material.
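The multi-view wiring can be sketched the same way. This is a NumPy toy under several assumptions: the lower layers are a single shared 1×1 convolution with ReLU, the 3D max pooling uses non-overlapping 3×2×2 windows over (view, height, width), the trainable 3D filter is modeled as a learned weighted sum across the pooled views with 1×1 spatial extent, and the higher layers are reduced to global average pooling plus one linear classifier. The real kernel sizes and layer choices are not given here, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_lower_layers(view, w):
    # Toy lower layers: a 1x1 convolution (channel mixing) followed by ReLU.
    # The same weights w are applied to every view, i.e. weight sharing.
    return np.maximum(np.einsum("oc,chw->ohw", w, view), 0.0)

def max_pool_3d(x, kv, kh, kw):
    # Non-overlapping max pooling over (view, height, width); x has shape (V, C, H, W).
    V, C, H, W = x.shape
    x = x[: V - V % kv, :, : H - H % kh, : W - W % kw]  # crop to whole windows
    v2, h2, w2 = x.shape[0] // kv, x.shape[2] // kh, x.shape[3] // kw
    return x.reshape(v2, kv, C, h2, kh, w2, kw).max(axis=(1, 4, 6))

def view_filter(x, w_view):
    # Trainable filter spanning all pooled views with 1x1 spatial extent:
    # a learned weighted sum over the view axis, (V', C, H', W') -> (C, H', W').
    return np.tensordot(w_view, x, axes=(0, 0))

def multiview_forward(views, p):
    feats = np.stack([shared_lower_layers(v, p["w_low"]) for v in views])  # (V, C, H, W)
    pooled = max_pool_3d(feats, kv=3, kh=2, kw=2)                          # (V/3, C, H/2, W/2)
    fused = view_filter(pooled, p["w_view"])                                # (C, H/2, W/2)
    return p["w_fc"] @ fused.mean(axis=(1, 2))  # toy higher layers: global average pool + linear

n_views, c_in, c_low, n_classes = 9, 3, 16, 40  # 9 views and 40 classes as in GTOS; rest are toy sizes
params = {
    "w_low": rng.standard_normal((c_low, c_in)) * 0.1,
    "w_view": rng.standard_normal(n_views // 3) * 0.1,
    "w_fc": rng.standard_normal((n_classes, c_low)) * 0.1,
}
views = rng.standard_normal((n_views, c_in, 8, 8))  # random stand-ins for per-view inputs
print(multiview_forward(views, params).shape)  # (40,)
```

Here the view filter is the only step that mixes information across views, which mirrors the observation in the next paragraph.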

Although multi-view data is used, the only component that actually looks at the correlation between views is the 3D filter, so in my view this is not a particularly effective network for multi-view input. Compared with the single-view setting, performance improves by about 2%.

My conclusion

This paper showed experimentally that, in material classification, the partial reflectance information of a surface observable from small angular differences can improve classification performance. It was also a useful paper for learning how a network can be structured to make use of color images, diff images, and multi-view images.

๋ฐ˜์‘ํ˜•