[CV] SFM (Structure From Motion): Reconstructing Camera Poses and 3D Shape from a Sequence of 2D Images

by 뭅즤 2022. 6. 5.
This post introduces SFM (Structure From Motion), the technique for extracting 3D information from 2D images that is essential to visual localization. It focuses on the purpose and meaning of each step and uses as few equations as possible.

 

SFM is used in visual localization tasks, for example in COLMAP, a general-purpose SFM (Structure From Motion) and MVS (Multi-View Stereo) pipeline with a GUI. COLMAP supports reconstruction from both ordered and unordered image collections. (Given nothing but multi-view images, it recovers the camera poses and the 3D shape.)

 

* Algorithms similar to SFM?

SLAM and Visual Odometry are well-known algorithms similar to SFM. Roughly speaking, Visual Odometry plus loop closure gives SLAM, and SLAM without the real-time requirement gives SFM. Because SLAM has loop closure and must run in real time, its algorithms tend to be lightweight. SFM has no real-time constraint, so it is comparatively more accurate but also more computationally expensive.
In other words, the two produce the same kind of output, but SLAM is the better fit for tracking a user's position in real time on robots, autonomous vehicles, and AR/VR devices, while SFM is the better fit for reconstructing a 3D scene.

 

* COLMAP's SFM is described in the CVPR 2016 paper "Structure-from-Motion Revisited".

 

SFM (Structure From Motion)

Structure From Motion

SFM (Structure From Motion) is the process of recovering 3D structure and camera poses from multi-view images of the same object taken from different, overlapping viewpoints. It usually consists of the following three steps.

  1. Feature detection and extraction
  2. Feature matching and geometric verification
  3. Structure and motion reconstruction

COLMAP also comes with the following guidelines. If these four conditions are not met well, the 3D reconstruction will be poor. In particular, objects with little texture (almost no pattern) are reconstructed with badly distorted shapes.

  • Use images with good texture
  • Use images captured under similar lighting conditions
  • Use images with high visual overlap
  • Use images taken from a variety of viewpoints

 

Incremental Structure-from-Motion pipeline

COLMAP proposes an Incremental Structure-from-Motion approach: a sequential processing pipeline with an iterative reconstruction component.

 

In summary, distinctive features are extracted from the input images (Feature Extraction), similar features are matched across images, and the feature correspondences between images are checked under epipolar geometry. Then, in the Incremental Reconstruction stage, 3D coordinates are estimated by triangulation (Triangulation), and errors are minimized with Bundle Adjustment and Outlier Filtering. Finally, this process is repeated every time a new image is added, yielding the final estimates of the camera poses and 3D coordinates.

 

1. Correspondence Search

The first stage is correspondence search, which finds scene overlap, that is, the same locations across images. Its output is a set of geometrically verified image pairs and a graph of image projections for each point.

 

1.1. Feature Extraction

Features must be invariant to radiometric and geometric changes so that SFM can recognize the same distinctive appearance across multiple images. For this reason, features are extracted with a feature descriptor such as SIFT.

 

1.2. Matching

Feature Matching

Using the features extracted from each image, this step finds the parts of different images that see the same point and matches them to each other. Up to this step, no geometric information is involved.
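As a rough illustration of appearance-only matching, the sketch below pairs each descriptor from one image with its nearest neighbour in the other. The ratio test (keep a match only if the best distance is clearly smaller than the second best) is a common heuristic that this post does not prescribe. The function name, the 0.8 threshold, and the toy 2D descriptors are assumptions for illustration; real SIFT descriptors are 128-dimensional.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with a ratio test (hypothetical helper).

    desc_a, desc_b: lists of equal-length float vectors.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every descriptor in the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs a second-best candidate
        (best, j_best), (second, _) = dists[0], dists[1]
        # Reject ambiguous matches: best and second best are too similar.
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


# Toy data: the third descriptor in A is equally close to two descriptors in B,
# so the ratio test drops it.
desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [10.0, 10.2], [4.0, 5.0], [6.0, 5.0]]
print(match_descriptors(desc_a, desc_b))  # [(0, 0), (1, 1)]
```

Note that nothing here looks at where the points are in the scene; two similar-looking patches at different locations would still be paired.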

 

1.3. Geometric Verification

Epipolar Geometry

Points matched by feature matching are matched on appearance alone, so there is no guarantee that a match actually pairs up the same physical point. Regions that look very similar can be matched even though they are at different locations.

 

SFM therefore verifies matches geometrically by estimating a transformation that maps feature points between images under epipolar geometry. In epipolar geometry, the relationship between the views of a moving camera can be expressed by the essential matrix E (calibrated) or the fundamental matrix F (uncalibrated). If a valid transformation maps a sufficient number of features between the images, the pair is considered geometrically verified. In other words, the goal is to find a suitable essential/fundamental matrix from the matched points between images. This matrix is typically estimated by solving a linear least-squares problem with SVD (Singular Value Decomposition), and decomposing the essential matrix with SVD then yields the relative camera pose (rotation, and translation up to scale).
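To make the verification idea concrete, here is a minimal sketch of the epipolar constraint x2^T E x1 = 0 in normalized image coordinates. It assumes the convention X2 = R X1 + t and E = [t]_x R; the helper names, the toy pose, and the 3D point are all made up for illustration. A true correspondence gives a residual close to zero, while a shifted (wrong) match does not.

```python
import math


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_from_pose(rotation, translation):
    """E = [t]_x R for the assumed convention X2 = R X1 + t."""
    return matmul(skew(translation), rotation)


def epipolar_residual(essential, x1, x2):
    """x2^T E x1 for homogeneous normalized image points x1, x2."""
    ex1 = matvec(essential, x1)
    return sum(x2[i] * ex1[i] for i in range(3))


def normalize(p):
    """Perspective division: 3D point in camera frame -> (x, y, 1)."""
    return [p[0] / p[2], p[1] / p[2], 1.0]


# Toy relative pose: small rotation about the y axis plus a sideways shift.
angle = 0.1
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [-1.0, 0.0, 0.0]

X1 = [1.0, -0.4, 4.0]                               # point in camera-1 frame
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]    # same point in camera-2 frame

E = essential_from_pose(R, t)
x1, x2 = normalize(X1), normalize(X2)

good = epipolar_residual(E, x1, x2)                        # true match
bad = epipolar_residual(E, x1, [x2[0], x2[1] + 0.1, 1.0])  # shifted match
print(abs(good) < 1e-9, abs(bad) > 1e-2)
```

In practice E is unknown and has to be estimated from the matches themselves, which is where the robust estimation described next comes in.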

 

 

Feature Matching after RANSAC

Correspondences from matching are often contaminated by outliers, so a robust estimation technique such as RANSAC is needed. RANSAC repeatedly fits a candidate essential (or fundamental) matrix to a small random sample of matches and keeps the candidate with the most inliers. The output of this step is the set of geometrically verified image pairs, the inlier correspondences between them, and their geometric relation.
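The sample-fit-count loop of RANSAC is easier to see on a simpler model. The sketch below swaps the essential matrix for a 2D line, purely as a stand-in: it samples two points, fits a line, counts the points within a distance threshold, and keeps the best hypothesis. The function names, threshold, iteration count, and data are illustrative assumptions, not COLMAP settings.

```python
import math
import random


def line_through(p, q):
    """Normalized line a*x + b*y + c = 0 through two distinct points."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    c = -(a * p[0] + b * p[1])
    norm = math.hypot(a, b)
    return a / norm, b / norm, c / norm


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Return (line, inliers) for the sampled line with the most inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # minimal sample for a line
        a, b, c = line_through(p, q)
        # Points whose distance to the hypothesized line is within the threshold.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Eight points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(8)]
points += [(0.0, 10.0), (5.0, -7.0), (9.0, 3.0)]

line, inliers = ransac_line(points)
print(len(inliers), "inliers out of", len(points))
```

For geometric verification the same loop applies, with a minimal set of matches in place of two points, an E or F estimate in place of the line, and an epipolar error in place of the point-to-line distance.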

 

 

2. Incremental Reconstruction

The input to the reconstruction stage is the scene graph, and the output is pose estimates for the registered images and the reconstructed scene structure.

 

2.1. Initialization

SFM initializes the model with a carefully selected two-view reconstruction. Initialization matters a great deal because the quality of the reconstruction depends on where the incremental process is seeded. Initializing from a dense region of the image graph, where many cameras overlap, improves results thanks to the increased redundancy.
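One simple way to picture "a dense region of the image graph" is sketched below. This is a hypothetical heuristic, not COLMAP's actual selection rule: it treats the scene graph as a dictionary from image pairs to verified inlier counts, picks the image with the largest total overlap as the seed, and pairs it with its strongest neighbour. Real pipelines also check other criteria, such as whether the pair has enough baseline for stable triangulation.

```python
from collections import defaultdict


def select_initial_pair(scene_graph):
    """Pick a seed pair from {(image_a, image_b): inlier_count} (hypothetical)."""
    # Total verified overlap per image, summed over all of its edges.
    overlap = defaultdict(int)
    for (a, b), inliers in scene_graph.items():
        overlap[a] += inliers
        overlap[b] += inliers

    # Seed image: the one sitting in the densest part of the graph.
    seed = max(overlap, key=overlap.get)

    # Partner: the neighbour sharing the most inliers with the seed.
    partners = {(b if a == seed else a): n
                for (a, b), n in scene_graph.items() if seed in (a, b)}
    partner = max(partners, key=partners.get)
    return seed, partner


scene_graph = {
    ("A", "B"): 120,
    ("B", "C"): 200,
    ("A", "C"): 90,
    ("B", "D"): 30,
}
print(select_initial_pair(scene_graph))  # ('B', 'C')
```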

 

2.2. Image Registration

A new image can be registered to the current model by solving the Perspective-n-Point (PnP) problem, using feature correspondences to points that have already been triangulated in registered images (2D-3D correspondences). The PnP problem involves estimating the pose and, for uncalibrated cameras, the intrinsic parameters.

 

*Perspective-n-Point (PnP): the problem of finding the camera pose given the camera intrinsic parameters, a set of 3D points, and their corresponding 2D projections in the image

 

2.3. Triangulation

 

Because of errors, rays cast through two matched points, one in each image, do not intersect exactly in 3D. Triangulation then finds x, the midpoint of x1 and x2 in 3D space.

Triangulation is an important step in SFM because it increases the stability of the existing model through redundancy and provides additional 2D-3D correspondences, which make it possible to register new images.
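The midpoint idea can be sketched in a few lines. The code below assumes that x1 and x2 are the points on the two rays that lie closest to each other (the original figure is not available, so this reading is an assumption), and that each ray is given by a camera center and a direction. The function name and the example rays are illustrative only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no stable midpoint")
    # Parameters of the closest points on each ray (least-squares solution).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    x1 = [ci + s * di for ci, di in zip(c1, d1)]  # closest point on ray 1
    x2 = [ci + t * di for ci, di in zip(c2, d2)]  # closest point on ray 2
    return [(u + v) / 2.0 for u, v in zip(x1, x2)]


# Two skew rays: closest points are (2, 0, 0) and (2, 0, 1).
x = triangulate_midpoint([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                         [2.0, -1.0, 1.0], [0.0, 1.0, 0.0])
print(x)  # [2.0, 0.0, 0.5]
```

The midpoint is the simplest choice; it is a starting estimate that the later refinement step can improve.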

 

2.4. Bundle Adjustment

 

This step refines the camera poses and 3D points computed in the previous steps to minimize the reprojection error. The line from a 3D point (landmark) to the origin of a camera coordinate frame can be represented as a ray, and a collection of such rays is called a 'bundle of rays'. The optimization uses the reprojection error, the difference between the reprojected position and the observed position in the image, as its cost function. Because image projection is non-linear, a non-linear optimization method such as Gauss-Newton is required.
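The cost being minimized can be written down compactly. The sketch below only evaluates the summed squared reprojection error for a simple pinhole model; the optimizer itself (Gauss-Newton or similar) is left out. The camera model with a single focal length and principal point, the data layout, and all names are assumptions made for illustration.

```python
def project(point, rotation, translation, focal, cx, cy):
    """Pinhole projection of a world point into pixel coordinates."""
    # World -> camera frame: Xc = R * X + t
    xc = [sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
          for i in range(3)]
    # Perspective division, then focal length and principal point.
    return (focal * xc[0] / xc[2] + cx, focal * xc[1] / xc[2] + cy)


def reprojection_cost(cameras, points, observations, focal, cx, cy):
    """Sum of squared pixel errors over all (camera, point, pixel) observations.

    cameras: list of (rotation 3x3, translation 3-vector)
    points: list of 3D points
    observations: list of (camera_index, point_index, (u, v))
    """
    cost = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        rotation, translation = cameras[cam_idx]
        pu, pv = project(points[pt_idx], rotation, translation, focal, cx, cy)
        cost += (pu - u) ** 2 + (pv - v) ** 2
    return cost


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, [0.0, 0.0, 0.0]), (identity, [-1.0, 0.0, 0.0])]
points = [[0.5, 0.2, 4.0]]
focal, cx, cy = 500.0, 320.0, 240.0

# Perfect observations give zero cost; noisy ones give a positive cost.
exact = [(i, 0, project(points[0], *cameras[i], focal, cx, cy)) for i in range(2)]
print(reprojection_cost(cameras, points, exact, focal, cx, cy))
```

Bundle adjustment treats the camera poses and the 3D points as the unknowns and iteratively updates them to drive this cost down.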
