[Paper Review] Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis / Segmentation on RGB-D Images

2022. 1. 12. 00:53 · Research/Detection & Segmentation

This paper was published at the 2021 International Conference on Robotics and Automation (ICRA). I review it here as an example of research that performs semantic segmentation on RGB + depth images.

 

A depth image encodes the distance from the observer (the camera). A spot that looks like an object boundary in the RGB image (because of lighting or shadows) can still appear as one continuous object in the depth image, so we can expect segmentation performance to improve when RGB and depth images are used together.

(The paper describes this as the depth image providing complementary geometric information to the RGB image.)

 

The simplest approach one can think of is to extract RGB and depth features with an RGB encoder and a depth encoder, and to merge those features just before handing them to the decoder.

In the paper's architecture figure, the RGB and depth images are fed into separate encoders, and the features extracted by the depth encoder are passed over to the RGB encoder at several intermediate layers, where RGB-D Fusion is performed.
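To make the stage-wise flow concrete, here is a minimal NumPy sketch. It is only an illustration under my own assumptions: `downsample` is a hypothetical stand-in for a real encoder stage (just 2x2 average pooling), and plain addition stands in for the fusion module. It is not the paper's implementation.

```python
import numpy as np


def downsample(x):
    """Hypothetical stand-in for one encoder stage: 2x2 average pooling on (C, H, W).

    Assumes H and W are even.
    """
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def encode_with_stagewise_fusion(rgb, depth, num_stages=3):
    """Run two encoder branches; after every stage, merge depth into the RGB branch."""
    for _ in range(num_stages):
        rgb = downsample(rgb)
        depth = downsample(depth)
        # Placeholder fusion (assumption): plain element-wise addition.
        rgb = rgb + depth
    return rgb


rgb_input = np.random.rand(8, 32, 32)
depth_input = np.random.rand(8, 32, 32)
fused = encode_with_stagewise_fusion(rgb_input, depth_input)
print(fused.shape)  # (8, 4, 4)
```

The point of the sketch is only the data flow: the depth branch keeps running on its own, while the RGB branch accumulates depth information at every stage instead of once at the end.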

 

- RGB-D Fusion

The RGB and depth features each go through an SE block for channel-wise attention, and the results are added element-wise. Since the RGB and depth images are encoded by different networks, this seems intended to calibrate the channels before merging, so that RGB and depth information is combined in a balanced way.
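The fusion step described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' code: the reduction ratio `r`, the random weights, and the function names are my own assumptions, and the bias terms of the two fully connected layers are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(feat, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map (biases omitted).

    w1: (C // r, C) reduction weights, w2: (C, C // r) expansion weights.
    """
    squeezed = feat.mean(axis=(1, 2))        # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # excitation, step 1: FC + ReLU
    scale = sigmoid(w2 @ hidden)             # excitation, step 2: FC + sigmoid -> (C,)
    return feat * scale[:, None, None]       # reweight each channel


def rgbd_fusion(rgb_feat, depth_feat, rgb_weights, depth_weights):
    """Apply a separate SE block to each modality, then add element-wise."""
    return se_block(rgb_feat, *rgb_weights) + se_block(depth_feat, *depth_weights)


rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4  # hypothetical sizes


def make_se_weights():
    return (0.1 * rng.standard_normal((C // r, C)),
            0.1 * rng.standard_normal((C, C // r)))


fused = rgbd_fusion(rng.random((C, H, W)), rng.random((C, H, W)),
                    make_se_weights(), make_se_weights())
print(fused.shape)  # (16, 8, 8)
```

Each modality gets its own SE weights, so the network can learn a different per-channel scale for RGB and for depth before they are summed.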

 

- Context Module

Similar to the Pyramid Pooling Module in PSPNet, the context module uses several branches to aggregate features at different scales.
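As a rough picture of what "several branches at different scales" means, here is a pyramid-pooling sketch in NumPy. It is simplified on purpose: the bin sizes are hypothetical, the 1x1 convolutions that normally follow each pooling branch are left out, and upsampling is nearest-neighbor.

```python
import numpy as np


def avg_pool_to_bins(feat, bins):
    """Average-pool a (C, H, W) map down to (C, bins, bins).

    Assumes H and W are divisible by bins.
    """
    c, h, w = feat.shape
    return feat.reshape(c, bins, h // bins, bins, w // bins).mean(axis=(2, 4))


def upsample_nearest(feat, factor_h, factor_w):
    """Nearest-neighbor upsampling by integer factors."""
    return feat.repeat(factor_h, axis=1).repeat(factor_w, axis=2)


def pyramid_pooling(feat, bin_sizes=(1, 2, 4)):
    """Concatenate the input with pooled-and-upsampled copies at several scales."""
    c, h, w = feat.shape
    branches = [feat]
    for bins in bin_sizes:
        pooled = avg_pool_to_bins(feat, bins)
        branches.append(upsample_nearest(pooled, h // bins, w // bins))
    return np.concatenate(branches, axis=0)


feat = np.random.rand(4, 8, 8)
context = pyramid_pooling(feat)
print(context.shape)  # (16, 8, 8)
```

The branch with a single bin carries a global summary of the scene, while the finer bins keep coarse spatial layout; concatenating them with the input gives every pixel access to context beyond its receptive field.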

 

To reduce computation, the ResNet basic block is replaced with a spatially factorized version (NBt1D). As with MobileNet, the goal is a lighter model: each 3x3 conv is decomposed into a 3x1 conv followed by a 1x3 conv. This Non-Bottleneck-1D block was introduced in ERFNet.
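A quick back-of-the-envelope check shows where the savings come from. The channel count of 64 is just an example I picked, and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in a 2D convolution, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


channels = 64  # example value, not taken from the paper

full_3x3 = conv_weights(3, 3, channels, channels)
factorized = (conv_weights(3, 1, channels, channels)
              + conv_weights(1, 3, channels, channels))

print(full_3x3, factorized)  # 36864 24576
```

With equal input and output channels, the 3x1 + 1x3 pair uses 6·C² weights instead of 9·C², i.e. two thirds of the original, and the multiply-accumulate count per output pixel shrinks by the same ratio.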

ESANet

 

- Experimental Results

 

๋‚ด ์ƒ๊ฐ

The network sensibly combines several existing methods so that both RGB and depth images are encoded for semantic segmentation, but it still has the drawback of needing two encoders for a modest performance gain.

 

Also, the feature fusion module simply uses SE blocks, and I am not convinced that this really merges RGB and depth information in a properly balanced way.

(๋„คํŠธ์›Œํฌ์— ๋งก๊ฒจ๋ฒ„๋ฆฌ๋Š” ๋А๋‚Œ์ด๋ผ, ablation study์—์„œ SE block์„ ์‚ฌ์šฉํ•ด์„œ ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์ง€๋งŒ, SE block์€ attention module ์ด๋ผ ์–ด๋””์— ๋ถ™์—ฌ๋„ ์•ฝ๊ฐ„์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์€ ์žˆ์œผ๋ฏ€๋กœ..)

๋ฐ˜์‘ํ˜•

'๐Ÿ› Research > Detection & Segmentation' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction  (0) 2022.01.19
[๊ฐ„๋‹จ ์„ค๋ช…] Semi-Supervised Semantic Segmentation / Segmentation์—์„œ unlabeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•  (0) 2022.01.13
[๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Feature Pyramid Networks for Object Detection / FPN / ๊ฐ์ฒด์˜ ์Šค์ผ€์ผ์— invariantํ•œ ๋„คํŠธ์›Œํฌ  (0) 2022.01.13
[๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Pyramid Scene Parsing Network / PSPNet / Pyramid Pooling  (0) 2021.12.05
[๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Unified Perceptual Parsing for Scene Understanding / UperNet / Multi-task learning  (0) 2021.12.04
'๐Ÿ› Research/Detection & Segmentation' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [๊ฐ„๋‹จ ์„ค๋ช…] Semi-Supervised Semantic Segmentation / Segmentation์—์„œ unlabeled ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•
  • [๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Feature Pyramid Networks for Object Detection / FPN / ๊ฐ์ฒด์˜ ์Šค์ผ€์ผ์— invariantํ•œ ๋„คํŠธ์›Œํฌ
  • [๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Pyramid Scene Parsing Network / PSPNet / Pyramid Pooling
  • [๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Unified Perceptual Parsing for Scene Understanding / UperNet / Multi-task learning
๋ญ…์ฆค
๋ญ…์ฆค
AI ๊ธฐ์ˆ  ๋ธ”๋กœ๊ทธ
    ๋ฐ˜์‘ํ˜•
  • ๋ญ…์ฆค
    CV DOODLE
    ๋ญ…์ฆค
  • ์ „์ฒด
    ์˜ค๋Š˜
    ์–ด์ œ
  • ๊ณต์ง€์‚ฌํ•ญ

    • โœจ About Me
    • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (198)
      • ๐Ÿ“– Fundamentals (33)
        • Computer Vision (9)
        • 3D vision & Graphics (6)
        • AI & ML (15)
        • NLP (2)
        • etc. (1)
      • ๐Ÿ› Research (64)
        • Deep Learning (7)
        • Image Classification (2)
        • Detection & Segmentation (17)
        • OCR (7)
        • Multi-modal (4)
        • Generative AI (6)
        • 3D Vision (2)
        • Material & Texture Recognit.. (8)
        • NLP & LLM (11)
        • etc. (0)
      • ๐ŸŒŸ AI & ML Tech (7)
        • AI & ML ์ธ์‚ฌ์ดํŠธ (7)
      • ๐Ÿ’ป Programming (85)
        • Python (18)
        • Computer Vision (12)
        • LLM (4)
        • AI & ML (17)
        • Database (3)
        • Apache Airflow (6)
        • Docker & Kubernetes (14)
        • ์ฝ”๋”ฉ ํ…Œ์ŠคํŠธ (4)
        • C++ (1)
        • etc. (6)
      • ๐Ÿ’ฌ ETC (3)
        • ์ฑ… ๋ฆฌ๋ทฐ (3)
  • ๋งํฌ

  • ์ธ๊ธฐ ๊ธ€

  • ํƒœ๊ทธ

    ChatGPT
    nlp
    GPT
    Python
    Computer Vision
    OCR
    ํ”„๋กฌํ”„ํŠธ์—”์ง€๋‹ˆ์–ด๋ง
    Image Classification
    LLM
    VLP
    deep learning
    material recognition
    pandas
    segmentation
    multi-modal
    ๋„์ปค
    object detection
    OpenAI
    Text recognition
    airflow
    pytorch
    AI
    3D Vision
    ํŒŒ์ด์ฌ
    ๊ฐ์ฒด๊ฒ€์ถœ
    ๊ฐ์ฒด ๊ฒ€์ถœ
    CNN
    ๋”ฅ๋Ÿฌ๋‹
    ์ปดํ“จํ„ฐ๋น„์ „
    OpenCV
  • ์ตœ๊ทผ ๋Œ“๊ธ€

  • ์ตœ๊ทผ ๊ธ€

  • hELLOยท Designed By์ •์ƒ์šฐ.v4.10.3
๋ญ…์ฆค
[๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis / RGB-D ์˜์ƒ์—์„œ์˜ segementation
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”