[Paper Review] Fast Segment Anything | Fast SAM | A Lightweight SAM

2023. 7. 2. 17:07 · 🏛 Research/Perception

SAM (Segment Anything Model): explanation and usage

 

[Meta AI] How to Use SAM (Segment Anything Model) | A Vision AI Model That Segments Any Object


mvje.tistory.com

 

It has not been long since Meta AI released the Segment Anything Model (SAM), and Fast SAM, a faster version of SAM, has already been published.

 

Big tech companies keep releasing innovative AI models, and open-source communities, universities, and companies are quickly turning out a wide range of models and techniques built on top of them.

 

 

SAM is becoming the foundation step for high-level tasks such as image segmentation, captioning, and editing, but it has the drawback of a very high computational cost. Most of that cost comes from the transformer architecture operating on high-resolution inputs. This paper proposes a faster alternative with performance comparable to SAM. According to the authors, once the task is reformulated as segment generation followed by prompting, a regular CNN detector with an instance segmentation branch can handle it well. They report that the proposed FastSAM runs 50 times faster than SAM while achieving comparable performance.


 

 

FastSAM is split into two stages: AIS (All-instance Segmentation) and PGS (Prompt-Guided Selection). The first stage is the basis, and the second is described as a task-oriented post-processing stage.
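The second stage can be pictured as a plain lookup over the masks that the first stage has already produced. The following is a minimal sketch under stated assumptions, not the authors' implementation: masks are toy nested lists of 0/1, a point prompt keeps every mask covering the point, and a box prompt keeps the mask whose bounding box has the highest IoU with the prompt box. The function names (`select_by_point`, `select_by_box`, `mask_bbox`, `box_iou`) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]          # binary mask stored as rows of 0/1
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates


def select_by_point(masks: List[Mask], point: Tuple[int, int]) -> List[int]:
    """Return the indices of all masks that cover the (x, y) point."""
    x, y = point
    return [i for i, m in enumerate(masks) if m[y][x] == 1]


def mask_bbox(mask: Mask) -> Optional[Box]:
    """Tight bounding box of a mask, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def _area(box: Box) -> int:
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    return inter / (_area(a) + _area(b) - inter)


def select_by_box(masks: List[Mask], box: Box) -> Optional[int]:
    """Return the index of the mask whose bounding box best overlaps the prompt box."""
    best, best_iou = None, 0.0
    for i, m in enumerate(masks):
        bb = mask_bbox(m)
        if bb is None:
            continue  # skip empty masks
        iou = box_iou(bb, box)
        if iou > best_iou:
            best, best_iou = i, iou
    return best
```

Because selection only reads masks that already exist, changing the prompt does not require running the segmentation network again, which is what makes it a post-processing step.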

 

The proposed FastSAM segments every object or region in the image using YOLOv8-seg, which has an instance segmentation branch built on the YOLACT method. It then uses various prompts to identify the specific object of interest. It mainly uses point, box, and text prompts, and the text prompt is based on CLIP.
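The YOLACT idea behind the segmentation branch is that the network predicts a small set of shared prototype masks for the whole image plus one coefficient vector per detected instance, and each instance mask is a linear combination of the prototypes passed through a sigmoid. Below is a minimal pure-Python sketch of that assembly step, assuming toy nested-list prototypes; `assemble_mask` and its `threshold` parameter are hypothetical names, and the cropping of each mask to its predicted box is omitted.

```python
import math
from typing import List

Map2D = List[List[float]]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def assemble_mask(prototypes: List[Map2D], coeffs: List[float],
                  threshold: float = 0.5) -> List[List[int]]:
    """Combine k prototype maps (each H x W) with k coefficients into one binary mask."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of the prototype values at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            row.append(1 if _sigmoid(logit) > threshold else 0)
        mask.append(row)
    return mask
```

Since the prototypes are computed once per image and only the short coefficient vectors differ between instances, producing many masks stays cheap, which suits the "segment everything first" stage.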

 

In addition, the CNN detector is trained on only 2% of the SA-1B dataset used for SAM, and it achieves performance comparable to SAM with far less computation.

 

More details can be found in the paper. The main proposal is to use a lightweight CNN-based architecture instead of a transformer for the Segment Anything task, keeping performance while greatly reducing computation. This points to the potential of lightweight CNN models for complex vision tasks.

 

I think this is a good paper to refer to for lightweighting approaches when designing a segmentation model.

Looking at the model architecture in a bit more detail...

The YOLOv8 architecture evolved from YOLOv5 and integrates the key design choices of YOLOX, YOLOv6, and YOLOv7. In the backbone network and neck module, YOLOv8 replaces YOLOv5's C3 module with the C2f module, and the updated head module switches from anchor-based to anchor-free.
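To make the anchor-free idea concrete: instead of refining a set of predefined anchor boxes, an anchor-free head predicts, for each feature-map cell, the distances from the cell centre to the four sides of the box. The following is a simplified sketch under that assumption; `decode_anchor_free` is a hypothetical helper, and details of the actual YOLOv8 head (such as how the distances themselves are predicted) are left out.

```python
from typing import Tuple


def decode_anchor_free(cell_x: int, cell_y: int,
                       ltrb: Tuple[float, float, float, float],
                       stride: int) -> Tuple[float, float, float, float]:
    """Decode one prediction into an (x1, y1, x2, y2) box in image pixels.

    cell_x, cell_y: integer position of the cell on the feature map.
    ltrb: predicted distances (left, top, right, bottom) in feature-map units.
    stride: number of image pixels covered by one feature-map cell.
    """
    # The reference point is the centre of the cell, in feature-map units.
    cx, cy = cell_x + 0.5, cell_y + 0.5
    left, top, right, bottom = ltrb
    return ((cx - left) * stride, (cy - top) * stride,
            (cx + right) * stride, (cy + bottom) * stride)
```

For example, `decode_anchor_free(2, 3, (1.5, 0.5, 2.5, 1.5), 8)` returns `(8.0, 24.0, 40.0, 40.0)`. No anchor sizes or aspect ratios need to be tuned per dataset, which is one reason the design fits a class-agnostic "segment everything" detector.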

 

 

Experimental Results

The experimental results show that running speed is clearly faster than SAM, while performance remains comparable.

 

Visualized results are also shared for a variety of tasks, including anomaly detection, salient object segmentation, and building extraction.
