
[Paper Review] Fast Segment Anything | Fast SAM | A Lightweight SAM

by 뭅즤 2023. 7. 2.

SAM (Segment Anything Model): Overview and Usage

 

[Meta AI] How to Use SAM (Segment Anything Model) | A Vision AI Model That Segments Any Object

mvje.tistory.com

 

It has not been long since Meta AI released the Segment Anything Model (SAM), and a faster version called Fast SAM is already out.

 

Big tech companies keep releasing innovative AI models, and open-source communities, universities, and other companies are quickly turning out a wide range of models and techniques built on top of them.

 

 

SAM is becoming a foundational step for high-level tasks such as image segmentation, captioning, and editing, but it comes with an enormous computational cost. Most of that cost comes from running a transformer architecture on high-resolution inputs. This paper proposes a faster alternative with performance comparable to SAM. The authors report that once the task is reformulated as segment generation followed by prompting, a regular CNN detector with an instance segmentation branch can handle it well. According to the paper, the proposed FastSAM runs 50 times faster than SAM while achieving comparable performance.
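To get a feel for why a transformer on high-resolution input gets expensive, here is a rough back-of-the-envelope sketch. It assumes 16-pixel patches and fully global self-attention, where every token attends to every other token; both are my assumptions for illustration, and real encoders may restrict attention to local windows in some layers, so read the numbers as intuition rather than a measurement of SAM.

```python
def attention_cost(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (token count, token-pair count) for global self-attention.

    The image is cut into square patches, one token per patch, and global
    attention compares every token with every token (tokens squared pairs).
    """
    tokens_per_side = image_size // patch_size
    tokens = tokens_per_side ** 2
    return tokens, tokens * tokens


# Compare a small classification-style input with a high-resolution one.
for size in (224, 1024):
    tokens, pairs = attention_cost(size, patch_size=16)
    print(f"{size}x{size}: {tokens} tokens, {pairs} token pairs")
```

Going from a 224-pixel to a 1024-pixel input multiplies the token count by about 21 but the number of token pairs by about 437, which is the quadratic growth a CNN detector avoids.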


 

 

FastSAM is split into two stages: all-instance segmentation (AIS) and prompt-guided selection (PGS). The first stage is the basis, and the second is described as a task-oriented post-processing step.

 

The proposed FastSAM first segments every object or region in the image using YOLOv8-seg, which has an instance segmentation branch based on the YOLACT method. It then uses various prompts to pick out the specific object of interest. The main prompt types are point, box, and text, with text prompts relying on CLIP.
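The second stage is easier to picture with a toy example. Below is a minimal sketch of prompt-guided selection, assuming the first stage has already produced a list of binary masks. The function names (`select_by_point`, `select_by_box`) and the tiny 4x4 masks are hypothetical and only for illustration; this is not the FastSAM code, and the text prompt is left out because it needs a CLIP model.

```python
def mask_bbox(mask):
    """Tight inclusive (x1, y1, x2, y2) box around nonzero pixels, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def box_area(box):
    """Pixel area of an inclusive (x1, y1, x2, y2) box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    return inter / (box_area(a) + box_area(b) - inter)


def select_by_point(masks, x, y):
    """Point prompt: indices of all masks that cover pixel (x, y)."""
    return [i for i, m in enumerate(masks) if m[y][x]]


def select_by_box(masks, box):
    """Box prompt: index of the mask whose bounding box has the highest IoU."""
    best, best_iou = None, 0.0
    for i, m in enumerate(masks):
        bb = mask_bbox(m)
        if bb is None:
            continue
        iou = box_iou(bb, box)
        if iou > best_iou:
            best, best_iou = i, iou
    return best


# Two toy "all-instance" masks: one top-left blob, one bottom-right blob.
masks = [
    [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
]
print(select_by_point(masks, x=3, y=3))  # point inside the second blob
print(select_by_box(masks, (0, 0, 1, 1)))  # box around the first blob
```

The point here is that the prompt never touches the network: the masks are computed once, and each prompt is just a cheap lookup over them.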

 

In addition, the CNN detector is trained on only 2% of the SA-1B dataset that SAM used, and it still reaches performance comparable to SAM with far less computation.
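For a sense of scale, here is a quick calculation of what 2% of SA-1B amounts to. The totals below are the approximate figures published with SA-1B (about 11 million images and 1.1 billion masks); reading the 2% as a fraction of images, and assuming the subset has the same masks-per-image ratio as the full set, are my assumptions.

```python
SA1B_IMAGES = 11_000_000      # approximate number of images in SA-1B
SA1B_MASKS = 1_100_000_000    # approximate number of masks in SA-1B
TRAIN_PERCENT = 2             # share of the dataset used for training


def subset_size(total: int, percent: int) -> int:
    """Size of a percent-sized subset, using integer arithmetic."""
    return total * percent // 100


print(subset_size(SA1B_IMAGES, TRAIN_PERCENT), "images")
print(subset_size(SA1B_MASKS, TRAIN_PERCENT), "masks (if sampled uniformly)")
```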

 

The paper has the full details, but the main idea is that using a lightweight CNN-based architecture instead of a transformer for the Segment Anything task keeps performance roughly the same while cutting computation substantially. This shows that lightweight CNN models remain a viable option for complex vision tasks.

 

I think this is a good paper to refer to for lightweight design ideas when building a segmentation model.

Looking at the model architecture in a bit more detail...

The YOLOv8 architecture evolved from YOLOv5 and integrates key design ideas from YOLOX, YOLOv6, and YOLOv7. In the backbone network and neck module, YOLOv5's C3 module is replaced with the C2f module, and the updated head module switches from anchor-based to anchor-free.
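The anchor-based versus anchor-free difference is easier to see in code. In an anchor-free head, each grid cell predicts the distances from its own centre to the four box edges, rather than offsets relative to predefined anchor boxes. The sketch below shows only that decoding step; `grid_centers` and `decode_box` are hypothetical helper names, and the actual YOLOv8 head differs in detail (for example, it predicts each distance as a distribution), so treat this as the general idea rather than the real implementation.

```python
def grid_centers(width: int, height: int):
    """Centres of all feature-map cells, in grid units, in row-major order."""
    return [(x + 0.5, y + 0.5) for y in range(height) for x in range(width)]


def decode_box(cx: float, cy: float, ltrb, stride: int):
    """Turn (left, top, right, bottom) distances into an (x1, y1, x2, y2) box.

    cx, cy and the distances are in grid units; multiplying by the stride
    maps the result back to input-image pixels. No anchor boxes are needed.
    """
    left, top, right, bottom = ltrb
    return (
        (cx - left) * stride,
        (cy - top) * stride,
        (cx + right) * stride,
        (cy + bottom) * stride,
    )


# A 4x4 feature map with stride 8 covers a 32x32 input.
centers = grid_centers(4, 4)
cx, cy = centers[6]  # cell at column 2, row 1
print(decode_box(cx, cy, ltrb=(1.5, 0.5, 1.0, 2.0), stride=8))
```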

 

 

Experimental Results

Looking at the experimental results, the running speed is clearly faster than SAM's, while performance stays at a similar level.

 

The paper also shares visualized results for a variety of tasks, including anomaly detection, salient object segmentation, and building extraction.