VAE (Variational Autoencoder) Explained | VAE PyTorch Code Example
ยท
๐Ÿ› Research/Generative AI
VAE (Variational Autoencoder) VAE (Variational Autoencoder) is a generative model: a neural network architecture used mainly for dimensionality reduction and generation tasks. A VAE learns the latent variables of the data and uses them to generate new data, which makes it widely used in applications such as image and speech generation. A VAE consists of two main parts: an encoder and a decoder. It is easy to confuse with the autoencoder, but an autoencoder's goal is to produce a latent variable z from which the input can be reconstructed exactly — that is, its main purpose is to learn the encoder — whereas a VAE extracts a latent vector that represents the input x well, and uses it to generate the inp..
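The latent vector z the excerpt mentions is sampled with the reparameterization trick, which keeps sampling differentiable with respect to the encoder outputs. The full post promises a PyTorch example; as a dependency-free illustration, here is a minimal sketch in plain Python (the function name `reparameterize` and the toy values are illustrative, not from the post):

```python
import math
import random

def reparameterize(mu, logvar, eps=None):
    """Sample z = mu + sigma * eps, with sigma = exp(0.5 * logvar).

    Drawing eps ~ N(0, 1) separately from the deterministic path is
    what lets gradients flow through mu and logvar during training.
    """
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    sigma = [math.exp(0.5 * lv) for lv in logvar]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

# With eps fixed to zero, z collapses to the mean of the latent Gaussian.
mu = [0.5, -1.0]
logvar = [0.0, 0.0]   # logvar = 0  ->  sigma = 1
z = reparameterize(mu, logvar, eps=[0.0, 0.0])
print(z)  # -> [0.5, -1.0]
```

In a real VAE, `mu` and `logvar` are the encoder's outputs and `z` is fed to the decoder; the KL term of the loss then pulls the latent distribution toward a standard normal.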
[NLP] BERT Explained Simply | Bi-Directional LM | Bidirectional Language Model
ยท
๐Ÿ› Research/NLP & LLM
BERT (Bidirectional Encoder Representations from Transformers) BERT is one of the groundbreaking models in natural language processing (NLP), developed by Google and released in 2018. BERT outperformed earlier NLP models and achieved top results across a wide range of NLP tasks. In particular, it drew attention as a general-purpose model: a pre-trained language model that can be applied to other NLP tasks. The paper title is given below, and with roughly 80,000 citations (as of September 2023) it can fairly be called a foundational work in the LM field. paper : BERT: Pre-training of Deep Bidirectional Transformers for Languag..
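The pre-training the excerpt refers to is masked language modeling: roughly 15% of input tokens are selected, and of those 80% become [MASK], 10% a random token, and 10% stay unchanged, so the model must use context from both sides to recover them. A minimal sketch of that corruption step (the tiny vocabulary and function name are illustrative, not BERT's actual tokenizer):

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "sat", "mat", "the"]  # toy vocabulary for random swaps

def bert_masking(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption: pick ~mask_prob of positions as
    prediction targets; 80% -> [MASK], 10% -> random token, 10% kept.
    Returns (corrupted tokens, indices the model must predict)."""
    rng = rng or random.Random(0)
    out, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            targets.append(i)
            r = rng.random()
            if r < 0.8:
                out[i] = MASK
            elif r < 0.9:
                out[i] = rng.choice(VOCAB)
            # else: token kept unchanged, but still predicted
    return out, targets

corrupted, targets = bert_masking("the cat sat on the mat".split())
```

Because the targets are in the middle of the sequence, the encoder attends to tokens on both the left and the right — the "bidirectional" part of the name.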
[Paper Review] NeRF Explained Simply & Understanding the Principles | A Technique for Generating Views Seen from New Directions
ยท
๐Ÿ› Research/3D Vision
- paper : NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis / ECCV 2020 Quite some time has passed since the NeRF paper came out. When it appeared at ECCV 2020 it drew attention as a remarkably novel and groundbreaking view-synthesis method, but several drawbacks made it hard to apply in real services. Still, at CVPR 2023 the use of the word "radiance" grew 80% compared to 2022, and mentions of NeRF grew 39% — a measure of how actively NeRF is being researched. In particular, research has now moved beyond proof of concept to view editing and all kinds of applications. In other words, NeRF is now ready to be used in various serv..
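At its core, NeRF renders a view by alpha-compositing the density and color samples its MLP predicts along each camera ray. A toy, scalar-color sketch of that discrete volume-rendering rule (function name and sample values are illustrative):

```python
import math

def render_ray(sigmas, colors, deltas):
    """Discrete NeRF volume rendering along one ray:
      alpha_i = 1 - exp(-sigma_i * delta_i)     (opacity of segment i)
      T_i     = prod_{j<i} (1 - alpha_j)        (transmittance so far)
      C       = sum_i T_i * alpha_i * c_i       (accumulated color)
    Returns (color, remaining transmittance)."""
    color, T = 0.0, 1.0
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        color += T * alpha * c
        T *= 1.0 - alpha
    return color, T

# A ray that first crosses empty space (sigma = 0), then a dense sample:
# nearly all the weight lands on the dense sample, and the remaining
# transmittance drops toward zero.
c, T = render_ray([0.0, 10.0], [0.2, 0.9], [1.0, 1.0])
```

Training simply compares such rendered colors against the pixels of the input photos, which is why only posed images are needed as supervision.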
[Paper Review] Fast Segment Anything | Fast SAM | A Lightweight SAM
ยท
๐Ÿ› Research/Detection & Segmentation
SAM (Segment Anything Model) explanation and usage: [Meta AI] SAM (Segment Anything Model) usage | a Vision AI model that segments any object (mvje.tistory.com). Not long after Meta AI's Segment Anything Model (SAM) was released, a faster version called Fast SAM has already been published. Big-tech companies are continuously releasing innovative AI mod..
[Paper Introduction] TAM (Track Anything Model) | A Vision AI Model That Tracks Anything | A Video Version of Segment Anything
ยท
๐Ÿ› Research/Detection & Segmentation
Track Anything: Segment Anything Meets Videos The world moves fast. Not long after Meta AI's SAM (Segment Anything Model) came out, a paper called TAM (Track Anything Model), which applies SAM to video to perform the tracking task, has already appeared. Track-Anything is a flexible interactive tool for video object tracking and segmentation, built on Segment Anything; users specify the items to track and segment with nothing more than clicks. While tracking, users can flexibly change the target object, or correct the region of interest when something is ambiguous. These properties let Track-Anything handle tasks such as..
[Paper Introduction] DINOv2 - Self-supervised Vision Transformer | Meta AI | A Vision AI Model That Performs Strongly Without Labeled Data
ยท
๐Ÿ› Research/Detection & Segmentation
DINOv2 Paper title : DINOv2: Learning Robust Visual Features without Supervision GitHub Demo In April 2023, Meta AI released DINOv2, a new method for training high-performance computer vision models with self-supervised learning. Self-supervised learning, which is also used to train LLMs (Large Language Models), is a powerful and flexible way to train AI models because it does not require large amounts of labeled data. According to the paper, the multimodal image-text pretraining approach that had been the standard for computer vision tasks in recent years depends on the image's caption informati..