
[Paper Review] END-TO-END OPTIMIZED IMAGE COMPRESSION | Image Compression Using Deep Learning

by 뭅즤 2022. 5. 14.
๋ฐ˜์‘ํ˜•

This paper, presented at ICLR 2017, proposes what its title says: a method for optimizing a deep-learning image compression model end to end.

I don't have much background in this field, so this review may be a bit rough(?)... heh

 

 

- Basics of image compression: https://mvje.tistory.com/86?category=1033082 

 


Abstract

The paper describes an image compression method consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. Each transform is built from three successive stages, and each stage is a convolutional linear filter followed by a nonlinear activation function. Using a variant of stochastic gradient descent, the entire model is jointly optimized for rate-distortion performance over a set of training images, and a continuous proxy is introduced for the discontinuous loss function caused by the quantizer. Under certain conditions, the relaxed loss function can be interpreted as the log likelihood of a generative model implemented by a VAE (Variational AutoEncoder). Unlike such models, however, the compression model must operate at a given point on the rate-distortion curve, as specified by a trade-off parameter. On test images, the optimized method achieves better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. Visual quality also improves dramatically for all images at all bit rates, a result supported by objective quality estimates using MS-SSIM.

 

Proposed Method

The image compression architecture proposed in this paper follows the form of the basic JPEG compression pipeline (Transform → Quantization → Entropy Coding → Decoding → Inverse Transform). 

To bring deep learning into this pipeline, the transform step, which JPEG performs with the DCT, is replaced by a neural network. Quantization maps a continuous signal to discrete values, so its derivative is zero almost everywhere (and undefined at the step boundaries). No useful gradient flows through it, and this blocks backpropagation. The paper therefore uses an approximation of quantization during training. The inverse transform is not an exact inverse of the neural-network transform. However, the networks are trained with a loss that includes the MSE between the original and the reconstructed image, so the inverse transform learns to behave approximately like an inverse.
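To make the data flow concrete, here is a toy sketch of the transform → quantize → inverse-transform pipeline. It uses a fixed random orthogonal matrix as a stand-in for the learned networks g_a and g_s, and the helper names (`make_orthogonal`, `analysis`, `synthesis`, `quantize`) are hypothetical. Entropy coding is omitted entirely. In the paper, both transforms are learned nonlinear networks and g_s is only an approximate inverse.

```python
import numpy as np

def make_orthogonal(n, seed=0):
    """Random orthogonal matrix (QR of a Gaussian matrix), standing in for a learned transform."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def analysis(x, W):
    # g_a: map the input to a latent representation y
    return W @ x

def quantize(y):
    # Uniform scalar quantizer with step size 1 (round to nearest integer)
    return np.round(y)

def synthesis(y_hat, W):
    # g_s: map the quantized latent back to pixel space (an exact inverse here because W is orthogonal)
    return W.T @ y_hat

W = make_orthogonal(16)
x = np.random.default_rng(1).uniform(0, 255, size=16)  # a toy 16-pixel "image"
y = analysis(x, W)
y_hat = quantize(y)
x_hat = synthesis(y_hat, W)
mse = float(np.mean((x - x_hat) ** 2))
```

Because W is orthogonal in this toy, the reconstruction error equals the quantization error in the latent space. All of the distortion therefore comes from rounding. With learned nonlinear transforms this equality no longer holds.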

 

 

 

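- x : input image

- g_a : Analysis transform (encoding transform), which maps x to the latent y = g_a(x); y is then quantized to ŷ

- g_s : Synthesis transform (decoding transform), which maps ŷ back to the reconstructed image x̂ = g_s(ŷ)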

 

Architecture

The proposed architecture is split into an analysis (encoding) transform and a synthesis (decoding) transform. Each consists of three stages, and each stage applies a convolution, a downsampling step, and an activation function. In the synthesis transform, the order within each stage is reversed, downsampling becomes upsampling, and GDN is replaced by an approximate inverse (IGDN). The unusual part is the activation function: it is a function called Generalized Divisive Normalization (GDN). As in batch norm, β and γ are learnable parameters, and GDN is reported to work better than ReLU for this kind of task.

Generalized Divisive Normalization (GDN)
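For reference, the paper's GDN operates across channels at each spatial position: $u_i = v_i / (\beta_i + \sum_j \gamma_{ij} v_j^2)^{1/2}$. The synthesis side uses an approximate inverse (IGDN), $w_i = u_i (\hat\beta_i + \sum_j \hat\gamma_{ij} u_j^2)^{1/2}$, which has its own learned parameters. Below is a minimal NumPy sketch of both operations. It assumes a (C, H, W) tensor layout, positive β, and non-negative γ. It is not the paper's code.

```python
import numpy as np

def gdn(v, beta, gamma):
    """GDN: u_i = v_i / sqrt(beta_i + sum_j gamma_ij * v_j**2), per spatial position.

    v: (C, H, W) activations; beta: (C,) positive; gamma: (C, C) non-negative.
    """
    # Weighted sum of squared responses across channels at each (h, w)
    norm = np.einsum("ij,jhw->ihw", gamma, v ** 2)
    return v / np.sqrt(beta[:, None, None] + norm)

def igdn(u, beta, gamma):
    """Approximate inverse: w_i = u_i * sqrt(beta_i + sum_j gamma_ij * u_j**2)."""
    norm = np.einsum("ij,jhw->ihw", gamma, u ** 2)
    return u * np.sqrt(beta[:, None, None] + norm)

C = 3
rng = np.random.default_rng(0)
v = rng.standard_normal((C, 4, 4))
beta = np.ones(C)
gamma = np.full((C, C), 0.1)
u = gdn(v, beta, gamma)
```

With the same β and γ, `igdn(gdn(v))` does not recover v exactly. This is why IGDN is given separate parameters that are trained together with the rest of the network.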

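The (conv + downsampling + GDN) stage is repeated three times with downsampling factors of 4, 2, and 2. This gives a total factor of 4×2×2 = 16 in each spatial dimension. In other words, each spatial position of the latent y summarizes a 16×16 patch of the input, with one value per channel rather than a single value overall. JPEG treats an 8×8 block as one unit, so one might expect this larger footprint to allow a higher compression ratio. However, the final bit rate is determined by quantization and entropy coding, not by the downsampling alone.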
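Here is a quick shape calculation for the three stages. It assumes downsampling factors of 4, 2, and 2 and a channel count of 128. The channel count is an illustrative assumption, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(height, width, num_channels, factors=(4, 2, 2)):
    """Latent size after successive downsampling stages (assumes exact divisibility)."""
    total = 1
    for f in factors:
        total *= f  # overall downsampling per spatial dimension (16 for 4, 2, 2)
    assert height % total == 0 and width % total == 0, "input size must be divisible by the total factor"
    return height // total, width // total, num_channels

h, w, c = latent_shape(256, 256, num_channels=128)
pixels = 256 * 256          # grayscale input
latent_values = h * w * c   # values before quantization and entropy coding
```

For a 256×256 grayscale input, this gives a 16×16×128 latent: 32,768 values for 65,536 pixels. The downsampling alone only halves the number of values, so most of the bit savings have to come from quantization and entropy coding. For comparison, JPEG's 8×8 DCT also produces 64 coefficients per 64 pixels.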

 

Loss Function

Total Loss
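Written out, the total loss has the following form. This assumes MSE as the distortion measure, as described below. P_q is the entropy model over the quantized latents, and λ is the rate-distortion trade-off parameter.

```latex
L(g_a, g_s, P_q)
  = \underbrace{\mathbb{E}_x\!\left[-\log_2 P_q(\hat{y})\right]}_{R\ \text{(rate)}}
  + \lambda\,\underbrace{\mathbb{E}_x\!\left[\operatorname{MSE}(x, \hat{x})\right]}_{D\ \text{(distortion)}},
\qquad \hat{y} = q\big(g_a(x)\big),\quad \hat{x} = g_s(\hat{y})
```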

 

In the figure above, R is the rate term, which keeps the bitstream produced after quantization short. D is the distortion term, an MSE loss that keeps the reconstructed image close to the original. The achievable bitstream length is bounded below by the entropy of the quantized latents, and an entropy coder can get close to that bound. R is therefore written as the expected negative log-probability of ŷ under the entropy model, and λ sets the trade-off between the two terms.
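Here is a minimal sketch of evaluating this loss for a single image. It uses a fixed, hypothetical probability table `pmf` over integer symbols in place of the learned entropy model P_q from the paper.

```python
import numpy as np

def rate_bits(y_hat, pmf):
    """Estimated code length in bits: sum of -log2 P(symbol) over all quantized latents."""
    probs = np.array([pmf[int(s)] for s in np.ravel(y_hat)])
    return float(-np.log2(probs).sum())

def rd_loss(x, x_hat, y_hat, pmf, lam):
    """R + lambda * D with D = MSE between original and reconstruction."""
    rate = rate_bits(y_hat, pmf)
    distortion = float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))
    return rate + lam * distortion

pmf = {-1: 0.25, 0: 0.5, 1: 0.25}       # hypothetical entropy model
y_hat_example = np.array([0, 0, 1, -1])  # 1 + 1 + 2 + 2 = 6 bits
```

An entropy coder such as an arithmetic coder can produce a bitstream whose length is close to this estimate, which is why -log2 P works as the rate term. Increasing `lam` favors lower distortion at the cost of a higher rate.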

 

Quantization Approximation

 

 

To approximate quantization, the paper does not round during training. Instead, it adds noise to y so that the operation is differentiable. Rounding maps every value within ±1/2 of an integer to that integer, so it can be viewed as adding an error between -1/2 and +1/2 to y. The approximation replaces this error with independent uniform noise drawn from U(-1/2, +1/2).
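The two quantizers can be compared with a short NumPy sketch. The Gaussian latent with standard deviation 3 is an arbitrary assumption for illustration. `quantize_hard` is the rounding used at test time, and `quantize_noisy` is the training-time proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_hard(y):
    # Test time: actual rounding; its derivative is zero almost everywhere
    return np.round(y)

def quantize_noisy(y, rng):
    # Training time: additive uniform noise U(-1/2, 1/2); d(y_tilde)/dy = 1
    return y + rng.uniform(-0.5, 0.5, size=np.shape(y))

y = rng.normal(0.0, 3.0, size=10_000)
y_hat = quantize_hard(y)
y_tilde = quantize_noisy(y, rng)
```

Both perturbations stay within ±1/2. For a latent spread over many integers, as here, both have a mean squared error of about 1/12, the variance of U(-1/2, 1/2). Only the noisy version passes a nonzero gradient back to y.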

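The paper shows that the density of the noisy approximation ỹ (y tilde) is a continuous relaxation of the probability mass function of the actually quantized ŷ (y hat). The density p_ỹ is p_y convolved with a unit-width uniform density, so at every integer n it equals the probability P(ŷ = n). The differential entropy of ỹ can therefore serve as an approximation of the (discrete) entropy of ŷ, which is the quantity the rate term needs. 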
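This relationship can be checked numerically. The sketch below assumes a zero-mean Gaussian latent with an arbitrary scale of 1.5. It compares P(ŷ = n), computed from the Gaussian CDF, with the density of ỹ = y + u at n. That density is computed by integrating p_y over [n - 1/2, n + 1/2] with the midpoint rule.

```python
import math

SIGMA = 1.5  # assumed scale of the Gaussian latent (illustrative only)

def gauss_pdf(t, sigma=SIGMA):
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(t, sigma=SIGMA):
    return 0.5 * (1 + math.erf(t / (sigma * math.sqrt(2))))

def pmf_rounded(n):
    # P(y_hat = n): probability that y falls in [n - 1/2, n + 1/2)
    return gauss_cdf(n + 0.5) - gauss_cdf(n - 0.5)

def density_noisy(t, steps=2000):
    # p_{y_tilde}(t) = integral of p_y over [t - 1/2, t + 1/2] (convolution with U(-1/2, 1/2)),
    # approximated with the midpoint rule
    h = 1.0 / steps
    return sum(gauss_pdf(t - 0.5 + (k + 0.5) * h) for k in range(steps)) * h

pairs = [(n, pmf_rounded(n), density_noisy(n)) for n in range(-4, 5)]
for n, p, d in pairs:
    print(f"n={n:+d}  P(y_hat=n)={p:.6f}  p_y_tilde(n)={d:.6f}")
```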

 

Experimental Results

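On a rate-distortion curve (bit rate on the horizontal axis, quality such as PSNR or MS-SSIM on the vertical axis), a higher curve means better quality at the same bit rate. The proposed method outperforms JPEG and JPEG 2000 across the bit rates shown.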
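The two axes of such a plot can be computed as follows. This is a minimal sketch assuming 8-bit images with a peak value of 255. MS-SSIM is omitted because it needs a full multi-scale implementation.

```python
import math
import numpy as np

def psnr(x, x_hat, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(x_hat, dtype=np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

def bits_per_pixel(num_bits, height, width):
    """Bit rate: compressed size in bits divided by the number of pixels."""
    return num_bits / (height * width)
```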

 

 

 

In the actual reconstructions, the images from the proposed method show less noise, and neighboring pixels look somewhat smoother and more continuous. 

 

 

In the figure above, row 1 is JPEG, row 2 is the proposed method, and row 3 is JPEG 2000. Moving to the right, quality is lowered and the compression ratio is increased. JPEG and JPEG 2000 degrade sharply as the compression ratio increases, while the quality of the proposed method does not deteriorate nearly as much. 

 

Discussion

One thing I'm a little skeptical about is that deep learning methods depend heavily on the training dataset. Even when experiments use separate training and test data, the two distributions are often not very different. So I wonder whether compression performance holds up for images from a completely different distribution. I also wonder whether an image that scores well quantitatively (PSNR, MS-SSIM) is always visually better too, i.e., whether the results shown in the paper are cherry-picked. Of course, this is a 2017 paper, so many better methods have probably come out since, but I haven't read them yet... If I get the chance to read more later, I'll write them up.

 

๋ฐ˜์‘ํ˜•