
[Paper Review] Class-Balanced Loss Based on Effective Number of Samples / A Method for Overcoming Class Imbalance

by 뭅즤 2022. 5. 21.

This post introduces a paper published at CVPR 2019 that proposes a way to address the class imbalance problem. The review explains the problem definition and the proposed solution at a conceptual level (details omitted).

 

Class Imbalance?

The class imbalance problem is the situation where the number of samples per class in the training data used to train a deep learning network is not balanced. It is very common in real-world data, which makes it an important task. In academia, methods for class imbalance are compared on long-tailed datasets, in which classes range from those with many samples down to those with very few.
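To make "long-tailed" concrete, here is a minimal sketch of one common way to build such a class-count profile. Per-class sample counts decay exponentially from the largest class to the smallest, and the decay is controlled by an imbalance factor (largest count divided by smallest count). The function and parameter names are hypothetical and are only for illustration.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Per-class counts decaying exponentially from n_max to n_max / imbalance_factor."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # Choose mu so that n_max * mu**(num_classes - 1) == n_max / imbalance_factor.
    mu = (1.0 / imbalance_factor) ** (1.0 / (num_classes - 1))
    return [round(n_max * mu ** i) for i in range(num_classes)]


# e.g. 10 classes, largest class 5000 samples, imbalance factor 100 (smallest class 50)
counts = long_tailed_counts(5000, 10, 100)
print(counts)
```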

 

Common Solutions?

Under class imbalance, a class with few samples is not represented well enough by its data. As a result, the decision boundary learned during training tends to be pushed toward that minority class. The traditional remedies are re-sampling and re-weighting. Re-sampling balances the dataset either by increasing the data of minority classes (for example through augmentation) or by reducing the data of majority classes. In general, this does not work very well.

Re-weighting balances training by varying how strongly the weights are updated depending on the class. Previous work weights by class frequency. Classes with many samples get smaller weight updates, and classes with few samples get larger ones. This is done by multiplying the loss function by the inverse of the class frequency. However, this pushes the decision boundary too far toward the majority classes, so performance does not improve much.
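As a rough illustration of the frequency-based re-weighting described above, here is a minimal sketch in plain Python. Each class gets a weight proportional to 1/n_c, and the softmax cross-entropy of a sample is multiplied by the weight of its true class. Rescaling the weights so they sum to the number of classes is an assumption (a common convention). The helper names are hypothetical and do not come from the paper.

```python
import math


def inverse_frequency_weights(counts):
    """Weight each class by 1 / n_c, rescaled so the weights sum to the number of classes."""
    raw = [1.0 / n for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def weighted_cross_entropy(logits, label, weights):
    """Softmax cross-entropy of one sample, multiplied by the weight of its true class."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


counts = [1000, 100, 10]                # hypothetical long-tailed class counts
weights = inverse_frequency_weights(counts)
# The rarest class ends up weighted 100x more than the most frequent one.
loss = weighted_cross_entropy([2.0, 0.5, -1.0], label=2, weights=weights)
```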

 

Proposed Methods

To improve on this, the paper introduces a concept called the effective number (of samples).

In general, more data means more information. However, as data keeps piling up, overlapping (redundant) information appears, and it no longer gives the deep learning model much benefit during training. Put simply, suppose the "dog" class already covers almost every kind of dog with about 1,000 images. Using 3,000 or 5,000 images beyond that does not add much more information. Previous work overlooked this and adjusted how strongly each class is learned based only on its number of samples. This paper proposes a way to quantify data overlap and calls the result the effective number. Previous work multiplied the loss function by the inverse of the class frequency. The proposed method multiplies it by the inverse of the effective number instead. In other words, the more meaningful (effective) data a class has, the smaller its updates.
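For reference, here are the paper's definitions as I understand them; check them against the original before relying on them.

- For a class with n samples, the effective number E_n depends on a hyperparameter β. The paper relates β to N, the size of the (idealized) set of all unique samples of that class.
- The class-balanced (CB) loss rescales an ordinary loss L for a sample with true class y and predicted probabilities p by 1/E_{n_y}.
- The middle line below is simple arithmetic on the first one: E_n starts at 1 and is bounded by N.

```latex
E_n = \frac{1 - \beta^{n}}{1 - \beta}, \qquad \beta = \frac{N - 1}{N} \in [0, 1)

E_1 = 1, \qquad \lim_{n \to \infty} E_n = \frac{1}{1 - \beta} = N

\mathrm{CB}(\mathbf{p}, y) = \frac{1}{E_{n_y}} \, \mathcal{L}(\mathbf{p}, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \, \mathcal{L}(\mathbf{p}, y)
```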

 

However... although the paper explains data overlap at length and derives it mathematically, it never actually measures the overlap of each class's data (?). Instead, it uses a hyperparameter β so that once a class has more than a certain amount of data, its effective number stops growing, i.e., it saturates. Figure 3 of the paper plots the class-balanced term for different values of β. β = 0 corresponds to applying no class-balanced term at all. β → 1 corresponds to dividing by the class frequency, as in previous work. For values of β in between, the term changes as the number of samples in a class grows and then saturates at some point.
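To see this behavior numerically, here is a minimal sketch in plain Python with hypothetical helper names. It computes E_n = (1 - β^n) / (1 - β) and per-class weights 1/E_n. It treats β = 1 as the limiting case E_n = n, which reduces to inverse class frequency. Rescaling the weights to sum to the number of classes is again an assumed convention, not something taken from the paper.

```python
def effective_number(n, beta):
    """Effective number of samples E_n = (1 - beta**n) / (1 - beta) for a class with n samples."""
    if n < 1:
        raise ValueError("each class needs at least one sample")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if beta == 1.0:
        # Limit beta -> 1: E_n -> n, i.e. plain inverse class frequency.
        return float(n)
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts, beta):
    """Per-class weights 1 / E_n, rescaled so they sum to the number of classes (assumed convention)."""
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


counts = [10000, 1000, 100, 10]  # hypothetical class counts
for beta in (0.0, 0.9, 0.99, 0.999, 1.0):
    weights = class_balanced_weights(counts, beta)
    print(f"beta={beta}: " + ", ".join(f"{w:.3f}" for w in weights))
```

With β = 0 every class gets the same weight. As β approaches 1, the weights approach the inverse-frequency ratios. For an intermediate β such as 0.99, classes whose counts are far above 1/(1 - β) = 100 end up with almost the same weight. That is the saturation described above.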

 

This part puzzled me a bit, because in reality the amount of overlap should depend on the type of class. A very simple class would be fully covered by just a few samples, so overlap would appear quickly. A complex class with a lot of deformation might show little overlap even with quite a lot of samples. The paper, however, does not distinguish overlap by class type. It tunes the single hyperparameter β and simply assumes that data overlap occurs beyond a certain number of samples. Still, the method is very simple, and it does improve performance over previous methods.

 

I don't yet know this field in detail, and since this is a relatively old paper, I expect there are many more effective methods by now. When I get the chance, I'll read and review the most recent papers.
