[Paper Review] Character Region Awareness for Text Detection / CRAFT / Text Detection

2023. 3. 13. 21:06 · Research/OCR

This is a text detection paper from Naver Clova, presented at CVPR 2019, that proposes a model called CRAFT. It is one of the best-known papers in text detection, and I personally find it an appealing piece of work because it makes very efficient use of both the nature of text and the way deep networks learn. Detailed walkthroughs are already available on other blogs, so here I only summarize the parts that matter for training the model.

 

Key ideas of the CRAFT model

  • Rather than predicting word bboxes directly, CRAFT predicts a region score, which indicates where each character is, and an affinity score, which marks the space between adjacent characters
  • This requires character-level annotation, but drawing a bbox around every single character is painfully slow, so the model is trained with a weakly-supervised learning method that generates pseudo-GT
  • When character-level bboxes are available, the region score and affinity score maps are generated from them with the label-generation algorithm described in the paper
  • You still need to know which characters belong to the same word; for example, that p, e, a, c, e together form the single word "peace" (strictly speaking the text itself is not needed, only the fact that a given set of character bboxes makes up one word)
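
The label generation described above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: it assumes axis-aligned character boxes `(x0, y0, x1, y1)` in integer pixels, stretches a Gaussian to each box instead of applying a perspective warp, and approximates each affinity box by a rectangle. The paper builds the affinity box from the centers of the upper and lower triangles formed by each character box's diagonals; for an axis-aligned box those centers sit one sixth of the height in from the top and bottom edges. The function names and `sigma_ratio` are hypothetical.

```python
import numpy as np

def gaussian_patch(h, w, sigma_ratio=0.25):
    """2D Gaussian stretched to an h x w patch, close to 1.0 at the centre."""
    ys = (np.arange(h) - (h - 1) / 2) / (h * sigma_ratio)
    xs = (np.arange(w) - (w - 1) / 2) / (w * sigma_ratio)
    return np.exp(-0.5 * (ys[:, None] ** 2 + xs[None, :] ** 2))

def draw_boxes(shape, boxes):
    """Paint one Gaussian per box (x0, y0, x1, y1), keeping the per-pixel maximum."""
    score = np.zeros(shape, dtype=np.float64)
    for x0, y0, x1, y1 in boxes:
        patch = gaussian_patch(y1 - y0, x1 - x0)
        score[y0:y1, x0:x1] = np.maximum(score[y0:y1, x0:x1], patch)
    return score

def affinity_boxes(char_boxes):
    """One box per adjacent character pair within a single word (rectangular approximation)."""
    out = []
    for a, b in zip(char_boxes, char_boxes[1:]):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        # horizontal extent: centre of the left character to centre of the right one
        x0, x1 = (ax0 + ax1) // 2, (bx0 + bx1) // 2
        # vertical extent: triangle centres lie 1/6 of the height inside each box
        top = min(ay0 + (ay1 - ay0) // 6, by0 + (by1 - by0) // 6)
        bottom = max(ay1 - (ay1 - ay0) // 6, by1 - (by1 - by0) // 6)
        out.append((x0, top, x1, bottom))
    return out

# Two adjacent characters of one word on a 12 x 24 canvas
char_boxes = [(0, 0, 12, 12), (12, 0, 24, 12)]
region = draw_boxes((12, 24), char_boxes)
affinity = draw_boxes((12, 24), affinity_boxes(char_boxes))
```

In this toy case the region map peaks in the middle of each character, while the affinity map peaks on the boundary between the two characters, which is what lets the two maps be combined into word boxes afterwards.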

 

 

CRAFT training procedure


  1. Train an interim model on a dataset that includes character-level GT (Train with Synthetic Image)
  2. To improve the interim model, generate character-level pseudo-GT annotations for a dataset that has only word-level annotation (Generate Pseudo-GT)
  3. Train the model on the character-level GT and the generated pseudo-GT together. Because pseudo-GT is not exact ground truth, training is weighted by a confidence score that reflects how accurately the number of characters was predicted (weakly supervised learning) (Train with Real Image + Train with Synthetic Image)

 

* In practice, steps 2 and 3 run at the same time. Step 3 can be trained on real data only, or on synthetic + real data.

 

* Note: the paper refers to the datasets as Synthetic Image and Real Image, but strictly speaking Synthetic Image means a dataset with character-level GT and Real Image means a dataset with only word-level GT. Synthetic images naturally carry character information because they are generated, whereas a real-image dataset may have only word-level GT or may also have character-level GT. The paper's wording is accurate for the datasets it uses, but real-world datasets can differ, so don't let the names confuse you.

 

 

Train with Synthetic Image

  • Train the interim model on the small amount of data that has character-level annotation
  • This can be seen as a pre-training stage for generating pseudo-GT
  • The model must already predict character locations (region score) and inter-character links (affinity score) reasonably well at this stage for the later training to work
    • If the interim model predicts poor region and affinity scores, the pseudo-GT generated from it will be even worse

 

Generate Pseudo-GT
  • Generating pseudo-GT requires word-level annotation (word bboxes) and text information (strictly speaking, how many characters each word contains)
  • The inference output of the interim model trained on Synthetic Image (with GT) is used as pseudo-GT
  • The interim model's output is too error-prone to use directly as labels, so a confidence score based on the predicted and actual character counts is applied
    • e.g. for a 5-character word, predicting 5 characters gives confidence score = 5/5; predicting 3 characters gives confidence score = 3/5
    • If confidence score < 1/2, the word is cut into equal-sized cells and those cells are used as the character bbox GT
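
The confidence rule above can be written down as a short sketch. The formula follows the paper's definition, s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w), where l(w) is the true character count and l_c(w) the predicted one. The paper also sets the confidence to 0.5 when it falls back to equal splitting, a detail not listed in the bullets above. The function names and the axis-aligned word box are assumptions made for illustration.

```python
def confidence_score(word_len, pred_len):
    """s_conf = (l - min(l, |l - l_pred|)) / l, where l is the true character count."""
    return (word_len - min(word_len, abs(word_len - pred_len))) / word_len

def split_word_evenly(word_box, word_len):
    """Fallback: cut an axis-aligned word box (x0, y0, x1, y1) into equal-width character boxes."""
    x0, y0, x1, y1 = word_box
    step = (x1 - x0) / word_len
    return [(x0 + i * step, y0, x0 + (i + 1) * step, y1) for i in range(word_len)]

def pseudo_gt(word_box, word_len, pred_char_boxes):
    """Return (character boxes, confidence) to use as pseudo-GT for one word."""
    conf = confidence_score(word_len, len(pred_char_boxes))
    if conf < 0.5:
        # Prediction is too unreliable: use equal-width cells and a fixed confidence of 0.5
        return split_word_evenly(word_box, word_len), 0.5
    return list(pred_char_boxes), conf
```

For a 5-character word, `confidence_score(5, 3)` reproduces the 3/5 from the example, and over-segmenting into 7 characters is penalized the same way because only the absolute difference counts.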

 

Train with Real Image & Train with Synthetic Image

Pseudo-GT generation and training are described as separate steps for clarity, but in practice they happen at the same time.

  • The model is trained on both the generated pseudo-GT and the existing GT data.
    • Datasets with character-level annotation -> trained with GT data
    • Datasets with word-level annotation -> trained with pseudo-GT data
    • The official CRAFT training code has not been released, but the CRAFT training code published in EasyOCR splits the GPUs in half: one half trains the model on GT data, while the other half generates pseudo-GT and runs the weakly-supervised learning
  • The confidence score is applied to pseudo-GT
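
As a rough sketch of how the confidence score enters training: the paper's objective multiplies the per-pixel squared error of the region and affinity maps by a pixel-wise confidence map, which equals the word's confidence inside each word box and 1 elsewhere (and 1 everywhere for data with real character-level GT). The NumPy version below only illustrates that weighting; the function names and the axis-aligned integer word boxes are assumptions, and this is not the EasyOCR or official implementation.

```python
import numpy as np

def confidence_map(shape, word_boxes, confidences):
    """Pixel-wise weights: the word's confidence inside each word box, 1.0 elsewhere."""
    conf_map = np.ones(shape, dtype=np.float64)
    for (x0, y0, x1, y1), conf in zip(word_boxes, confidences):
        conf_map[y0:y1, x0:x1] = conf
    return conf_map

def weighted_loss(region_pred, affinity_pred, region_gt, affinity_gt, conf_map):
    """Sum over pixels of confidence * (region error^2 + affinity error^2)."""
    per_pixel = (region_pred - region_gt) ** 2 + (affinity_pred - affinity_gt) ** 2
    return float(np.sum(conf_map * per_pixel))

# Toy 4 x 4 example: one word box with confidence 0.6 in the top-left corner
conf_map = confidence_map((4, 4), [(0, 0, 2, 2)], [0.6])
zeros, ones = np.zeros((4, 4)), np.ones((4, 4))
loss = weighted_loss(zeros, zeros, ones, zeros, conf_map)
```

Pixels inside low-confidence words contribute less to the loss, so unreliable pseudo-GT pulls the model less strongly than exact character-level GT.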

 

Experimental results

  • As training progresses, the region score localizes characters more and more accurately
  • The interim model built in the pre-training stage needs to be reasonably good already for the weakly supervised learning to succeed

 

 

Korean experiment results

Training on Korean text also works: the model predicts the region score and affinity score quite well.
