[pandas] Removing duplicate values in a specific column | drop_duplicates

2023. 11. 17. 08:08 · 💻 Programming/Python

ํŒ๋‹ค์Šค ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ๊ฐ„ํ˜น ํŠน์ • ์ปฌ๋Ÿผ์— ์ค‘๋ณต๋œ ๊ฐ’์„ ์ œ๊ฑฐํ•˜๊ณ  ์‹ถ์€ ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค.

 

๋งŒ์•ฝ ์ค‘๋ณต๋œ ๊ฐ’์„ ๊ฐ€์ง„ ํ–‰ ์ค‘ ์ฒ˜์Œ ๋“ฑ์žฅํ•˜๋Š” ํ–‰์„ ์ œ์™ธํ•˜๊ณ  ๋‚˜๋จธ์ง€ ์ค‘๋ณต๋œ ํ–‰์„ ๋ชจ๋‘ ์ œ๊ฑฐํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

import pandas as pd

# Create a sample DataFrame
data = {'column_name': [1, 2, 3, 1, 2, 4]}
df = pd.DataFrame(data)

# Among rows with duplicate values, keep the first occurrence and drop the rest
df_no_duplicates = df.drop_duplicates(subset='column_name', keep='first')

# Print the result
print(df_no_duplicates)

# Compare row counts (len(df) equals len(df.values.tolist()) without building a list)
print(len(df))                # 6
print(len(df_no_duplicates))  # 4
  • The subset parameter of drop_duplicates specifies the column(s) checked for duplicates, and the keep parameter sets which of the duplicate rows is retained.
  • drop_duplicates(subset='column_name', keep='first') keeps only the first occurrence of each value.
    • keep='last' keeps only the last occurrence instead.
  • Row counts can be compared by converting the DataFrame to a list, as in len(df.values.tolist()); len(df) returns the same count without the conversion.
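To make the effect of keep easier to see, here is a minimal sketch that runs the same column through keep='first', keep='last', and keep=False. The extra "value" column is a hypothetical addition for illustration only (it is not part of the original sample); it just labels each row so you can tell which ones survive.

```python
import pandas as pd

# Hypothetical sample data: the "value" column is an illustrative assumption
# added only to label each row; "column_name" matches the sample above.
df = pd.DataFrame({
    "column_name": [1, 2, 3, 1, 2, 4],
    "value": ["a", "b", "c", "d", "e", "f"],
})

# keep='first': keep the first row of each duplicated value
first = df.drop_duplicates(subset="column_name", keep="first")

# keep='last': keep the last row of each duplicated value
last = df.drop_duplicates(subset="column_name", keep="last")

# keep=False: drop every row whose value occurs more than once
none_kept = df.drop_duplicates(subset="column_name", keep=False)

print(first["value"].tolist())      # ['a', 'b', 'c', 'f']
print(last["value"].tolist())       # ['c', 'd', 'e', 'f']
print(none_kept["value"].tolist())  # ['c', 'f']
```

Note that the surviving rows keep their original index labels (0, 1, 2, 5 for keep='first'); if you need a fresh 0..n-1 index, calling reset_index(drop=True) on the result is one option.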

 

 
