[Pandas] DataFrame : Basic Attributes and Exploration Methods


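The pandas DataFrame object is a 2D tabular data structure and the object used most often in data analysis.

Real-world data typically has anywhere from hundreds to hundreds of thousands of rows (cases) and columns (features, attributes).
You therefore need to print a subset of the data or examine it through summary statistics, i.e. descriptive statistics.

This post introduces the main attributes and exploration methods for quickly grasping the structure and contents of a DataFrame.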

1. DataFrame Basic Attributes

Like a NumPy array, a DataFrame exposes a few basic attributes directly.
shape, ndim, dtypes, and so on can be checked with no extra work.

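import pandas as pd

# Example DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["Seoul", "Busan", "Incheon", "Daegu"]
}
# Convert only the text columns to the "string" dtype so that Age stays int64
# (passing dtype="string" to the constructor would turn every column into strings)
df = pd.DataFrame(data).astype({"Name": "string", "City": "string"})

# Inspect the main attributes
print("Shape:", df.shape)        # (rows, columns)
print("ndim:", df.ndim)          # number of dimensions (always 2)
print("Size:", df.size)          # total number of elements
print("dtypes:\n", df.dtypes)    # dtype of each column
print("Index:", df.index)        # row index object
print("Columns:", df.columns)    # column labels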
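How these attributes relate to each other can be checked with a small sketch. The frame name `demo` and its contents are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
})

n_rows, n_cols = demo.shape           # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(demo.size == n_rows * n_cols)   # size is rows * columns
print(demo.ndim)                      # 2 for any DataFrame
print(list(demo.columns))             # column labels as a plain list
print(len(demo.index))                # the index has one entry per row
```

Unpacking `shape` this way is a common idiom when later code needs the row count or column count separately.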

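2. Structure Inspection Method: info()

Provides a summary of the overall structure of the data.
Shows the number of rows and columns, the dtype of each column, and how many non-null values each column holds (which reveals missing values).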

df.info()

The output looks like this:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    4 non-null      string 
 1   Age     4 non-null      int64  
 2   City    4 non-null      string 
dtypes: int64(1), string(2)

<class 'pandas.core.frame.DataFrame'>
- The object is an instance of the pandas.DataFrame class
RangeIndex: 4 entries, 0 to 3
- The row index is a RangeIndex (consecutive integers starting at 0)
- There are 4 rows in total (0, 1, 2, 3)
Data columns (total 3 columns):
- There are 3 columns in total
Per-column details
- # : column number (starting at 0)
- Column : column name
- Non-Null Count : number of values that are not missing
- Dtype : data type (dtype) of the column
- Reading the example:
  - Name: all 4 values are non-null, dtype is string
  - Age: all 4 values are non-null, dtype is int64
  - City: all 4 values are non-null, dtype is string
dtypes: int64(1), string(2)
- Summary of the dtypes across all columns
- 1 int64 column and 2 string columns
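The example above has no missing values, so every Non-Null Count equals the row count. The sketch below uses a hypothetical frame with one missing age to show how the count drops, and how `count()` and `isna().sum()` give the same information as numbers.

```python
import pandas as pd

# Hypothetical data with one missing age, for illustration only
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25.0, None, 35.0],   # None is stored as NaN in a float column
})

demo.info()                   # the Age row reports fewer non-null values than entries

non_null = demo.count()       # non-null values per column
missing = demo.isna().sum()   # missing values per column
print(non_null["Age"], missing["Age"])
```

Comparing Non-Null Count against the number of entries in the index line is the quickest way to spot columns that need missing-value handling.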

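3. Statistical Summary Method: describe()

Summarizes basic descriptive statistics for numeric columns.
- Provides count, mean, standard deviation (std), min/max, and quartiles (25%, 50%, 75% percentiles)
Passing include="all" also includes string (and categorical) columns.
- For string or categorical columns, the following are reported:
  - unique: number of distinct values
  - top: the most frequent value (mode)
  - freq: frequency of the most frequent value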

print(df.describe())                   # default: numeric columns only
print(df.describe(include="all"))      # include every column
print(df["Name"].describe())           # a single column can be summarized on its own
print(df[["Name", "City"]].describe()) # or several selected columns

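Inside the square brackets of df[], put either a single column label or a list whose items are column labels. A comma-separated pair without the inner list, such as df["Name", "City"], is treated as one tuple label and raises a KeyError for this DataFrame.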
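The difference between the indexing forms can be seen in a minimal sketch. The frame `demo` is hypothetical, and a repeated name is included on purpose so that `top` and `freq` are unambiguous.

```python
import pandas as pd

# Hypothetical frame; "Alice" appears twice so the mode is well defined
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice"],
    "Age": [25, 30, 35],
})

single = demo["Name"]             # one label -> Series
multi = demo[["Name", "Age"]]     # list of labels -> DataFrame
print(type(single).__name__, type(multi).__name__)

summary = demo["Name"].describe()  # text column: count, unique, top, freq
print(summary["unique"], summary["top"], summary["freq"])

age_mean = demo["Age"].describe()["mean"]  # numeric column: mean, std, quartiles, ...
print(age_mean)

try:
    demo["Name", "Age"]           # the tuple is looked up as ONE column label
except KeyError as exc:
    print("KeyError:", exc)
```

Because `describe()` on a Series returns a Series, individual statistics can be picked out by label as shown.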

https://dsaint31.tistory.com/673#1.%20Statistics%20(%ED%86%B5%EA%B3%84)%EC%9D%98%20%EC%A2%85%EB%A5%98-1-1

[Math] Basic Term: Statistics (dsaint31.tistory.com)

4. Viewing Part of the Data: head(), tail()

head(n) : returns the first n rows (default 5)
tail(n) : returns the last n rows (default 5)

print(df.head(2))  # first 2 rows
print(df.tail(2))  # last 2 rows
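With only four rows the default of 5 is hard to see, so the sketch below uses a hypothetical 10-row frame. It also shows that a negative n returns everything except the last |n| rows for head().

```python
import pandas as pd

# Hypothetical 10-row frame, values 0..9
demo = pd.DataFrame({"x": range(10)})

print(len(demo.head()))             # default n=5
print(demo.tail(3)["x"].tolist())   # last three rows
print(demo.head(-2)["x"].tolist())  # all rows except the last two
```

Both methods return a new DataFrame, so they can be chained with other calls such as describe().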

5. Iteration

Covered in a separate post: 2025.09.29 - [Python/pandas] - Pandas - Iteration (ds31x.tistory.com)

That post describes ways to iterate over the records (rows) of a DataFrame, starting with iterrows(), which returns each row as an (index, Series) pair.
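As a rough sketch of row iteration, and not a reproduction of the linked post's code, the example below walks a small frame with iterrows() and, for comparison, itertuples(). The variable names `df_iter`, `rows`, and `names` are illustrative.

```python
import pandas as pd

df_iter = pd.DataFrame({"name": ["Kim", "Lee", "Park"], "age": [28, 34, 29]})

rows = []
for idx, row in df_iter.iterrows():        # row is a Series keyed by column label
    rows.append((idx, row["name"], row["age"]))
print(rows)

# itertuples() yields namedtuples; fields are accessed as attributes
names = [t.name for t in df_iter.itertuples(index=False)]
print(names)
```

Row-by-row loops are convenient for inspection, but vectorized column operations are generally preferred for computation on large frames.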

Further reading

https://blog.naver.com/dsaint31/224030800002

How to Use the Pandas DataFrame (blog.naver.com)