Jugal kishore

Jun 14, 2025 • 2 min read

📊 Mastering Pandas: A Practical Guide to Reading, Cleaning, and Aggregating Data

Pandas is a powerful data manipulation library in Python that simplifies working with structured data. In this article, we’ll walk through three crucial topics that every data enthusiast or professional must master:

  1. Reading, Writing, and Selecting Data with Pandas

  2. Data Cleaning and Handling Missing Values in Pandas

  3. Aggregation, Grouping, and Combining Data in Pandas

Let’s dive right in! 🚀


🔍 1. Reading, Writing, and Selecting Data with Pandas: Practical Guide

✅ Reading Data

One of the most common formats for structured data is CSV (comma-separated values), and Pandas reads it with the read_csv() function.

import pandas as pd

# Reading a CSV file
df = pd.read_csv("data.csv")
print(df.head())  # First 5 rows
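read_csv() also takes keyword arguments that control how the file is parsed. Here is a minimal sketch. It uses an in-memory string in place of a real file, and the columns are hypothetical. It loads only some columns, sets their types, parses a date column, and uses one column as the index:

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk
csv_text = """id,name,dept,joined,salary
1,Alice,Sales,2023-01-15,50000
2,Bob,IT,2023-03-01,60000
3,Charlie,HR,2024-07-20,55000
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "name", "joined", "salary"],  # skip the 'dept' column
    dtype={"id": "int64", "salary": "float64"},  # set column types explicitly
    parse_dates=["joined"],                      # parse as datetime64
    index_col="id",                              # use 'id' as the row index
)

print(df.dtypes)
print(df.head())
```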

You can also read from Excel, JSON, and SQL databases (for .xlsx files, read_excel() needs an engine such as openpyxl installed):

# Excel
df = pd.read_excel("data.xlsx")

# JSON
df = pd.read_json("data.json")
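Reading from a SQL database requires a database connection. Here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The employees table and its rows are hypothetical. pd.read_sql_query() runs the query and returns the result set as a DataFrame:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical 'employees' table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 50000), (2, "Bob", 60000), (3, "Charlie", 55000)],
)
conn.commit()

# Run a parameterized query and load the result set into a DataFrame
df = pd.read_sql_query(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary DESC",
    conn,
    params=(52000,),
)
print(df)
conn.close()
```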

✅ Writing Data

Save your DataFrame to various formats using:

# Save to CSV
df.to_csv("output.csv", index=False)

# Save to Excel
df.to_excel("output.xlsx", index=False)
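index=False matters because the default (index=True) writes the row index as an extra, unnamed first column. Reading that file back then gives you an "Unnamed: 0" column. The following sketch shows the difference. Called without a path, to_csv() returns the CSV text as a string:

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# Default index=True: the row index becomes an unnamed first column
with_index = df.to_csv()
# index=False: only the data columns are written
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the index-free CSV back reproduces the original DataFrame
roundtrip = pd.read_csv(io.StringIO(without_index))
print(roundtrip.equals(df))
```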

✅ Selecting Data

Accessing Columns

df['column_name']
df[['col1', 'col2']]

Accessing Rows

df.loc[0]     # By index label (0 is a label here only because the default index is 0, 1, 2, ...)
df.iloc[0]    # By integer position (always the first row)
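loc and iloc only behave differently when the index labels do not match row positions. In this sketch with a hypothetical index of 10, 20, 30, df.loc[0] would raise a KeyError. The two accessors also treat slice ends differently:

```python
import pandas as pd

# A DataFrame whose index labels do not match row positions
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 32, 28]},
    index=[10, 20, 30],
)

print(df.loc[20])         # row whose index label is 20 (Bob)
print(df.iloc[0])         # first row by position (Alice)
print(df.loc[10:30])      # label slices include the end label: all 3 rows
print(df.iloc[0:2])       # position slices exclude the end position: 2 rows
print(df.loc[20, "age"])  # single value by row label and column name
```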

Filtering Rows

# All rows where age > 25
df[df['age'] > 25]
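In real filters you often need to combine several conditions. Use & (and) and | (or), and wrap each condition in parentheses. The sketch below uses a small hypothetical DataFrame. It also shows isin(), a combined row filter and column selection with .loc, and the equivalent query() string:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "age": [25, 32, 28, 41],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

# Combine conditions with & / |; each condition needs its own parentheses
older_in_delhi = df[(df["age"] > 25) & (df["city"] == "Delhi")]

# Keep rows whose value is in a list
west = df[df["city"].isin(["Mumbai", "Pune"])]

# Filter rows and pick a column in one .loc call
names = df.loc[df["age"] > 30, "name"]

# The same filter as older_in_delhi, written as a query string
also_older = df.query("age > 25 and city == 'Delhi'")

print(older_in_delhi)
print(west)
print(names.tolist())
```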

🧹 2. Data Cleaning and Handling Missing Values in Pandas

✅ Identifying Missing Data

df.isnull().sum()  # Count missing values in each column
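isnull() is an alias of isna(), and both return a boolean DataFrame. You can reduce that DataFrame in different ways. This sketch uses hypothetical data and counts, measures, and locates the missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", None],
    "age": [25, np.nan, np.nan],
})

missing_per_column = df.isna().sum()           # count of missing values per column
missing_share = df.isna().mean()               # fraction missing per column
rows_with_missing = df[df.isna().any(axis=1)]  # rows with at least one missing value

print(missing_per_column)
print(missing_share)
print(rows_with_missing)
```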

✅ Dropping Missing Data

df.dropna(inplace=True)  # Drop rows with any missing values

You can also restrict the check to specific columns, dropping only rows where those columns are missing:

df.dropna(subset=['column1'], inplace=True)
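dropna() has a few other parameters that control what gets dropped. In the hypothetical DataFrame below, column c is entirely empty. That means a plain dropna() removes every row. Without inplace=True, each call returns a new DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, np.nan, 3.0, 4.0],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

everything = df.dropna()                         # every row has a NaN in 'c', so all rows go
all_missing_dropped = df.dropna(how="all")       # drop only rows where every value is missing
at_least_two = df.dropna(thresh=2)               # keep rows with at least 2 non-missing values
no_empty_columns = df.dropna(axis=1, how="all")  # drop columns that are entirely missing
subset_dropped = df.dropna(subset=["a"])         # drop rows missing a value in column 'a'
```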

✅ Filling Missing Data

# Fill with a constant
df.fillna(0, inplace=True)

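# Fill with the mean of a column (assign the result back; inplace=True on a single
# column is chained assignment and may not update df in recent pandas versions)
df['salary'] = df['salary'].fillna(df['salary'].mean())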
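For ordered data such as time series, it often makes more sense to fill a gap from the neighbouring values than with a constant or a mean. Here is a minimal sketch on a hypothetical series of readings:

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with gaps
s = pd.Series([10.0, np.nan, np.nan, 16.0, np.nan])

forward = s.ffill()       # carry the last valid value forward
backward = s.bfill()      # use the next valid value (the trailing NaN stays NaN)
linear = s.interpolate()  # linear interpolation between valid neighbours

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "linear": linear}))
```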

✅ Replacing Data

# Replace specific values
df.replace("N/A", pd.NA, inplace=True)
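Raw data often marks missing values with placeholder strings. Pandas only recognises them as missing after you replace them. This sketch uses hypothetical placeholders "N/A" and "-". It replaces both in one call and then converts the text column to numbers. When you load the data with read_csv(), its na_values parameter can do the same thing at read time:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with placeholder strings for missing values
df = pd.DataFrame({
    "city": ["Delhi", "N/A", "Pune", "-"],
    "score": ["90", "85", "N/A", "70"],
})

# Replace several placeholders at once; pandas then treats them as missing
clean = df.replace(["N/A", "-"], np.nan)
# Convert the text column to numbers now that the placeholders are gone
clean["score"] = pd.to_numeric(clean["score"])

print(clean.isna().sum())
print(clean["score"].mean())  # NaN is skipped by default
```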

🧪 Example:

import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', np.nan],
    'age': [25, np.nan, 30, 22],
    'salary': [50000, 60000, np.nan, 40000]
}

df = pd.DataFrame(data)
df.fillna({'name': 'Unknown', 'age': df['age'].mean(), 'salary': df['salary'].median()}, inplace=True)
print(df)

📊 3. Aggregation, Grouping, and Combining Data in Pandas Explained

✅ Aggregation

# Get summary statistics
df.describe()

# Mean of a column
df['salary'].mean()
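To apply different aggregations to different columns in one call, you can pass a dictionary to DataFrame.agg(). Here is a minimal sketch with hypothetical columns. The result has one row per function name and is NaN wherever a function was not requested for that column:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 28, 41],
    "salary": [50000, 60000, 55000, 70000],
})

# Column name -> list of aggregation functions
summary = df.agg({"age": ["min", "max"], "salary": ["mean", "sum"]})
print(summary)
```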

✅ Grouping Data

# Group by department and calculate average salary
df.groupby('department')['salary'].mean()

You can also apply multiple aggregations:

df.groupby('department')['salary'].agg(['mean', 'max', 'min'])
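Named aggregation lets you choose the output column names and aggregate several input columns in one agg() call. Each keyword has the form output_name=(input_column, function). In this sketch the department data is hypothetical. groupby() sorts the group keys by default:

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "HR"],
    "salary": [50000, 60000, 80000, 90000, 45000],
    "age": [25, 32, 41, 29, 38],
})

stats = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    headcount=("salary", "count"),  # non-missing salaries per department
    oldest=("age", "max"),
)
print(stats)
```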

✅ Combining DataFrames

Concatenation

pd.concat([df1, df2], axis=0)
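With axis=0, concat() stacks rows and keeps each input's original index, so labels can repeat. ignore_index=True renumbers the result instead. With axis=1, it places DataFrames side by side and aligns them on the index. Here is a sketch with hypothetical df1 and df2:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 32]})
df2 = pd.DataFrame({"name": ["Charlie"], "age": [28]})

# Stack rows; without ignore_index the index would be 0, 1, 0
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Side by side, aligning rows on the index
extra = pd.DataFrame({"city": ["Delhi", "Pune"]})
wide = pd.concat([df1, extra], axis=1)

print(stacked)
print(wide)
```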

Merging

pd.merge(df1, df2, on='employee_id', how='inner')
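The how argument decides which keys survive the merge. In this sketch, two hypothetical tables share only some employee_id values. indicator=True adds a _merge column that shows which side each row came from:

```python
import pandas as pd

employees = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
salaries = pd.DataFrame({
    "employee_id": [2, 3, 4],
    "salary": [60000, 55000, 70000],
})

inner = pd.merge(employees, salaries, on="employee_id", how="inner")  # ids in both: 2, 3
left = pd.merge(employees, salaries, on="employee_id", how="left")    # all employees; id 1 gets NaN salary
outer = pd.merge(
    employees, salaries, on="employee_id", how="outer",
    indicator=True,  # '_merge' column: 'left_only', 'right_only' or 'both'
)
print(outer)
```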

Joining (on index)

df1.join(df2, how='outer')
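Because join() aligns on the index, the key usually has to be set as the index first. In this sketch both hypothetical DataFrames use employee_id as their index. If the two sides share column names, join() needs lsuffix or rsuffix to keep them apart:

```python
import pandas as pd

names = pd.DataFrame(
    {"name": ["Alice", "Bob"]},
    index=pd.Index([1, 2], name="employee_id"),
)
cities = pd.DataFrame(
    {"city": ["Delhi", "Pune"]},
    index=pd.Index([2, 3], name="employee_id"),
)

# Outer join keeps the union of index labels: 1, 2, 3
joined = names.join(cities, how="outer")
print(joined)
```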

🧪 Example:

df_sales = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'sales': [1000, 1500, 2000]
})

df_region = pd.DataFrame({
    'store': ['A', 'B', 'C'],
    'region': ['North', 'East', 'West']
})

# Merge both DataFrames
df_merged = pd.merge(df_sales, df_region, on='store')
print(df_merged)

# Group by region and get total sales
print(df_merged.groupby('region')['sales'].sum())

✍️ Final Thoughts

Pandas is a must-know for any data analyst or backend developer working with structured data. Mastering these concepts (reading and writing data, cleaning it, and aggregating it) will significantly boost your productivity and your understanding of data pipelines.
