
DataFrame Operations in Pandas

This tutorial covers the basic operations on DataFrames using the pandas library in Python. First, what is a DataFrame? Think of it as a spreadsheet or table with labeled rows and columns: each row is an individual record, and each column holds one type of information.

Operations on a DataFrame

The sections below walk through the basic operations on a pandas DataFrame.

1. Creating a DataFrame

The first step in working with DataFrames in pandas is knowing how to create one. Let's look at the simple code below:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 33]}
df = pd.DataFrame(data)

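We've just created a DataFrame with two columns, 'Name' and 'Age', using a dictionary. The dictionary keys become the column names and the lists become the column values. Because no index was given, the rows are labeled 0, 1, and 2.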

2. Accessing Data

Once you've created your DataFrame, you might want to access specific data within it. You can access the data in several ways:

By column: df['Name']

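By row: df.loc[1] (selects by index label) or df.iloc[1] (selects by integer position). With the default index used here, both return the second row.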
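The access patterns above can be sketched as follows. This is a minimal example that rebuilds the DataFrame from section 1; the df_custom variable and the choice of 'Name' as an index are assumptions added only to show where loc and iloc differ.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 33]}
df = pd.DataFrame(data)

# Column access returns a Series
names = df['Name']

# With the default 0..n-1 index, label 1 and position 1 are the same row
row_by_label = df.loc[1]
row_by_position = df.iloc[1]

# Illustrative assumption: use 'Name' as the index so labels are strings
df_custom = df.set_index('Name')
anna_by_label = df_custom.loc['Anna']   # lookup by label
anna_by_position = df_custom.iloc[1]    # lookup by position (second row)

print(names)
print(row_by_label)
print(anna_by_label)
```

Once the index is no longer the default integer range, loc and iloc stop being interchangeable, so it helps to pick the one that matches your intent.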

3. Modifying Data

You can also add or modify data within your DataFrame. Let's see how:

To add a new column: df['Gender'] = ['Male', 'Female', 'Male']

To modify an existing value: df.loc[1, 'Age'] = 25
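Put together, the two modifications above might look like this. It is a small sketch that rebuilds the DataFrame from section 1 so it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 33]})

# Add a new column; the list must have one entry per row
df['Gender'] = ['Male', 'Female', 'Male']

# Change a single cell: row label 1, column 'Age'
df.loc[1, 'Age'] = 25

print(df)
```

Note that a list whose length does not match the number of rows raises a ValueError when assigned as a column.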

4. Basic Statistical Operations

Pandas allows us to do various statistical operations on our DataFrame like calculating the mean, median, maximum, minimum, etc. These can be done with simple commands such as:

To calculate the mean age: df['Age'].mean()

To find the maximum age: df['Age'].max()
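As a runnable sketch of these commands, using the original ages from section 1 (28, 24, and 33):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 33]})

mean_age = df['Age'].mean()      # (28 + 24 + 33) / 3
median_age = df['Age'].median()  # middle value of the sorted ages
max_age = df['Age'].max()
min_age = df['Age'].min()

print(mean_age, median_age, max_age, min_age)

# describe() reports several summary statistics at once
print(df['Age'].describe())
```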

5. Handling Missing Data

Missing data is a common issue in real-world datasets. Pandas provides several methods to handle missing data:

df.dropna(): Returns a new DataFrame without the rows that contain missing data.

df.fillna(value): Returns a new DataFrame with all missing data replaced by the specified value.
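Here is a minimal sketch of both methods. The missing age for 'Anna' is an assumption introduced for illustration, since the DataFrame from section 1 has no missing values.

```python
import numpy as np
import pandas as pd

# Assumed example data: Anna's age is missing
df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, np.nan, 33]})

# Count missing values per column
missing_counts = df.isna().sum()

# Drop rows with any missing value (df itself is left unchanged)
dropped = df.dropna()

# Replace missing values with a fixed value
filled = df.fillna(0)

# Or fill one column with its own mean, which ignores missing entries
filled_mean = df['Age'].fillna(df['Age'].mean())

print(missing_counts)
print(dropped)
print(filled)
print(filled_mean)
```

Both methods leave the original DataFrame untouched, so assign the result to a variable if you want to keep it.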

6. Sorting and Grouping

Often, we want to sort or group our data based on certain criteria. This is straightforward in pandas:

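To sort by age: df.sort_values('Age'). This returns a new DataFrame sorted in ascending order; pass ascending=False for descending order.

To group by a column (e.g., 'Gender'): df.groupby('Gender'). This returns a GroupBy object rather than a table, so follow it with an aggregation such as df.groupby('Gender')['Age'].mean().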
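The following sketch shows sorting and grouping on the example data, with the 'Gender' column from section 3 included so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 33],
                   'Gender': ['Male', 'Female', 'Male']})

# Ascending sort by age (the default)
sorted_df = df.sort_values('Age')

# Descending sort by age
sorted_desc = df.sort_values('Age', ascending=False)

# Group by gender, then aggregate: mean age per group
mean_age_by_gender = df.groupby('Gender')['Age'].mean()

print(sorted_df)
print(sorted_desc)
print(mean_age_by_gender)
```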

7. Merging and Concatenating

Sometimes, we need to combine different DataFrames. We can either merge them, which is similar to joining tables in SQL, or concatenate them:

Merge: pd.merge(df1, df2, on='common_column')

Concatenate: pd.concat([df1, df2])
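To make the difference concrete, here is a small sketch. The DataFrames people, cities, and more_people, along with their values, are hypothetical and stand in for df1 and df2 above.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                       'Age': [28, 24, 33]})

# Hypothetical second table sharing the 'Name' column
cities = pd.DataFrame({'Name': ['John', 'Anna'],
                       'City': ['Paris', 'Oslo']})

# Merge: match rows on 'Name'. The default is an inner join,
# so Peter (no matching city) is left out of the result.
merged = pd.merge(people, cities, on='Name')

# Hypothetical extra rows with the same columns as people
more_people = pd.DataFrame({'Name': ['Maria'], 'Age': [41]})

# Concatenate: stack rows; ignore_index=True renumbers them 0..n-1
combined = pd.concat([people, more_people], ignore_index=True)

print(merged)
print(combined)
```

Merging adds columns by matching key values, while concatenating simply stacks rows (or columns, with axis=1) without matching anything.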

 

