Python Pandas Iteration: Types and Examples
In Python, iteration is often used to process data in a Pandas DataFrame. There are several ways to iterate over a Pandas DataFrame, each with its own advantages and disadvantages. This tutorial discusses the different types of iteration in Pandas, along with examples of how to use them.
What is Iteration in Python Pandas?
Iteration in Pandas means stepping through the rows or columns of a DataFrame, or the values of a Series, one at a time, much as you would loop over a list or a string in plain Python. Inside the loop you can operate on each element in turn without indexing it by hand.
Types of iteration in Pandas
The following are some of the most common types of iteration in Pandas:
iterrows()
: This method returns an iterator that yields each row in a DataFrame as an (index, Series) pair. The Series object contains the values for the row.
itertuples()
: This method returns an iterator that yields each row in a DataFrame as a namedtuple. The namedtuple contains the values for the row, along with the row's index.
items()
: This method returns an iterator that yields each column in a DataFrame as a (column name, Series) pair. The Series object contains the values for the column. Older tutorials call this method iteritems(); that name was deprecated in pandas 1.5 and removed in pandas 2.0.
itervalues()
: This is not a DataFrame method. It is a Python 2 dictionary method that is often confused with the Pandas iterators, so it is covered further down for comparison.
Iterating with .iterrows()
The .iterrows() method is one of the most commonly used ways to iterate over DataFrame rows. It yields the index of each row together with the row data as a Series.
Example:
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Peter'],
    'Age': [25, 28, 24]
})
# Each iteration yields the row's index label and the row as a Series.
for index, row in df.iterrows():
    print(row['Name'], row['Age'])
Output:
# John 25
# Doe 28
# Peter 24
Here, each iteration unpacks the row's index label into index and the row's values into row, a Series keyed by column name. Only the Name and Age values are printed; the index is available in the loop but not used.
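One side effect of packing each row into a single Series is that the column dtypes are not preserved across the row. The following is a minimal sketch of that behavior; the DataFrame `mixed` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column.
mixed = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]})

# Each row is packed into one Series, so both values must share one dtype.
first_index, first_row = next(mixed.iterrows())

print(mixed['count'].dtype)      # dtype of the original column
print(first_row.dtype)           # dtype of the row Series
print(type(first_row['count']))  # the integer has been upcast to a float
```

If the original types matter inside the loop, this is one reason to consider itertuples() instead.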
Iterating with .itertuples()
The .itertuples() method returns an iterator that yields one named tuple per row. By default the first field of the tuple is the row's index (Index) and the remaining fields are the column values. It is usually faster than .iterrows() because it does not build a new Series for every row, and the values are read by attribute (row.Name) rather than by key (row['Name']).
Example:
# Reuses the df defined in the .iterrows() example above.
for row in df.itertuples():
    print(row.Name, row.Age)
Output:
# John 25
# Doe 28
# Peter 24
In this case, row.Name and row.Age are plain attribute lookups on a named tuple, which is cheaper than building a Series for each row and looking values up by label as in the previous example.
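The sketch below shows two documented options of itertuples() that the example above does not use: the Index field of the default named tuple, and the index and name parameters. The DataFrame `people` is a hypothetical copy of the tutorial's data so the block runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Doe', 'Peter'],
    'Age': [25, 28, 24],
})

# Default call: a namedtuple whose first field, Index, holds the row label.
first = next(people.itertuples())
print(first.Index, first.Name, first.Age)  # 0 John 25

# index=False drops the Index field; name=None yields plain tuples.
plain_rows = list(people.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples are handy when the rows are only going to be unpacked positionally, for example `for name, age in people.itertuples(index=False, name=None)`.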
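Iterating with .items()
The .items() method is used to iterate over DataFrame columns instead of rows. It yields a tuple with the column name and the column data as a Series. It was previously called .iteritems(); that name was deprecated in pandas 1.5 and removed in pandas 2.0, so code that still calls df.iteritems() raises an AttributeError on current versions.
Example:
# Reuses the df defined in the .iterrows() example above.
for label, content in df.items():
    print(label)
    print(content)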
Output (the Name and dtype footer that pandas prints after each Series is omitted):
# Name
# 0 John
# 1 Doe
# 2 Peter
# Age
# 0 25
# 1 28
# 2 24
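Column iteration is typically used to compute something per column. Here is a small sketch that collects the number of distinct values in each column, using items(), the name current pandas releases use for this method. The DataFrame `people` and the dictionary `unique_counts` are illustrative names, not part of the original example.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Doe', 'Peter'],
    'Age': [25, 28, 24],
})

# Build one summary value per column while iterating over the columns.
unique_counts = {}
for label, content in people.items():
    unique_counts[label] = content.nunique()

print(unique_counts)  # {'Name': 3, 'Age': 3}
```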
Iterating with .itervalues()
The itervalues() method does not exist on a Pandas DataFrame. It is a Python 2 dictionary method that returns an iterator over the dictionary's values.
In Python 3.x, itervalues() has been replaced by values(). Here's an example of how it's used:
# Named pet rather than dict to avoid shadowing the built-in dict type.
pet = {'Name': 'Zophie', 'Species': 'cat', 'Age': '7'}
for value in pet.values():
    print(value)
Output:
# Zophie
# cat
# 7
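On the Pandas side, the closest analogue to looping over values only is probably iterating a Series directly. This is a minimal sketch under that assumption; `people`, `ages`, and `labels` are illustrative names.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['John', 'Doe', 'Peter'],
    'Age': [25, 28, 24],
})

# Iterating a Series yields its values, without the index labels.
ages = [age for age in people['Age']]
print(ages)  # [25, 28, 24]

# Iterating a DataFrame directly yields its column labels, not its values.
labels = [label for label in people]
print(labels)  # ['Name', 'Age']
```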
Advantages and Disadvantages of each type of iteration
The iterrows() method gives label-based access to each row as a Series, which makes loops easy to read and works on any DataFrame. It is usually the slowest option for large DataFrames, because a new Series is built for every row. Since a row is packed into a single Series, the dtypes of the individual columns are also not preserved across the row.
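The itertuples() method is generally faster than iterrows() for large DataFrames and keeps each value's type. It works with any number of columns. Its main caveat is that column names that are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional names such as _1.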
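A practical caveat with itertuples() concerns column names that cannot be used as attribute names. The sketch below uses a hypothetical column called 'first name' to show how such a name is replaced by a positional field name.

```python
import pandas as pd

# Hypothetical frame with a column name that is not a valid identifier.
staff = pd.DataFrame({'first name': ['John', 'Doe'], 'Age': [25, 28]})

row = next(staff.itertuples())

# 'first name' contains a space, so it is renamed to its position, _1.
print(row._fields)  # ('Index', '_1', 'Age')
print(row._1, row.Age)
```

Renaming the columns before the loop, or falling back to plain tuples with name=None, avoids having to remember the positional names.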
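The items() method (iteritems() before pandas 2.0) iterates over columns rather than rows, so it is not a direct alternative to the two row-wise methods. It works on any DataFrame, and because a DataFrame typically has far fewer columns than rows, looping over columns is rarely a performance concern. The itervalues() method is not part of Pandas and does not apply to DataFrames.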
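To compare the two row-wise methods on your own data, a rough timing harness along these lines can help. It is only a sketch: the DataFrame `frame`, the helper function names, and the row count are arbitrary choices, and the measured times will vary by machine and pandas version.

```python
import timeit

import pandas as pd

ROWS = 2_000  # arbitrary size, chosen only to keep the run short
frame = pd.DataFrame({'a': range(ROWS), 'b': range(ROWS)})


def total_with_iterrows(df):
    """Sum columns a and b row by row using iterrows()."""
    total = 0
    for _, row in df.iterrows():
        total += row['a'] + row['b']
    return total


def total_with_itertuples(df):
    """Sum columns a and b row by row using itertuples()."""
    total = 0
    for row in df.itertuples(index=False):
        total += row.a + row.b
    return total


rows_time = timeit.timeit(lambda: total_with_iterrows(frame), number=1)
tuples_time = timeit.timeit(lambda: total_with_itertuples(frame), number=1)
print(f"iterrows:   {rows_time:.4f}s")
print(f"itertuples: {tuples_time:.4f}s")
```

Both functions compute the same total, so any difference in the printed times comes from the iteration method itself.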
When to use each type of iteration
The best type of iteration to use depends on the specific task you are trying to accomplish. If you need label-based access to each row as a Series and the DataFrame is small, iterrows() is a reasonable choice. If you need to process a DataFrame column by column, use items() (called iteritems() before pandas 2.0); itervalues() is not available on DataFrames. If you need to iterate over the rows of a large DataFrame and performance is important, itertuples() is the better option.