Types of DataFrame Attributes in Pandas
In this tutorial, we'll look at the most commonly used attributes of a Pandas DataFrame and give an example for each one. To understand these attributes, you need to know what a DataFrame is. In simple terms, a DataFrame is a two-dimensional labeled data structure whose columns can hold different types.
Let's start by importing the Pandas library and creating a simple DataFrame:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'London', 'Berlin', 'Sydney']}
df = pd.DataFrame(data)
This DataFrame, df, now holds the names, ages, and cities of four individuals.
List of DataFrame Attributes
The most commonly used DataFrame attributes are listed below:
1. dtypes: This attribute returns the data type of each column in the DataFrame.
Example:
print(df.dtypes)
# output
# Name object
# Age int64
# City object
# dtype: object
2. columns: It returns the column labels of the DataFrame as an Index object.
Example:
print(df.columns)
# output
# Index(['Name', 'Age', 'City'], dtype='object')
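Since columns is an Index object rather than a plain list, it is often converted or tested for membership. A minimal sketch, rebuilding the same df; the name column_names is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Berlin', 'Sydney'],
})

# Convert the Index of column labels to a plain Python list
column_names = df.columns.tolist()
print(column_names)             # ['Name', 'Age', 'City']

# Membership tests work directly on the Index
print('Age' in df.columns)      # True
print('Country' in df.columns)  # False
```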
3. shape: This attribute gives the number of rows and columns present in the DataFrame. It returns a tuple representing the dimensionality.
Example:
print(df.shape)
# output
# (4, 3)
4. size: It returns the total number of elements in the DataFrame, i.e., it is the product of the number of rows and the number of columns.
Example:
print(df.size)
# output
# 12
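The relationship between shape and size can be checked directly. The sketch below rebuilds the same df; n_rows and n_cols are illustrative names, not part of Pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Berlin', 'Sydney'],
})

# shape is a plain tuple, so it can be unpacked into two variables
n_rows, n_cols = df.shape
print(n_rows, n_cols)               # 4 3

# size counts every cell, so it equals rows * columns
print(df.size == n_rows * n_cols)   # True
```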
5. index: This attribute provides the index (row labels) of the DataFrame.
Example:
print(df.index)
# output
# RangeIndex(start=0, stop=4, step=1)
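6. values: It returns the data of the DataFrame as a NumPy array. Because the columns of df hold mixed types (strings and integers), the resulting array has the general-purpose object dtype.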
Example:
print(df.values)
# output
# [['John' 28 'New York']
#  ['Anna' 24 'London']
#  ['Peter' 35 'Berlin']
#  ['Linda' 32 'Sydney']]
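The Pandas documentation recommends the to_numpy() method over values for new code. As a rough sketch, the example below rebuilds the same df and shows how the array's dtype depends on which columns are selected; the names arr and ages are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Berlin', 'Sydney'],
})

# Mixed string and integer columns fall back to the object dtype
arr = df.to_numpy()
print(arr.shape)   # (4, 3)
print(arr.dtype)   # object

# Selecting only the numeric column keeps an integer dtype
ages = df[['Age']].to_numpy()
print(ages.shape)  # (4, 1)
print(ages.dtype)
```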
7. empty: This attribute returns a boolean indicating whether the DataFrame is empty (True) or not (False).
Example:
print(df.empty)
# output
# False
If our DataFrame had no rows or no columns, this would return True.
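To see when empty is True, the sketch below builds a few small DataFrames (empty_df, cols_only, and nan_df are hypothetical names used only for illustration). Note that a DataFrame containing only NaN values still has a row, so it is not considered empty:

```python
import pandas as pd

# No rows and no columns
empty_df = pd.DataFrame()
print(empty_df.empty)    # True

# Columns defined, but zero rows
cols_only = pd.DataFrame(columns=['Name', 'Age'])
print(cols_only.empty)   # True

# One row holding only a missing value: not empty
nan_df = pd.DataFrame({'Age': [float('nan')]})
print(nan_df.empty)      # False
```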
8. ndim: It returns an integer indicating the number of axes/array dimensions, which is 1 for Series and 2 for DataFrame.
Example:
print(df.ndim)
# output
# 2
This tells us that our DataFrame df
has 2 dimensions.
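The difference between a Series and a DataFrame shows up when selecting columns. A small sketch, using a shortened two-row version of the data for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Single brackets select one column as a Series (1 dimension)
age_series = df['Age']
print(age_series.ndim)   # 1

# Double brackets select a one-column DataFrame (2 dimensions)
age_frame = df[['Age']]
print(age_frame.ndim)    # 2
```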
9. axes: This attribute returns a list representing the axes of the DataFrame. For a DataFrame, the axes are its index (rows) and columns.
Example:
print(df.axes)
# output
# [RangeIndex(start=0, stop=4, step=1), Index(['Name', 'Age', 'City'], dtype='object')]
The output shows the row index, which runs from 0 to 3 (the stop value 4 is exclusive), and the column labels 'Name', 'Age', and 'City' as the DataFrame's axes.
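The two entries of axes are the same objects exposed by index and columns. A minimal sketch that unpacks the list (row_axis and col_axis are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Berlin', 'Sydney'],
})

# axes is a two-element list: row labels first, then column labels
row_axis, col_axis = df.axes

print(list(row_axis))                  # [0, 1, 2, 3]
print(list(col_axis))                  # ['Name', 'Age', 'City']
print(row_axis.equals(df.index))       # True
print(col_axis.equals(df.columns))     # True
```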
10. info(): This is a method rather than an attribute, but it's handy for getting a concise summary of the DataFrame, including the number of non-null entries and the data type of each column, as well as the approximate memory usage.
Example:
df.info()
This will output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     4 non-null      int64
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 224.0+ bytes
This shows us that our DataFrame has 4 non-null entries in each column.
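info() prints its summary and returns None, so the text cannot be assigned directly. One way to capture it, sketched here with the method's buf parameter and an in-memory io.StringIO buffer (the names buffer and summary are illustrative), is:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'London', 'Berlin', 'Sydney'],
})

# Redirect the summary into a string buffer instead of stdout
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# The captured text can now be searched, logged, or written to a file
print('RangeIndex: 4 entries, 0 to 3' in summary)   # True
```

The exact dtype names and memory usage in the summary can vary between Pandas versions and platforms, so it is safer to check for stable parts of the text such as the index range.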
Each of these attributes serves a different purpose, and together they give a quick overview of a DataFrame's structure, which is especially helpful when dealing with large data sets.