
Different Types of Data Structures in Pandas 

Pandas has two main types of data structures: Series and DataFrame. Both are built on top of NumPy arrays, so many operations are vectorized and run much faster than equivalent Python loops.

1. Series

A Series is a one-dimensional array-like object that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It labels each data point with an index label, which by default is an integer from 0 to N - 1, where N is the length of the data.

For example, imagine we have a series of four different fruits. In a pandas Series, this data will look something like this:

Index  Item
0      Apple
1      Banana
2      Cherry
3      Blueberry

In the example above, each fruit is associated with a unique index (0 to 3).
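
As a minimal sketch, the fruit Series above could be built as follows. The variable names and the letter labels in the second Series are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Build a Series from a plain Python list.
# With no index given, pandas assigns the default labels 0..N-1.
fruits = pd.Series(["Apple", "Banana", "Cherry", "Blueberry"], name="Item")
print(fruits)

# Look up a value by its index label.
print(fruits.loc[2])  # Cherry

# A custom index can be supplied instead of the default integers
# (the letters here are hypothetical labels).
labeled = pd.Series(
    ["Apple", "Banana", "Cherry", "Blueberry"],
    index=["a", "b", "c", "d"],
)
print(labeled.loc["b"])  # Banana
```

Using `.loc` makes it explicit that the lookup is by label rather than by position.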

2. DataFrame

A DataFrame is a two-dimensional table of data with rows and columns. As in a Series, each row is labeled by an index. In a DataFrame, however, the columns are labeled as well. DataFrames are well suited to representing real-world data because they let you store heterogeneous types of data (numeric, date-time, text, etc.) in the same table, with every column aligned to the same index.

For example, imagine we have a DataFrame that contains information about those same four fruits, such as their color and weight. In a pandas DataFrame, this data will look something like this:

Index  Item       Color   Weight
0      Apple      Red     150
1      Banana     Yellow  120
2      Cherry     Red     5
3      Blueberry  Blue    1

In the example above, each fruit is associated with an index (0 to 3), and each attribute of the fruit (color, weight) is a labeled column.
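
A short sketch of how this table could be created and queried, assuming the data is supplied as a dictionary of columns. The variable names are illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column label; the rows get the default 0..N-1 index.
df = pd.DataFrame({
    "Item": ["Apple", "Banana", "Cherry", "Blueberry"],
    "Color": ["Red", "Yellow", "Red", "Blue"],
    "Weight": [150, 120, 5, 1],
})
print(df)

# Each column keeps its own data type.
print(df.dtypes)

# Selecting a single column returns a Series that shares the DataFrame's index.
weights = df["Weight"]

# Select one cell by row label and column label.
print(df.loc[1, "Color"])  # Yellow

# Filter rows with a boolean condition.
red = df[df["Color"] == "Red"]
print(red)
```

Because every column shares the same row index, selecting or filtering rows keeps the item, color, and weight of each fruit together.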

Key Points about Pandas Data Structures

  1. Two Main Data Structures: Pandas provides two key data structures - Series and DataFrame. A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

  2. Handling of Different Data Types: Both Series and DataFrames can hold various data types such as integer, float, string, and Python objects. A DataFrame can hold different types of data in each column.

  3. Data Alignment: A critical feature of pandas data structures is how arithmetic works between objects with different indexes. Pandas automatically aligns the data by index label before calculating, and a label that appears in only one of the objects produces a missing value in the result.

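  4. Handling Missing Data: Pandas data structures handle missing data well. By default, missing values in numeric data are represented as NaN (the np.nan object from NumPy). Pandas also provides NaT for missing date-time values and pd.NA for its nullable data types.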

  5. Manipulation and Transformation: Pandas data structures are mutable. They can be modified directly or transformed to derive new objects. You can add, remove, or update values. This makes pandas powerful for data wrangling and preprocessing.
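
Points 3 to 5 can be illustrated with a small sketch. The labels and values below are made up for the example and are not taken from any real dataset.

```python
import pandas as pd

# Data alignment: values are matched by index label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
total = a + b
print(total)  # "w" and "x" exist in only one operand, so their result is NaN

# Missing data: detect, fill, or drop the NaN entries.
print(total.isna())
filled = total.fillna(0)   # replace NaN with 0
dropped = total.dropna()   # keep only the labels that have a value

# Manipulation: add, update, and remove data.
df = pd.DataFrame({"Item": ["Apple", "Banana"], "Weight": [150, 120]})
df["Color"] = ["Red", "Yellow"]    # add a column in place
df.loc[0, "Weight"] = 155          # update a single value in place
df = df.drop(columns=["Color"])    # drop returns a new DataFrame
print(df)
```

Note that some operations modify the object in place (column assignment, `.loc` assignment) while others, such as `drop`, `fillna`, and `dropna`, return a new object by default.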

