The Pandas library in Python supports analysis of mixed data types through many built-in classes and functions. The NumPy library, on the other hand, provides multi-dimensional arrays for numeric data. Both libraries are robust tools for two-dimensional data analysis. In this article, we will discuss the methods we can use to convert a Pandas DataFrame to a NumPy array.
But before that, let’s revise both of them.
What is the Pandas DataFrame?
The Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous data structure presented in tabular form: each column can hold a different data type. It has labeled axes (rows and columns) and offers a wide range of functionality for data manipulation, cleaning, and analysis.
Let us see an example:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'San Francisco', 'Chicago']}
df = pd.DataFrame(data)

# Display the DataFrame
print('DataFrame:\n', df)
Output:
DataFrame:
       Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2  Charlie   35  San Francisco
3    David   40        Chicago
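To make the idea of labeled axes and per-column types concrete, here is a minimal sketch that inspects a small DataFrame. The variable names (columns, index, age_is_integer) are chosen only for illustration.

```python
import pandas as pd

# A small DataFrame with one text column and one integer column
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

columns = list(df.columns)   # column labels
index = list(df.index)       # row labels (a default integer index here)

# Each column keeps its own data type
age_is_integer = pd.api.types.is_integer_dtype(df['Age'])

print(columns)
print(index)
print(df.dtypes)
```

The dtypes listing shows one entry per column, which is the property that matters most once the DataFrame is converted to a single-typed NumPy array.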
What is the NumPy Array in Python?
NumPy stands for Numerical Python and is a fundamental package for scientific computing in Python. It provides powerful N-dimensional array objects, which are essential for performing mathematical and logical operations on large datasets efficiently.
import numpy as np

# List of lists
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Convert to NumPy array
numpy_array = np.array(list_of_lists)

# Display the NumPy array
print("NumPy Array: \n", numpy_array)
Output:
NumPy Array: 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
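As a rough illustration of the mathematical and logical operations mentioned above, the following sketch applies a few element-wise operations to the same 3x3 array. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

doubled = arr * 2            # arithmetic is applied element by element
mask = arr > 5               # logical comparison gives a boolean array
col_sums = arr.sum(axis=0)   # sum down each column

print(doubled)
print(mask)
print(col_sums)
```

No explicit Python loop is needed for any of these operations, which is a large part of why NumPy arrays are convenient for numeric work.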
How to Convert Pandas DataFrame to Numpy Array?
The major difference between a DataFrame and a NumPy array is that a NumPy array is homogeneous: every element shares a single data type, whereas each column of a DataFrame can hold a different type. There are many more differences between the two libraries, which you can learn about in this guide for Pandas vs NumPy.
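The effect of this homogeneity shows up as soon as a DataFrame with mixed column types is converted. The sketch below uses two small DataFrames invented for this illustration (mixed_df and numeric_df) and checks the data type of the resulting arrays.

```python
import pandas as pd

# Text and integers together: no common numeric type exists
mixed_df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Integers and floats together: a float type can represent both
numeric_df = pd.DataFrame({'A': [1, 2], 'B': [1.5, 2.5]})

mixed_array = mixed_df.to_numpy()
numeric_array = numeric_df.to_numpy()

print(mixed_array.dtype)    # object
print(numeric_array.dtype)  # float64
```

An object array stores generic Python objects, so it loses most of the speed benefits of NumPy. When possible, it helps to select only the numeric columns before converting.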
Here are various ways to convert a Pandas DataFrame to a NumPy array:
Method 1) Using to_numpy() Method
The simplest way to convert a Pandas DataFrame to a NumPy array is the to_numpy() method, which Pandas provides on DataFrame and Series objects.
The following code will help you perform this conversion:
import pandas as pd

# Create the DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# To convert this DataFrame to a NumPy array, we can use the to_numpy() method
array = df.to_numpy()
print(array)
Output:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
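The to_numpy() method also accepts optional dtype and copy parameters. The following is a minimal sketch of how they might be used; the variable names are illustrative, and the DataFrame is a smaller version of the one above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Request a specific data type for the resulting array
float_array = df.to_numpy(dtype=float)

# Request an independent copy, so edits to the array
# cannot affect the DataFrame
copied = df.to_numpy(copy=True)
copied[0, 0] = 99

print(float_array)
print(df.loc[0, 'A'])  # still 1
```

Asking for a copy is a reasonable precaution when the array will be modified in place, since without it the result may share memory with the DataFrame.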
Now, if we want to convert only specific columns to a NumPy array, we can select those columns before calling to_numpy(). Let us see an example:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

# Create the DataFrame
df = pd.DataFrame(data)

# Select columns 'A' and 'B', then convert
array = df[['A', 'B']].to_numpy()
print(array)
Output:
[[1 4]
 [2 5]
 [3 6]]
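When only one column is needed, the shape of the result depends on how the column is selected. This short sketch, using the same data, contrasts single brackets (which return a Series) with double brackets (which return a one-column DataFrame).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Single brackets select a Series, which converts to a 1-D array
column_1d = df['A'].to_numpy()

# Double brackets select a DataFrame, which converts to a 2-D array
column_2d = df[['A']].to_numpy()

print(column_1d.shape)  # (3,)
print(column_2d.shape)  # (3, 1)
```

Which form is appropriate depends on what the downstream code expects: a flat vector or a column matrix.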
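We can also convert a chosen set of rows and columns of a DataFrame to a NumPy array. We can do this with the Pandas iloc indexer, which selects by integer position. The following example lists all three row and column positions, so the result matches the full array: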
import pandas as pd

# Make the DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Using the iloc indexer with to_numpy() method
array = df.iloc[[0, 1, 2], [0, 1, 2]].to_numpy()
print(array)
Output:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
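To see an actual subset rather than the full array, the selection can be narrowed. The sketch below is one possible variation: it uses iloc with slices, and also loc with a row condition and column labels. The condition on column 'A' is invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# First two rows, last two columns, selected by position
subset = df.iloc[:2, 1:].to_numpy()

# Rows where 'A' is greater than 1, columns 'A' and 'C', selected by label
filtered = df.loc[df['A'] > 1, ['A', 'C']].to_numpy()

print(subset)    # [[4 7], [5 8]]
print(filtered)  # [[2 8], [3 9]]
```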
Method 2) Using values Attribute
Another way to convert a Pandas DataFrame to a NumPy array is to access the values attribute of the DataFrame. It is quick and easy, although the Pandas documentation recommends to_numpy() for new code. Here is an example:
import pandas as pd

# Creating the DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Using the values attribute
array = df.values
print(array)
Output:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
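For a plain numeric DataFrame like this one, the two methods should produce the same array. The following sketch checks that, and also includes np.asarray() as a third route, which is not covered in this article and is shown only for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

from_values = df.values          # attribute access
from_method = df.to_numpy()      # method call
from_asarray = np.asarray(df)    # NumPy-side conversion

# Compare the contents of all three results
same = (np.array_equal(from_values, from_method)
        and np.array_equal(from_method, from_asarray))
print(same)  # True
```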
Conclusion
In this article, we have explored different methods to convert a Pandas DataFrame to a NumPy array. The to_numpy() method, provided by Pandas, allows us to convert the entire DataFrame, specific columns, or a selection of rows and columns to a NumPy array easily. The values attribute is a quick alternative for the same conversion. Another popular conversion is to get a DataFrame from a Series, which is worth learning for technical interviews.