Python's Pandas library provides tools and functions that simplify many data tasks, and it can read and write a wide range of file formats. One such format is TSV (Tab-Separated Values), which is commonly used to store tabular data. In this article, we will learn two methods to read TSV files in Pandas.
Before we dive into how to do it, let's understand what exactly a TSV file is.
What is a TSV File?
TSV stands for Tab-Separated Values. TSV files are plain text files where data is organized in rows and columns. Each line is one row, and the fields within a row are separated by a tab character. A tab is a single whitespace character (written '\t' in Python) that is usually displayed as a wider gap between values.
Let us see an example:
Name	Age	City
John	35	New York
Smith	20	London
Johnson	18	Paris
In the example above, each line represents a different person, and the details about each person (name, age, city) are organized in columns, separated by tabs. TSV files are easy for both humans and computers to read, so they are often used to exchange tabular data between different applications or to store data in a simple format.
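To make the format concrete, here is a minimal sketch that splits tab-separated text by hand using only built-in string methods. The sample values are made up for illustration, and this approach ignores quoting rules, so real files are better parsed with the csv module or Pandas.

```python
# A small TSV document held in a string; "\t" is the tab character.
tsv_text = "Name\tAge\tCity\nJohn\t35\tNew York\nSmith\t20\tLondon\n"

# Split the text into lines, then split each line on the tab separator.
rows = [line.split("\t") for line in tsv_text.splitlines()]

# The first row holds the column names; the rest are records.
header, records = rows[0], rows[1:]
print(header)   # ['Name', 'Age', 'City']
print(records)  # [['John', '35', 'New York'], ['Smith', '20', 'London']]
```

Note that every value comes back as a string, including the ages. Converting columns to suitable types is one of the things Pandas does for us.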
How to Read TSV Files in Pandas?
Now that we know what a TSV file is, let us create one in Python. Then, we will learn two methods to load TSV files in Pandas.
Making a TSV format file is rather easy. Let us look at the code:
import csv

# Sample data
data = [
    ['Name', 'Age', 'City'],
    ['John', 35, 'New York'],
    ['Smith', 20, 'London'],
    ['Johnson', 18, 'Paris']
]

# Let us set the file name
file_name = 'sample.tsv'

# Using open with 'w' to write to TSV file
with open(file_name, 'w', newline='') as tsvfile:
    writer = csv.writer(tsvfile, delimiter='\t')
    writer.writerows(data)
In this example, we are using csv.writer configured with delimiter='\t', which makes it use a tab character as the separator. The newline='' argument to open() lets the csv module control line endings itself, which avoids problems such as extra blank lines between rows on Windows.
Make sure to customize the data list according to your requirements. The resulting TSV file will be created in the current working directory (usually the folder you run the script from) with the specified name (sample.tsv in this case).
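If the data already lives in a DataFrame, the same kind of file can be written with Pandas itself. The following is a minimal sketch, assuming the same sample data; the file name sample_from_pandas.tsv is a hypothetical choice for this illustration.

```python
import pandas as pd

# The same sample data, held in a DataFrame
df = pd.DataFrame(
    {
        "Name": ["John", "Smith", "Johnson"],
        "Age": [35, 20, 18],
        "City": ["New York", "London", "Paris"],
    }
)

# sep="\t" writes tabs between fields; index=False leaves out the row index
out_name = "sample_from_pandas.tsv"  # hypothetical file name
df.to_csv(out_name, sep="\t", index=False)

# Print the raw file contents to check the result
with open(out_name) as f:
    print(f.read())
```

Without index=False, to_csv() would also write the row index as an extra first column.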
Output:
Name Age City
John 35 New York
Smith 20 London
Johnson 18 Paris
Each row represents a person, and each column has a specific piece of information (name, age, city). The tabs help keep everything organized, making it easy for both people and computers to understand the data. TSV files are like a structured way of writing down information in a list or table, making it clear and easy to work with.
Let us now look at the different methods to read this file in Pandas.
Using read_csv() Function to read TSV Files in Pandas
We can use the read_csv() function to read TSV files in Pandas. We simply pass sep='\t' so that fields are split on tabs instead of commas.
Despite its name, which suggests it is only for CSV files, this function can read any delimited text file, including TSV files. Let's take an example to see how it works:
import pandas as pd

# Read the TSV file, splitting fields on tabs
df = pd.read_csv('sample.tsv', sep='\t')

# Display the resulting DataFrame
print('TSV file:\n', df)
Output:
TSV file:
       Name  Age      City
0     John   35  New York
1    Smith   20    London
2  Johnson   18     Paris
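The usual read_csv() options also work with tab-separated input. Here is a minimal sketch that loads only some of the columns and uses one of them as the index. It reads from an in-memory string instead of a file so that it runs on its own, and the choice of columns and index is only an illustration.

```python
import io

import pandas as pd

# Tab-separated text standing in for the contents of a TSV file
tsv_text = (
    "Name\tAge\tCity\n"
    "John\t35\tNew York\n"
    "Smith\t20\tLondon\n"
    "Johnson\t18\tParis\n"
)

df = pd.read_csv(
    io.StringIO(tsv_text),    # any file path or file-like object works here
    sep="\t",                 # split fields on tabs
    usecols=["Name", "Age"],  # load only these columns
    index_col="Name",         # use the Name column as the row index
)

print(df)
print(df.loc["Smith", "Age"])  # look up a single value by name
```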
Using read_table() Function to read TSV Files
We can also use the read_table() function provided by the Pandas library. This function is designed for general delimited text files and accepts a customizable delimiter through its sep parameter.
The function does not detect the delimiter. Instead, its default separator is the tab character, so a TSV file can be read without passing sep at all. The resulting DataFrame will contain the data from the TSV file. Let us take a look at an example:
import pandas as pd

# Use the read_table function to read the TSV file (tab is the default separator)
df = pd.read_table('sample.tsv')

# Display the resulting DataFrame
print('TSV file:\n', df)
Output:
TSV file:
       Name  Age      City
0     John   35  New York
1    Smith   20    London
2  Johnson   18     Paris
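Since read_table() uses a tab as its default separator, it should give the same result as read_csv() with sep='\t'. The sketch below checks this on an in-memory copy of the sample data, so it runs without needing a file on disk.

```python
import io

import pandas as pd

# Tab-separated text standing in for the contents of sample.tsv
tsv_text = (
    "Name\tAge\tCity\n"
    "John\t35\tNew York\n"
    "Smith\t20\tLondon\n"
    "Johnson\t18\tParis\n"
)

# Parse the same text with both functions
df_table = pd.read_table(io.StringIO(tsv_text))
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# equals() compares both the values and the column types
print(df_table.equals(df_csv))
```

Which of the two to use is mostly a matter of preference. read_csv() with an explicit sep makes the delimiter visible in the code, while read_table() is shorter.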
Overall, Pandas’ ability to handle various file formats, including TSV, makes it a valuable tool for data scientists.
Conclusion
In this article, we have learned how to read a TSV file in Pandas. TSV is a simple way to store and exchange tabular data, so it comes up often in data analysis work. We explored two methods: the read_csv() function with sep='\t', and the read_table() function, which uses a tab as its default separator.