We have already covered GPT-4o’s capabilities for web development and coding. Now we turn to a field that ties together many operations and departments: data analysis. Data is at the heart of every operation! In this article, we will show you how to use GPT-4o for data analysis and get the most out of it.
Using GPT-4o For Data Analysis
Using GPT-4o for data analysis means drawing on its natural language processing abilities to comprehend, interpret, and work with data through a conversational interface. Here’s a guide to doing that efficiently:
Understand GPT-4o’s Capabilities
Before you get started with GPT-4o, it’s important to understand its capabilities, which you will rely on throughout the data analysis and extraction process. Here are a few things GPT-4o can do that you should keep in mind:
- Natural Language Understanding: Interprets queries and instructions in plain language.
- Text Generation: Provides detailed explanations and insights.
- Pattern Recognition: Identifies patterns in data described textually.
- Basic Calculations: Performs simple mathematical operations.
- Summarization: Condenses data findings into coherent summaries.
- Data Formatting: Converts data into required formats, such as tables or lists.
Keeping these capabilities in mind will help you decide when GPT-4o is the right tool for your data.
Preparing Your Data
Before you start analyzing data with GPT-4o, your first task is to prepare your dataset. Make sure your data is clean, i.e. free of errors and formatted consistently.
Make sure it is also structured and organized in a way that can be easily described (e.g., tables, lists), and accessible, either as text or in a simple format that can be pasted into the conversation.
GPT-4o accepts several data formats: text, such as plain-text descriptions and lists; tabular data, such as CSV, TSV, or formatted tables; and structured JSON data.
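As a rough illustration of what “clean and consistently formatted” can mean in practice, here is a minimal sketch that tidies a small CSV before you upload or paste it. The sample rows are made up, and the column names (`company`, `country`, `total_laid_off`, `percentage_laid_off`) are assumptions about the layout; your own file may differ.

```python
import csv
import io

# Hypothetical sample rows; the real dataset has 3000+ rows and may use other columns.
RAW_CSV = """company,country,total_laid_off,percentage_laid_off
 Acme ,United States,120,0.10
Globex,India,,0.25
Initech,united states,45,
"""

def clean_rows(text):
    """Trim whitespace, normalise country casing, and turn blank numbers into None."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in row.items()}
        row["country"] = row["country"].title()
        row["total_laid_off"] = (
            int(row["total_laid_off"]) if row["total_laid_off"] else None
        )
        row["percentage_laid_off"] = (
            float(row["percentage_laid_off"]) if row["percentage_laid_off"] else None
        )
        cleaned.append(row)
    return cleaned

rows = clean_rows(RAW_CSV)
missing = sum(1 for r in rows if r["total_laid_off"] is None)
print(f"{len(rows)} rows, {missing} with no total_laid_off value")
```

Knowing how many values are missing before you start also lets you mention it in your prompt, so GPT-4o does not silently treat blanks as zeros.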
For our experiment, we used a CSV file of raw employee layoff data covering the COVID-19 period, from 2019 to 2022. It’s a large file with more than 3000 rows.
Formulate Your Questions
Now that you have prepared your data, you need to prepare your questions, that is, what you want GPT-4o to do with the data. You can ask for anything that fits your goal: for example, descriptive analysis, predictive analysis, data visualization, data manipulation, or statistical insights.
Let’s go through these one by one:
Descriptive Analysis:
Here we uploaded our CSV file to GPT-4o and asked it this question:
“This is a CSV file containing employee layoff data from 2019 to 2022. Summarize the main trends in this data.”
GPT-4o took just a few seconds to analyze the large dataset and returned a detailed, point-by-point summary of the employee layoff data. It covered overall layoff trends, geographical trends, and much more.
Overall, GPT-4o goes deep on data summarization and gives you details on nearly every aspect of your dataset. If your data contains yearly figures, it can also draw out yearly trends, for example in sales or other annual numbers.
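If you want to sanity-check a yearly trend that GPT-4o reports, a few lines of code can recompute the totals locally. The sketch below is only illustrative: the records are toy values, and the `date` field with an ISO `YYYY-MM-DD` format is an assumption about how the dataset stores dates.

```python
from collections import defaultdict

# Toy records; the `date` field and its ISO format are assumptions.
records = [
    {"date": "2020-04-01", "total_laid_off": 100},
    {"date": "2020-06-15", "total_laid_off": 50},
    {"date": "2021-03-10", "total_laid_off": 30},
    {"date": "2022-11-05", "total_laid_off": 200},
]

def layoffs_by_year(rows):
    """Sum total_laid_off per calendar year, returned in year order."""
    totals = defaultdict(int)
    for row in rows:
        year = int(row["date"][:4])  # first four characters of an ISO date
        totals[year] += row["total_laid_off"]
    return dict(sorted(totals.items()))

yearly = layoffs_by_year(records)
for year, total in yearly.items():
    print(year, total)
```

Comparing these totals with the figures in GPT-4o’s summary is a quick way to catch a misread column or a miscounted year.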
Predictive Analysis:
Once you have extracted the valuable trends and insights from your dataset, you can ask GPT-4o to perform a predictive analysis based on them. Here’s the prompt we gave GPT-4o:
“Now that you have summarised the main trends from this dataset, What can we infer about future layoffs based on this historical data?”
GPT-4o responded well again, providing detailed insights into the increasing trend in layoffs for the upcoming years. It even performed future inference and vulnerability analysis, stating the reasons for and risks of further employee layoffs, based on our dataset.
Now that I had the forecast trends, I wanted to try something else: a line graph showing how layoffs would increase in the coming years, up to 2028. This is the prompt I gave GPT-4o:
“The layoff data shows an increasing trend in layoffs, based on the trends shown, can you provide me a graph showing the increase in recession in the upcoming years till 2028?”
This is how GPT-4o responded, and I was fascinated.
You can see how well GPT-4o analyzed the data and produced a line graph forecasting layoffs for the upcoming years based on the dataset trends. What impressed me most is that it labeled the graph clearly, which makes it easier to understand.
This is a good example of data visualization and data prediction in one go. You can use this approach to get better insight into future outcomes for your data variables.
This shows that a well-classified dataset is enough for GPT-4o to analyze and run a predictive analysis on. So, you can ask GPT-4o for future insights once it has summarized the key points of your dataset.
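We do not know which method GPT-4o used to produce its forecast. As a point of comparison, here is a minimal sketch of one simple approach, a straight-line least-squares fit extended to 2028. The yearly totals are toy values, not figures from our dataset, and a linear trend is an assumption that may not suit real layoff data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy yearly totals (hypothetical), used only to show the mechanics.
years = [2020, 2021, 2022]
totals = [100, 150, 200]

slope, intercept = fit_line(years, totals)

# Extend the fitted line year by year up to and including 2028.
forecast = {year: slope * year + intercept for year in range(2023, 2029)}
for year, value in forecast.items():
    print(year, round(value))
```

A baseline like this is useful mainly as a reference: if GPT-4o’s graph differs sharply from it, that is a good reason to ask which assumptions the forecast rests on.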
Data Visualization:
You already know that ChatGPT can transform heavy chunks of tabular or Excel data into attractive graphical representations such as pie charts. This is known as data visualization, and it is a highly important part of data analysis.
It is not always possible to analyze huge datasets through figures and descriptions alone. Let’s see how you can ask GPT-4o to visualize your data so that you can understand it better.
The layoff dataset we used is large and breaks layoffs down by company, by industry, and by country. Analyzing all of these at once is too hectic, so I asked GPT-4o for a pie chart showing the percentage of layoffs by country. This is the prompt that we gave:
“I need you to visualize the layoff data by country in the form of a pie chart.”
This is how GPT-4o responded, and I was quite impressed by the pie chart it generated.
You can see I got what I asked for: a well-structured pie chart showing layoff percentages by country, which makes the data easier to analyze. My only complaint is that the labels overlap and could be aligned better, but that was perhaps to be expected given how many distinct countries the chart had to show. What impressed me more is that GPT-4o also explained the pie chart point by point.
Overall, first read your data well and decide whether it is small enough to be analyzed through facts and figures alone. If not, identify the primary classifier, that is, the column that best splits your data into subgroups and percentages. For our data this was the country column. Then ask GPT-4o to visualize the data as a pie chart, bar graph, or line graph based on that primary classifier, and you will be good to go.
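To make the idea of a primary classifier concrete, the sketch below computes each country’s share of total layoffs and folds small slices into an “Other” bucket, which is one way to reduce the overlapping labels mentioned above. The rows are toy values, and the 5% cutoff (`min_share`) is an arbitrary assumption.

```python
from collections import Counter

def country_shares(rows, min_share=5.0):
    """Percentage of layoffs per country; slices below min_share go into 'Other'."""
    totals = Counter()
    for row in rows:
        totals[row["country"]] += row["total_laid_off"]
    grand_total = sum(totals.values())

    shares = {}
    other = 0.0
    for country, count in totals.most_common():
        pct = 100 * count / grand_total
        if pct >= min_share:
            shares[country] = round(pct, 1)
        else:
            other += pct
    if other:
        shares["Other"] = round(other, 1)
    return shares

# Toy rows (hypothetical values), not taken from the real dataset.
sample_rows = [
    {"country": "United States", "total_laid_off": 700},
    {"country": "India", "total_laid_off": 200},
    {"country": "Canada", "total_laid_off": 60},
    {"country": "Brazil", "total_laid_off": 25},
    {"country": "Kenya", "total_laid_off": 15},
]

shares = country_shares(sample_rows)
print(shares)
```

You can get the same effect in the conversation itself by asking GPT-4o to group countries below a chosen percentage into a single “Other” slice.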
Data Manipulation:
Here’s another interesting thing GPT-4o can do: it can manipulate and transform your data into a new layout of your choosing. For example, you can break a table into two separate tables, change the percentage composition of pie charts, or merge smaller bar graphs into a larger one.
Here we extracted just 3 columns from our large dataset. This is the prompt that we gave:
“Reformat the dataset table into just 3 columns namely company, total_laid_off, and percentage_laid_off.”
This is how GPT-4o responded, and it was accurate in reshaping the previous dataset table into a new one per our requirements. The headings were exactly as I wanted, and the data rows were present too (only a few were shown, as the original dataset is huge).
You can upload your own dataset and ask GPT-4o to manipulate it in whatever way you want.
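For comparison, the same three-column reformat can be reproduced locally, which is handy for checking that no rows were dropped or altered. This is a minimal sketch using the column names from the prompt above; the sample rows and the `industry` and `country` columns are assumptions.

```python
import csv
import io

KEEP = ["company", "total_laid_off", "percentage_laid_off"]

# Hypothetical sample; only the three KEEP names come from the prompt.
SAMPLE_CSV = (
    "company,industry,country,total_laid_off,percentage_laid_off\n"
    "Acme,Retail,United States,120,0.10\n"
    "Globex,Finance,India,45,0.25\n"
)

def select_columns(csv_text, columns):
    """Return CSV text containing only the requested columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed in `columns` are dropped
    return out.getvalue()

reduced = select_columns(SAMPLE_CSV, KEEP)
print(reduced)
```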
Statistical Insights:
Lastly, we also performed a few statistical operations on our dataset. This is the prompt that we gave:
“Based on the layoff dataset, I need you to calculate the mean, median and mode of the dataset. Also, calculate the standard deviation if possible.”
We were not surprised by GPT-4o’s response: we got every statistical figure we asked for. I’m impressed by how quickly and efficiently GPT-4o provided the correct mean, median, mode, and even the standard deviation for such a large and widely spread dataset.
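These figures are easy to verify yourself. The sketch below uses Python’s standard `statistics` module on a toy list of values; we assume the statistics are taken over the `total_laid_off` column, since the prompt does not name one.

```python
import statistics

# Toy values standing in for the total_laid_off column (hypothetical).
laid_off = [10, 20, 20, 50, 100]

summary = {
    "mean": statistics.mean(laid_off),
    "median": statistics.median(laid_off),
    "mode": statistics.mode(laid_off),
    # stdev() is the sample standard deviation (divides by n - 1);
    # use pstdev() instead if the data is the whole population.
    "stdev": statistics.stdev(laid_off),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If GPT-4o’s standard deviation differs slightly from yours, check whether it used the sample or the population formula before assuming an error.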
Looks like GPT-4o is just getting started!
Engaging GPT-4o and Refining Your Responses
If you are not satisfied with any of GPT-4o’s initial responses for these data analysis tasks, engage it further by refining your prompts.
You can refine a prompt by clarifying or expanding it, that is, by providing additional context or rephrasing the question. You can also iterate by requesting more details or a different perspective on the data.
Lastly, you can break complex analyses down into simpler, sequential steps.
Conclusion
GPT-4o is a powerful assistant for understanding, summarizing, and performing basic analyses on data described in natural language. While it enhances accessibility and ease of use, especially for initial exploration and summary tasks, it should complement rather than replace traditional data analysis tools for comprehensive and complex data analysis.