{"id":4431,"date":"2024-05-02T09:34:50","date_gmt":"2024-05-02T09:34:50","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=4431"},"modified":"2024-05-02T09:34:52","modified_gmt":"2024-05-02T09:34:52","slug":"data-analysis-claude-3","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/data-analysis-claude-3\/","title":{"rendered":"Expert Guide: Data Analysis with Claude 3 Made Simple"},"content":{"rendered":"\n<p>AI chatbots have had a major impact across industries, with professionals relying on them to handle the grunt work. They are highly capable and can make our work much easier if we know how to use them well. Data analytics is one field where the right use of AI tools like <a href=\"https:\/\/favtutor.com\/articles\/claude-3-access\/\">Claude 3<\/a> can save users a great deal of time.<\/p>\n\n\n\n<p>Deriving actionable insights is a key part of data analytics. However, it is a tedious, repetitive process that involves multiple steps and a wide variety of libraries such as NumPy, pandas, spaCy, and NLTK (depending, of course, on the type of data).<\/p>\n\n\n\n<p>We just discussed <a href=\"https:\/\/favtutor.com\/articles\/claude-3-extension-google-sheets\/\">how to use Claude 3 for Google Sheets<\/a>, but what about a complete data analysis workflow? Let&#8217;s take a deep dive!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Using Claude for Tabular Data<\/strong><\/h2>\n\n\n\n<p>Let&#8217;s explore how Claude 3 Opus can help us analyze data in tabular form. 
For this analysis, we chose a student data CSV file from <a href=\"https:\/\/www.kaggle.com\/datasets\/erqizhou\/students-data-analysis\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Kaggle<\/a>.<\/p>\n\n\n\n<p>The file contains 12 columns spanning student gender, race, grades in core subjects, final GPA, and background information.<\/p>\n\n\n\n<p><strong>Step 1: Provide Claude with the dataset and a description of its columns<\/strong><\/p>\n\n\n\n<p>Sample prompt: \u201c<em>Analyze the data provided as a data scientist, and give answers specific to this dataset only. [xyz] column of the data signifies student background and [xyz] column signifies race. Take these feature descriptions into account when analyzing the data.<\/em>\u201d<\/p>\n\n\n\n<p><strong>Step 2: Identify the categorical and numerical variables<\/strong><\/p>\n\n\n\n<p>Next, identify which variables are categorical and list the unique values of each categorical variable.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> Which of the following columns are categorical variables and which are numerical?<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"708\" height=\"1450\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis1.png\" alt=\"Claude 3 for &quot;Which of the following columns are categorical variables and which are numerical?&quot;\" class=\"wp-image-4432\"\/><\/figure>\n<\/div>\n\n\n<p><strong>Prompt:<\/strong> How many unique values are there in each categorical variable in the data set and what are the values?<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"722\" height=\"1484\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis2.png\" alt=\"Claude 3 for &quot;How many unique values are there in each 
categorical variable in the data set and what are the values?&quot;\" class=\"wp-image-4433\"\/><\/figure>\n<\/div>\n\n\n<p>Claude not only lists the categorical and numerical variables but also explains how it interprets the data in each column. It can also list the unique values and the data type of each column.<\/p>\n\n\n\n<p><strong>Step 3: Data cleaning and finding outliers<\/strong><\/p>\n\n\n\n<p>The next step in analyzing data is data cleaning. Null values must be counted and then either dropped or replaced with the column mean or median.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> which columns have null values and what is the count of those values?<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"724\" height=\"706\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis3.png\" alt=\"Claude 3 for &quot;which columns have null values and what is the count of those values?&quot;\" class=\"wp-image-4434\"\/><\/figure>\n<\/div>\n\n\n<p>The sample dataset we used has no null values, but if your data does, Claude can suggest code to either fill in the empty cells or drop the affected rows.<\/p>\n\n\n\n<p><strong>Prompt: <\/strong>tell me the exact columns that have outliers in this dataset. 
Give specific answers<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"722\" height=\"1214\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis4.png\" alt=\"Claude 3 applying IQR\" class=\"wp-image-4435\"\/><\/figure>\n<\/div>\n\n\n<p>Claude applies the interquartile range (IQR) method, explains each of its calculations in detail, and provides code so you can verify the results yourself.<\/p>\n\n\n\n<p><strong>Step 4: Mapping relations between variables<\/strong><\/p>\n\n\n\n<p>Mapping relationships between different features is one of the most effective ways to visualize data, observe patterns and trends, and gain actionable insights.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> please display a bar chart of gender vs number of scorers in the top 5 % GPA<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"722\" height=\"721\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis5.png\" alt=\"Claude 3 creating a bar chart\" class=\"wp-image-4436\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis5.png 722w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis5-75x75.png 75w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis5-350x350.png 350w\" sizes=\"(max-width: 722px) 100vw, 722px\" \/><\/figure>\n<\/div>\n\n\n<p><strong>Prompt:<\/strong> generate a stacked chart showing the gender-wise GPA distribution<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"716\" height=\"1314\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis6.png\" alt=\"Claude 3 to generate a stacked chart showing the gender-wise GPA distribution\" 
class=\"wp-image-4437\"\/><\/figure>\n<\/div>\n\n\n<p>Claude provided working code tailored to the dataset, along with a detailed explanation of each step.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> what percentage of the accepted students belonged to which race?<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"724\" height=\"842\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis7.png\" alt=\"Claude 3 computing the percentage of accepted students by race\" class=\"wp-image-4438\"\/><\/figure>\n<\/div>\n\n\n<p><strong>Step 5: Generating insights and identifying trends<\/strong><\/p>\n\n\n\n<p>Claude can identify and explain patterns and correlations in the dataset. It can also directly compute summary statistics describing the distribution of each feature.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> Derive meaningful insights specific to the dataset provided. Give numerical comparisons wherever necessary ( like in the case of gender and race)<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"714\" height=\"2090\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis8.png\" alt=\"Claude for Generating insights and trend patterns\" class=\"wp-image-4439\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis8.png 714w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis8-525x1536.png 525w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis8-700x2048.png 700w\" sizes=\"(max-width: 714px) 100vw, 714px\" \/><\/figure>\n<\/div>\n\n\n<p>Claude produces such a detailed analysis of the dataset from a single prompt that there is little need to write and run code yourself for these exploratory questions!<\/p>\n\n\n\n<p><strong>Step 6: Data pre-processing for ML 
models<\/strong><\/p>\n\n\n\n<p>The final step is to prepare the data for a predictive machine learning (ML) or deep learning (DL) model. This involves computing a correlation matrix, choosing an appropriate model, and defining the independent (feature) and dependent (target) variables.<\/p>\n\n\n\n<p><strong>Prompt:<\/strong> List the columns on which the target column y depends<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"720\" height=\"1270\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis9.png\" alt=\"Claude listing the columns on which the target column y depends\" class=\"wp-image-4440\"\/><\/figure>\n<\/div>\n\n\n<p><strong>Prompt:<\/strong> looking at the data distribution, which regression or classification model would be most suitable to predict the target variable y<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"726\" height=\"613\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis10.png\" alt=\"Claude suggesting a model to predict the target variable y\" class=\"wp-image-4441\"\/><\/figure>\n<\/div>\n\n\n<p><strong>Prompt:<\/strong> Implement random forest classification on the dataset with target variable y and give the accuracy of the model<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"708\" height=\"1468\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/ClaudeForDataAnalysis11.png\" alt=\"Claude implementing random forest classification with target variable y\" class=\"wp-image-4442\"\/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Claude makes data analysis so accessible that even beginners with little or no coding knowledge can use it for professional work. 
Experienced data scientists will still do a more thorough job, but Claude gives new coders a solid starting point for their journey. That in itself makes it a valuable resource for daily use!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Find out how to use Claude 3 for various data analysis tasks in this guide, with sample prompts and outputs.<\/p>\n","protected":false},"author":20,"featured_media":4444,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,191,147,90,157,230,231],"class_list":["post-4431","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-amazon","tag-anthropic","tag-claude","tag-claude-3","tag-data-analysis","tag-guide"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/4431","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=4431"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/4431\/revisions"}],"predecessor-version":[{"id":4445,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/4431\/revisions\/4445"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/4444"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=4431"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=4431"},{"taxonomy"
:"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=4431"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}