{"id":5183,"date":"2024-05-27T11:24:19","date_gmt":"2024-05-27T11:24:19","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=5183"},"modified":"2024-05-27T11:24:51","modified_gmt":"2024-05-27T11:24:51","slug":"52-chatgpt-answers-coding-wrong","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/52-chatgpt-answers-coding-wrong\/","title":{"rendered":"Flaws in ChatGPT Found: 52% Of Coding Answers Were Wrong"},"content":{"rendered":"\n<p>A group of researchers from Purdue University presented <a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3613904.3642596\" data-type=\"link\" data-id=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3613904.3642596\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">research<\/a> that reveals that about half of ChatGPT&#8217;s responses to programming questions were inaccurate.<\/p>\n\n\n\n<p><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>52% of the programming responses produced by ChatGPT are inaccurate, according to a study by a group of Purdue University researchers.<\/li>\n\n\n\n<li>Three aspects of programming questions were taken into consideration: posting time, question type, and popularity.<\/li>\n\n\n\n<li>Even with incorrect results, users still preferred to use and blindly trust ChatGPT responses, simply because they sounded plausible and well-articulated.<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"680\" height=\"701\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/4snrjd84j42b1.jpg\" alt=\"\" class=\"wp-image-5210\"\/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>52% of ChatGPT Answers for Coding were Wrong<\/strong><\/h2>\n\n\n\n<p>With the recent rise in Generative AI technologies worldwide, the dependency on these tools for solving programming-related queries has skyrocketed. 
However, a recent study may make developers think twice before relying on them.<\/p>\n\n\n\n<p><strong>52% of the programming responses produced by ChatGPT are inaccurate, according to a study that a group of Purdue University academics presented at the Computer-Human Interaction conference.<\/strong><\/p>\n\n\n\n<p>To conduct the study, the researchers reviewed 517 Stack Overflow questions and examined the responses.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>\u201cTo bridge the gap, we conducted the first in-depth analysis of ChatGPT answers to 517 programming questions on Stack Overflow and examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.\u201d<\/em><\/p>\n<cite>Samia Kabir, researcher at Purdue University<\/cite><\/blockquote>\n\n\n\n<p>That confirms what authors and teachers are experiencing! It&#8217;s an astonishingly high proportion for a tool&nbsp;that people rely on to be correct and precise.&nbsp;AI systems such as ChatGPT frequently generate completely erroneous responses out of thin air.<\/p>\n\n\n\n<p>For a variety of Stack Overflow question posts, the researchers thoroughly examined the accuracy and calibre of responses across four different quality criteria. <\/p>\n\n\n\n<p>The researchers further explored how real programmers weigh answer quality, linguistic aspects, and correctness when deciding between <a href=\"https:\/\/favtutor.com\/articles\/openai-stack-overflow-deal\/\">ChatGPT and Stack Overflow<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Looking inside the Study<\/strong><\/h3>\n\n\n\n<p>Three aspects of programming questions were taken into consideration by the researchers: posting time, question type, and popularity. They ended up with 517 sampled questions. 
Let\u2019s take a look at them.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"1100\" height=\"289\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-702.png\" alt=\"Categories of Programming Questions Asked\" class=\"wp-image-5184\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-702.png 1100w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-702-768x202.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-702-750x197.png 750w\" sizes=\"(max-width: 1100px) 100vw, 1100px\" \/><\/figure>\n<\/div>\n\n\n<p>Initially, they gathered every question from the March 2023 Stack Overflow data dump and arranged them according to the number of views. <\/p>\n\n\n\n<p>Within each popularity category, the researchers divided the questions into two recency categories: Old questions, which were posted before ChatGPT&#8217;s introduction on November 30, 2022, and New questions, which were posted after that date.<\/p>\n\n\n\n<p>Then, the researchers concentrated on three typical question types drawn from the literature: conceptual, how-to, and debugging. 
<\/p>\n\n\n\n<p><strong>The results show that, among the 517 ChatGPT answers labelled by the researchers, 52% of them contain incorrect information, 78% are inconsistent with human answers, 35% lack comprehensiveness, and 77% contain redundant, irrelevant, or unnecessary information.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"1169\" height=\"503\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-703.png\" alt=\"A Result Analysis of ChatGPT Responses\" class=\"wp-image-5185\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-703.png 1169w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-703-768x330.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-703-750x323.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/05\/Screenshot-703-1140x491.png 1140w\" sizes=\"(max-width: 1169px) 100vw, 1169px\" \/><\/figure>\n<\/div>\n\n\n<p>There were four types of incorrectness in the answers: Conceptual (54%), Factual (36%), Code (28%), and Terminology (12%) errors. Some answers had more than one of these errors.<\/p>\n\n\n\n<p>Factual errors occur when an answer states fabricated or untruthful information about existing knowledge, while conceptual errors occur when the answer misunderstands the question.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Users still Preferred ChatGPT Responses<\/strong><\/h3>\n\n\n\n<p>The fact that many human programmers appear to prefer the ChatGPT answers is particularly concerning. After surveying 12 programmers (a rather small sample size), the Purdue researchers discovered that 35% of them favoured ChatGPT&#8217;s answers and 39% of them didn&#8217;t catch the AI-generated errors.<\/p>\n\n\n\n<p>In particular, users frequently miss the misinformation and underestimate the level of incorrectness in answers when it is not easily verifiable. 
Instead, they focused more on textbook-style responses, polite language, and apparent comprehensiveness.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Just like all other AI models, ChatGPT is also prone to mistakes. Maybe this study will be a good reality check for all developers out there who rely heavily on LLMs for their coding tasks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We will take a deep dive into the new study that shows that more than half of the ChatGPT responses on programming were inaccurate.<\/p>\n","protected":false},"author":15,"featured_media":5209,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,61,82,202],"class_list":["post-5183","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-chatgpt","tag-coding","tag-programming"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/5183","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=5183"}],"version-history":[{"count":7,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/5183\/revisions"}],"predecessor-version":[{"id":5213,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/5183\/revisions\/5213"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/5209"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2
\/media?parent=5183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=5183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=5183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}