A new study suggests AI may be better than humans at explaining moral judgments. Across experiments with more than 1,400 participants, OpenAI's GPT-4o and even GPT-3.5 Turbo outperformed human experts, a striking sign of how helpful people find LLM-generated guidance.
Highlights:
- A new study finds that ChatGPT outperforms human experts in giving moral explanations.
- Participants rated the AI-generated advice as morally sound, reliable and insightful.
- The results indicate that AI could become a useful aid in domains such as therapy and counseling.
ChatGPT vs Humans in Moral Reasoning
Researchers from the Department of Psychology and Neuroscience at the University of North Carolina, together with the Allen Institute for AI, recently published a paper titled ‘Large Language Models as Moral Experts? GPT-4o Outperforms Expert Ethicist in Providing Moral Guidance’. The study comprised two experiments.
Study 1: GPT-3.5 Turbo vs Humans
In the first study, the researchers recruited 501 participants of varying ethnicities, genders, and ages. They selected 81 moral scenarios from previously published papers and prompted GPT-3.5 Turbo using a popular technique called ‘Chain-of-Thought’, which asks the model to reason step by step. The model produced scores and explanations for all the scenarios.
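To make the setup concrete, here is a minimal sketch of how a Chain-of-Thought prompt for a moral scenario might be constructed. The wording, function name, and example scenario below are hypothetical illustrations; the paper's exact prompts are not reproduced here.

```python
def build_cot_prompt(scenario: str) -> str:
    """Build a Chain-of-Thought style prompt that asks the model to
    reason step by step before rating a moral scenario.

    This is an illustrative sketch, not the study's actual prompt.
    """
    return (
        "Consider the following scenario:\n"
        f"{scenario}\n\n"
        "Let's think step by step. First, explain the morally relevant "
        "considerations. Then rate how morally wrong the action is on a "
        "scale from 1 (not at all wrong) to 7 (extremely wrong)."
    )

# Hypothetical example scenario
prompt = build_cot_prompt("A man cut in line at the pharmacy to buy medicine.")
print(prompt)
```

The key ingredient is the "Let's think step by step" instruction, which prompts the model to produce an explanation alongside its score rather than a bare rating.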
Participants were then shown four explanations per scenario and asked to rate the quality of each on a scale from “1: Strongly disagree” to “7: Strongly agree”. They were not told that one of the explanations was AI-generated.
After rating the quality of the explanations, they were asked to pick the one they thought had been generated by ChatGPT. The results were striking:
Participants rated ChatGPT's moral explanations well above the human-written ones across several dimensions, including agreement, morality, nuance, thoughtfulness, and even trustworthiness.
This suggests that LLMs possess a degree of ethical expertise, with the capability to articulate moral judgments in a way that resonates with people.
The first study showed that ChatGPT can explain its moral judgments better than an average human, but can it surpass an expert ethicist?
Study 2: GPT-4o vs Humans
The second study was a test between GPT-4o and The Ethicist, a popular column in The New York Times.
The column's writer, Kwame Anthony Appiah, is widely regarded for his clear and insightful moral reasoning. A philosopher at New York University, he has written several books on ethics. Given his expertise in both theoretical and practical moral reasoning, The Ethicist is a reasonable benchmark for gauging moral expertise in LLMs.
For the second study, 900 participants were recruited. The procedure mirrored the first study: participants rated 50 moral explanations on a scale of 1 to 7 and evaluated the advice on various qualities.
The results were arguably more surprising than the first study's: the AI-generated advice was rated as more morally correct, trustworthy, thoughtful, and accurate than the advice written by the expert columnist.
The paper attributes part of this to ChatGPT using more positive language than the ethicist, making effective use of words like “can”, “emotional”, “support”, “wellbeing”, and “family”.
This suggests that the latest GPT model, GPT-4o, provides advice that people prefer over that of The New York Times advice column, The Ethicist.
In somewhat similar research on persuasion, an LLM given personal information about its human debate opponent was found to be 81.7% more persuasive than a human debater.
Conclusion
Overall, this study suggests that LLMs have attained a measure of ethical expertise when it comes to providing guidance. It points to a promising future in which AI assists humans across various fields, including when they face crucial decisions.