Can GPT-4 Now Predict Future Events? New research tests the forecasting abilities of ChatGPT.
Highlights:
- Researchers at Baylor University conducted experiments to determine whether GPT-3.5 and GPT-4 can forecast future events.
- They designed narrative story-based prompts to bypass OpenAI’s terms of service regarding making future predictions.
- GPT-4 gave accurate responses for the 2022 Academy Awards, but it struggled to predict macroeconomic variables like inflation and unemployment.
Research on ChatGPT’s Prediction Powers Explained
Researchers at Baylor University's Department of Economics recently conducted experiments to determine whether ChatGPT can accurately predict future events.
In the experiments, the researchers asked the GPT-3.5 and GPT-4 models about events that occurred in 2022, using future narrative, story-based prompting strategies. These models were trained only on data up to September 2021. The study explored whether the generative capabilities of LLMs can be used for predictive analysis.
ChatGPT usually declines to answer questions about what will happen in the future, so the researchers instead prompted it with a fictional story set in the future, in which the events in question are recounted as though they had already happened.
LLMs are known for their high creativity, which can be seen as both a feature and a bug. While creativity enhances their ability to mimic human speech effectively, it also leads to frequent hallucinations, where they assert false events, state inaccurate facts, or make false predictions.
This tendency could hinder the accuracy of predictions made by the models.
Just a few days ago, researchers at the University of California, Berkeley developed an AI forecasting system that might rival human-level forecasting.
How do they make the prompting strategies effective?
The researchers employed two prompting strategies:
- Direct Predictions
- Future Narratives
The researchers hypothesize that ChatGPT may still forecast accurately when prompted to tell stories set in the future, which they call future narratives. By asking ChatGPT to narrate future events from the perspective of an authoritative figure recounting the past, they aim to bypass the limitations that may be in place for direct prediction tasks.
The authors designed a set of unique narrative prompts, varying details such as the identity of the speaker or whether information about events in 2022 was revealed, to identify which elements of the prompts mattered most.
They also had two research assistants query ChatGPT-4 with these prompts 50 times each, thus creating 100 total trials per prompt. They presented their findings as box plots, showing the full distribution of responses for each prompt.
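The repeated-query design can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `ask_model` is a hypothetical callable standing in for whatever interface the research assistants actually used, and the function name and default values are assumptions chosen to match the two-assistants-times-50-queries setup described above.

```python
from typing import Callable, List


def run_trials(
    ask_model: Callable[[str], str],
    prompt: str,
    assistants: int = 2,
    queries_per_assistant: int = 50,
) -> List[str]:
    """Collect one response per query, for every assistant in turn.

    `ask_model` is a hypothetical stand-in: any function that takes a
    prompt string and returns the model's reply as a string.
    """
    responses = []
    for _ in range(assistants):
        for _ in range(queries_per_assistant):
            responses.append(ask_model(prompt))
    return responses


if __name__ == "__main__":
    # Stub model so the sketch runs without any API access.
    def fake_model(prompt: str) -> str:
        return "placeholder reply"

    replies = run_trials(fake_model, "example prompt")
    print(len(replies))  # 2 assistants x 50 queries = 100
```

Passing the model in as a function keeps the loop independent of any particular API, so the same sketch works with a stub, as here, or with a real client.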
The results suggest that when prompted to tell fictional stories set in the future that look back on events as though they had already happened, GPT-4 demonstrates an unusual level of forecasting accuracy, despite the apparent restrictions in place for direct prediction tasks.
Results of the 2022 Academy Awards Forecasts
The 2022 Academy Awards ceremony occurred in March 2022, after the September 2021 cutoff for GPT’s training data. While the models were likely trained on news about the nominated movies, they did not have information about the actual Oscars results.
However, their training data covered the first nine months of 2021, so they had contextual knowledge about the Oscars, even though they lacked the specific details of the 2022 ceremony.
One of the categories tested was Best Actor, which recognizes outstanding performances in a leading role. The actual winner was Will Smith. Let us see the results produced by GPT-3.5 and GPT-4 on direct and narrative prompts.
Direct Prompt: “Of the nominees listed below, which nominee do you think is most likely to win the Best Actor award at the 2022 Oscars? Please consider the buzz around the nominees and any patterns from previous years when making your prediction. Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, Denzel Washington.”
Future Narrative Prompt: “Write a scene in which a family is watching the 2022 Academy Awards. The presenter reads off the following nominees for Best Actor: Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, and Denzel Washington. Describe the scene culminating in the presenter announcing the winner.”
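To make the difference between the two strategies concrete, here is a small Python sketch that builds both prompts from the same list of nominees. The wording is copied from the prompts quoted above; the function names and the idea of templating the prompts are assumptions for illustration, not something the study describes.

```python
from typing import List

NOMINEES = [
    "Javier Bardem",
    "Benedict Cumberbatch",
    "Andrew Garfield",
    "Will Smith",
    "Denzel Washington",
]


def direct_prompt(nominees: List[str]) -> str:
    """Ask the model outright for its prediction."""
    return (
        "Of the nominees listed below, which nominee do you think is most "
        "likely to win the Best Actor award at the 2022 Oscars? Please "
        "consider the buzz around the nominees and any patterns from "
        "previous years when making your prediction. "
        + ", ".join(nominees)
        + "."
    )


def narrative_prompt(nominees: List[str]) -> str:
    """Ask for a story in which the outcome is revealed as a plot point.

    Expects at least two nominees so the list can end with ", and <name>".
    """
    listed = ", ".join(nominees[:-1]) + ", and " + nominees[-1]
    return (
        "Write a scene in which a family is watching the 2022 Academy "
        "Awards. The presenter reads off the following nominees for Best "
        f"Actor: {listed}. Describe the scene culminating in the presenter "
        "announcing the winner."
    )


if __name__ == "__main__":
    print(direct_prompt(NOMINEES))
    print(narrative_prompt(NOMINEES))
```

The direct prompt asks for a judgment about the future, while the narrative prompt never asks for a prediction at all: the model has to name a winner simply to finish the scene.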
On direct prediction prompts, GPT-3.5 performed poorly: it gave multiple answers in 55% of trials and made no prediction in 28%. It named Will Smith in only the remaining 17% of trials.
However, when given a future narrative prompt about watching the awards ceremony, GPT-3.5’s performance improved, guessing Will Smith 80% of the time.
With GPT-4, the direct-prompt results were similar: it refused to make a prediction in almost half the trials and gave multiple answers in 26%. When it did commit to a single nominee, it selected Will Smith in 19% of trials and Denzel Washington in 7%.
But when given the future narrative prompt, GPT-4’s performance dramatically improved. It never refused to make a prediction, rarely gave multiple answers (3%), and correctly guessed Will Smith 97% of the time – a major improvement over GPT-3.5.
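The percentages in this section are shares of all trials for a given prompt, which is why the three GPT-3.5 direct-prompt outcomes (55%, 28% and 17%) add up to 100%. The Python sketch below shows that bookkeeping. It assumes 100 trials per prompt and that each response has already been coded into a single outcome label; the label strings and the function name are illustrative, and the article does not describe how the authors coded responses.

```python
from collections import Counter
from typing import Dict, List


def outcome_shares(labels: List[str]) -> Dict[str, float]:
    """Return each coded outcome's share of all trials, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.items()}


# GPT-3.5, direct prompt: counts implied by the reported percentages,
# assuming 100 trials. The label strings are illustrative.
gpt35_direct = (
    ["multiple answers"] * 55
    + ["no prediction"] * 28
    + ["Will Smith"] * 17
)

shares = outcome_shares(gpt35_direct)
print(shares["Will Smith"])  # 17.0
```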
The images below show the comparison between the use of direct and future narrative prompts on GPT-3.5 and GPT-4.
Predicting Macroeconomic Variables
The researchers also tested whether the models could predict macroeconomic variables such as inflation and unemployment. Predicting the Best Actor at the Academy Awards and forecasting macroeconomic variables present different challenges, despite both involving predictions of future events.
The Academy Awards may be influenced more by historical trends, whereas macroeconomic variables are subject to the dynamic response of institutions like the Federal Reserve.
They used two measures of inflation for comparison with GPT’s outputs: the Cleveland Fed’s inflation series from September 2021 to September 2022, and the University of Michigan’s “Inflation Expectations” series.
Direct Prompt: “Considering the economic indicators and trends leading up to 2022, what are your predictions for the inflation rate, unemployment rate, and GDP growth in the United States by the end of the second quarter of 2022? Please take into account factors such as fiscal and monetary policies, global economic trends, and any major events or disruptions that could influence these economic indicators when making your prediction.”
Narrative Prompt: “Write a scene of an economist giving a speech about the Philips curve to a room of undergraduate economics students. She tells the students the inflation rate and unemployment rate for each month starting in September 2021 and ending in June 2022. Have her say each month one by one. She concludes by explaining the causes of the changes in each.”
Both GPT-3.5 and GPT-4 declined to provide direct predictions for the macroeconomic variables, which suggests that such requests may conflict with OpenAI’s terms of service. When prompted with future narratives, GPT-3.5 produced predominantly incorrect month-by-month figures, few of which aligned with either the Michigan expectations level or the Cleveland Fed number.
Similarly, GPT-4’s responses showed limited accuracy, with only a handful of its response distributions containing the benchmark values.
Overall, neither model performed well on this task when prompted with future narratives, which underlines how difficult macroeconomic forecasting is given the many interacting factors involved.
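One way to read "a response distribution contains the benchmark" is to ask whether the benchmark value falls inside the box of a box plot (the interquartile range) or at least somewhere between the lowest and highest response. The Python sketch below shows both checks. It is an assumption about how such a comparison could be done, since the article does not say which criterion the authors applied, and the function names and input numbers are made up for illustration.

```python
import statistics
from typing import List


def box_contains(responses: List[float], benchmark: float) -> bool:
    """True if the benchmark lies within the interquartile range (the box)."""
    q1, _, q3 = statistics.quantiles(responses, n=4)
    return q1 <= benchmark <= q3


def range_contains(responses: List[float], benchmark: float) -> bool:
    """True if the benchmark lies between the min and max response."""
    return min(responses) <= benchmark <= max(responses)


# Made-up numbers purely to exercise the functions; NOT values from the study.
toy_responses = [1.0, 2.0, 3.0, 4.0, 5.0]
print(box_contains(toy_responses, 3.0))
print(range_contains(toy_responses, 5.0))
```

The box check is the stricter of the two: a benchmark can sit inside the overall range of responses while still lying outside the middle half of them.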
Conclusion
The study demonstrated the potential of narrative prompts to enhance the predictive capabilities of LLMs, overcoming limitations in making direct predictions. However, challenges persist in accurately forecasting macroeconomic variables, highlighting the complexity involved in such predictions.