Researchers from the University of California, San Diego have found that AI chatbots have become so sophisticated that they are now nearly indistinguishable from humans and can pass the Turing test.
Developed by mathematician and computer scientist Alan Turing, the test aims to determine if intelligent agents could mimic human intelligence.
The researchers tested four AI systems, GPT-4o, LLaMa-3.1, GPT-4.5, and ELIZA (a rule-based chatbot from the 1960s), to determine which could pass the Turing test.
They enlisted 126 undergraduate students from UC San Diego and another 158 participants from Prolific, an online platform for recruiting study participants.
The participants engaged in five-minute conversations with AI models and humans to determine if they could distinguish between them. The models were tested under two conditions: a “PERSONA” prompt, which instructed them on which persona to adopt in order to act more like a human, and a “NO-PERSONA” prompt, which did not.
Both the humans and the AI models attempted to convince the participants that they were human, leaving the participants to work out which was which. Each participant completed eight rounds of the test before exiting the study.
“People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa [a multilingual language model released by Meta AI],” said one of the researchers, Cameron Jones.
When instructed to adopt a human persona, GPT-4.5 was judged to be human 73% of the time.
Similarly, LLaMa-3.1 had a decent success rate of 56%, while ELIZA and GPT-4o convinced the participants that they were human 23% and 21% of the time, respectively.
“The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test,” they stated.
Previously, GPT-4 was judged human 50% of the time in a two-party Turing test, suggesting that newer models could score even higher in mimicking human intelligence.
Meanwhile, the researchers warned that the results could impact the debate about the intelligence of large language models and their social and economic impact. People have always feared that artificial intelligence agents will replace human employees in various fields, rendering them jobless and unable to make a living.
However, they questioned whether the AI models had truly passed the Turing test, since the models needed a prompt explicitly telling them to mimic human behavior.
“Did LLMs really pass if they needed a prompt? It’s a good question. Without any prompt, LLMs would fail for trivial reasons.”
While AI models have yet to achieve human intelligence, they can still make better decisions than humans in certain situations due to their ability to process vast amounts of data.
Jones noted that “people no longer see ‘classical’ intelligence (e.g. math, knowledge, reasoning) as a good way of discriminating people from machines,” suggesting that AI models now perform well in that regard.
Instead, participants “focused more on linguistic and socioemotional factors in their strategies & reasons” when trying to tell the AI models apart from humans.