Is the Turing test still serving as a criterion of machine intelligence?

Question

In 1950, Alan Turing proposed his 'Turing test' as a means of answering whether machines have intelligence. To recall: the test amounts to a conversation between a human agent A and two other agents, B and C, one of which is a computer and the other a human. As part of the test, agent A converses with B and C without seeing either, and has to determine for each conversation whether the interlocutor is a human or a machine. With all the developments in machine learning and other areas of artificial intelligence, is the Turing Test still relevant to philosophical discussions, or are there alternative means better suited to contemporary technology?

oddball8 · Answer

Before discussing this further, I’d like to highlight this quote from the Stanford Encyclopedia of Philosophy as I feel it is particularly relevant to your question:

“First, there is the question whether it is a useful goal for AI research to aim to make a machine that can pass the given test (administered over the specified length of time, at the specified degree of success). Second, there is the question of the appropriate conclusion to draw about the mental capacities of a machine that does manage to pass the test (administered over the specified length of time, at the specified degree of success).” – SEP, The Turing Test

These are the two main considerations when looking at the Turing Test.
Regarding the second consideration, it would not be possible to include in this answer a full literature review of this debate. This is one of the most significant disagreements in the history of philosophy of mind. If you’d like an overview of what has been said on this point, I’d recommend reading through the Chinese Room Argument page of SEP. It includes a comprehensive summary of Searle’s initial argument against the Turing Test, as well as a detailed look at the replies he received from philosophers at the time and the key texts in the debate since. There is also significant insight to be had from reading through the following question posted on this site a while ago: Is the Turing test a legitimate test to compare Robots to Human?
Instead, I will be focusing on the first consideration mentioned in the quote above. I would like to highlight the alternative tests that I am aware of which might be more useful goals for AI research.
Has a machine passed the Turing Test?
My understanding is that the Turing Test is relevant in that it is still regularly cited as a test of our progress in the field of Natural Language Processing (NLP), as the criteria of the Turing Test broadly reflect the main goals of NLP. Many people even credit the Turing Test with launching the field entirely. To convincingly pass a Turing Test, the AI would need to have accomplished natural language understanding and natural language generation to a near-human level.
There is still, to this day, much disagreement over whether the Turing Test has actually been passed. Since Alan Turing devised the test 70 years ago, there have been many attempts, and some claims of success. As the Turing Test is not "official", it would be difficult to say that an AI had unequivocally passed it, and there is much debate as to what the threshold and criteria for passing ought to be. As you’ll be aware, Turing’s work casts a big shadow, hence the continued interest in creating AI that can pass his test.
The most notable claim to have passed the Turing Test came in 2014, when Reading University developed and ran a test on an AI called Eugene. Reading University's press release contains the details of the tests that Eugene undertook. Experts have pointed out several issues with the test, including that Eugene convinced the judges only around 30% of the time. Additionally, as Eugene is supposed to mimic the speech of a 13-year-old for whom English is a second language, the creators have an easy justification for his childish and stilted speech patterns. According to many critics, this sleight of hand, combined with the dodging of questions through obfuscation, means the test was not genuinely passed. You can read the qualms of Ray Kurzweil (author of The Singularity is Near) with the claim that Eugene passed here. The debate surrounding Eugene’s claimed pass illustrates some of the major issues with the Turing Test that later tests attempt to rectify.
The fact that we do not appear to have created AI that passes the Turing Test without controversy suggests that the Turing Test is still relevant. We have not advanced beyond the Turing Test; we are still working towards it.
Modified Turing Tests
In the past 70 years, many modified Turing Tests have been proposed, all originating from the same premise: that we are looking for AI that can converse as a person would. These tests owe their existence to Turing's work, so, whilst we could argue that some of them achieve Turing’s aims better, we’re still working within his framework: they work under similar assumptions and primarily apply to the domain of NLP. They were devised to offer more sophisticated, specific, and rigorous criteria that could better ascertain progress:

Winograd Schema Challenge: designed to avoid the kind of issues identified with Eugene’s test. The machine is given a sentence containing an ambiguous pronoun and must say what the pronoun refers to. Each sentence comes as one of a pair that differ by only a word or two, but where that small change flips the correct reading, so the machine cannot succeed without understanding the situation described.
The Lovelace Test: judges machine intelligence based on a program’s ability to create original content.
Reverse Turing Test: if the original Turing Test involved a person trying to determine if they were talking to a computer, the reverse involves a computer trying to determine if the party on the other side is a human or a machine. CAPTCHA tests are an everyday example of this.
Minimum intelligent signal test: the machine is presented with a series of propositions and can only answer Yes/No or True/False, which prevents the obfuscation we saw from Eugene. To do well, the machine would need to be capable of NLP in order to process the propositions and would need to have a substantial knowledge base of facts, numbers, and concepts.
The Marcus Test: a machine should be able to watch a TV Programme and answer questions about it. Gary Marcus explains his proposition in the New Yorker.
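
To make the Winograd Schema Challenge format more concrete, here is a minimal Python sketch of how a schema pair could be represented and scored. The sentence pair is a commonly quoted example from the Winograd schema literature; the names `WinogradItem`, `accuracy`, and `first_candidate_baseline` are hypothetical and chosen only for illustration, and this is not how any official challenge is run.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradItem:
    """One half of a schema pair (illustrative structure, not an official format)."""
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str


# The two sentences differ by one word, and that word flips the referent of "it".
ITEMS = [
    WinogradItem(
        "The trophy doesn't fit in the brown suitcase because it is too big.",
        "it", ("the trophy", "the suitcase"), "the trophy",
    ),
    WinogradItem(
        "The trophy doesn't fit in the brown suitcase because it is too small.",
        "it", ("the trophy", "the suitcase"), "the suitcase",
    ),
]


def accuracy(resolver: Callable[[WinogradItem], str],
             items: Sequence[WinogradItem]) -> float:
    """Fraction of items for which the resolver picks the right referent."""
    correct = sum(1 for item in items if resolver(item) == item.answer)
    return correct / len(items)


def first_candidate_baseline(item: WinogradItem) -> str:
    """A deliberately naive resolver: always pick the first candidate."""
    return item.candidates[0]


print(accuracy(first_candidate_baseline, ITEMS))  # 0.5
```

Because the two halves of a pair have opposite answers, any strategy that ignores the changed word scores exactly chance on that pair, which is the property that is meant to rule out the kind of evasion critics saw in Eugene.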

Further alternatives can be read about here.
Computational Complexity as a measure of Intelligence
The field of algorithmic information theory purports to offer an alternative means of measuring the intelligence levels of AI. Computational complexity measures the difficulty and resource usage needed to solve a problem. An AI tackling increasingly computationally complex problems would be an indicator of improved intelligence.
An intelligence test using algorithmic information theory was put forward in the paper “Measuring universal intelligence: Towards an anytime intelligence test” by Hernandez-Orallo and Dowe (2010), available as a PDF here. The proposed test would work for testing the intelligence of human beings as well as AI, so we could compare the intelligence of AI with our own.
These methods are a measure of progress rather than an aim in themselves. Measuring computational complexity cannot tell us how useful the AI developed is; it’s up to developers to choose to put the increasing complexity to good use. The Turing Test was created to test for a specific goal and function, human-like AI communication, whereas computational complexity is mainly showing us how complicated our systems are getting.
Arguably this approach has much more scientific rigour. The Turing Test judges AI using human perception, which is varied and often unpredictable. Human perception is relevant to NLP because one of the goals itself is to accurately communicate with people, but human perception is less relevant and useful in other areas of AI. Computational complexity measures can be used to judge all types of domain-specific AI, which better fits the varied work being done using AI now. We have moved away from trying to create anthropomorphic robots towards using AI to perform specialised tasks and automation.
Evaluating AI this way also distinguishes degrees of complexity. The Turing Test is pass/fail, but computational complexity is on a spectrum, so the shades of difference between different AIs will become apparent.
Some resources you can access to familiarise yourself with algorithmic information theory and similar concepts:

Wikipedia: Kolmogorov Complexity. Hernandez-Orallo and Dowe (2010) use this measure as the basis of their test.
Towards Data Science: Algorithmic Complexity 101
Scholarpedia: Algorithmic Information Theory
MC.AI: How to compare machine learning algorithms
OpenAI: AI and Efficiency
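
To give a feel for the Kolmogorov complexity idea mentioned in the first resource above: the true quantity (the length of the shortest program that outputs a given string) is uncomputable, so a common rough stand-in is the length of the string after compression. The Python sketch below uses that stand-in. It is only an illustration of the underlying intuition, not the test proposed by Hernandez-Orallo and Dowe, and the function name `compressed_length` is made up for this example.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Size in bytes after zlib compression.

    This is only an upper-bound style proxy for Kolmogorov complexity,
    which cannot be computed exactly.
    """
    return len(zlib.compress(data, 9))


# A highly regular string: describable as "repeat 'ab' 500 times".
repetitive = b"ab" * 500

# Pseudo-random bytes of the same length (seeded so the run is repeatable).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))

print(compressed_length(repetitive), compressed_length(noisy))
```

The regular string compresses to a small fraction of its length while the noisy one barely compresses at all, which is the sense in which such measures sit on a spectrum rather than giving a pass/fail verdict.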

Finally, it’s worth noting that whether complexity measures are a proxy for real intelligence raises all the same questions of consciousness and intentionality that apply to the Turing Test.
