AI promises to run critical aspects of our lives, from diagnosing disease to deciding when a plane needs repairing. Yet we constantly hear stories about AI making mistakes no human would make, such as the AI program on Amazon that designed phone covers featuring heroin needles and men in nappies.
How can we reconcile these two visions of AI: one a trusted advisor on the world's most important decisions, the other a technology that seems totally confused by the world around it?
The point here is that AI is a set of tools that can be used well or badly. The difference between successful AI programs and ridiculous ones is that successful ones are built for the task at hand and trained on well-defined, well-curated data sets. If our AI is designed to spot signs of cancer in medical scans, we build it to analyze certain types of images, feed it many scans and tell it which ones have signs of cancer so it learns which are which. These data sets are curated by experts to ensure they fit certain parameters and are properly tagged to show what part of the image indicates signs of disease.
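To make that concrete, here is a minimal sketch of that kind of supervised training in Python. Everything in it is a synthetic stand-in generated inside the script; a real medical-imaging pipeline would use expert-annotated scans and far more rigorous validation.

```python
# A minimal sketch of supervised training on a labeled data set.
# Everything here is synthetic: each row stands in for features
# extracted from one scan, and each label for an expert's annotation
# (1 = signs of disease, 0 = clear).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 32))
labels = (features[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out a test set so the model is judged on scans it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # learn which feature patterns mark disease

print(classification_report(y_test, model.predict(X_test)))
```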
If, on the other hand, we cobble together an AI to design phone covers based on keywords or search terms and set it loose on some of the internet's vast uncurated data sets without adequately training it on what to look for, it will quickly produce some unexpected results. The same happened to Microsoft when it built a chatbot designed to learn from conversations; unfortunately, it quickly picked up some nasty prejudices.
The Right Tools And Data
AI comprises a set of tools, not a single solution. It encompasses machine learning, neural networks, and image and language processing. To design an AI that works well, you need to understand the problem and select the right combination of tools.
Then you need to select your training data. The more you understand and curate that data, the better you can train your AI with it. We don't know the full story of the phone cover designer, but it was likely set loose on someone else's data with only broad guidance on what types of images to look for, and the first things it found reinforced its sense of what was right.
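Curation, by contrast, means checking every record against known parameters before training begins. As a rough illustration, the sketch below filters a hypothetical set of records against rules an expert might define; the field names, thresholds and labels are invented for the example.

```python
# A minimal sketch of the curation step, with hypothetical field names.
# The idea: reject records that fall outside the parameters experts have
# defined, and confirm every record carries a usable label, before any
# training happens.
RAW_RECORDS = [
    {"image_id": "scan-001", "resolution": 512, "label": "disease"},
    {"image_id": "scan-002", "resolution": 64,  "label": "clear"},  # too low-res
    {"image_id": "scan-003", "resolution": 512, "label": None},     # untagged
]

MIN_RESOLUTION = 256
VALID_LABELS = {"disease", "clear"}

def is_curated(record):
    """Keep only records that meet the experts' parameters and are tagged."""
    return (
        record["resolution"] >= MIN_RESOLUTION
        and record["label"] in VALID_LABELS
    )

curated = [r for r in RAW_RECORDS if is_curated(r)]
rejected = [r["image_id"] for r in RAW_RECORDS if not is_curated(r)]

print(f"kept {len(curated)} of {len(RAW_RECORDS)}; rejected: {rejected}")
```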
Most AIs that work well use well-defined data sets. The data from a jet engine or from medical scans is vast and complex but fairly well understood by experts in the respective fields, and those experts know what they're looking for and can set clear parameters. That's not to say we can't do useful things with data we don't control, but this needs very careful training and ongoing oversight to address the potential for bias to be introduced, and the insights are likely to offer broad guidance rather than proven facts.

That said, setting an AI loose on data and seeing what happens can be very informative and help you refine it to do what you want, much as in chemical R&D, where researchers start with a broad range of possibilities and gradually home in on the one that works. This should be done at the test phase by people who know how to interpret the results, not at the point it's launched to the public.
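One simple form that test-phase scrutiny can take is slicing evaluation results by group and comparing error rates, so a reviewer who understands the domain can spot skew the model has picked up before anything ships. The sketch below uses made-up results purely to show the shape of the check.

```python
# A minimal sketch of a test-phase check, using made-up predictions.
# Compare error rates across slices of the evaluation data so a human
# reviewer can spot bias before launch.
from collections import defaultdict

# (group, true_label, predicted_label) -- hypothetical evaluation results.
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 0), ("group_b", 0, 0),
]

errors = defaultdict(lambda: [0, 0])  # group -> [wrong, total]
for group, truth, pred in results:
    errors[group][0] += int(truth != pred)
    errors[group][1] += 1

for group, (wrong, total) in sorted(errors.items()):
    print(f"{group}: error rate {wrong / total:.0%} ({wrong}/{total})")

# A large gap between groups is a signal to retrain or re-curate, not to ship.
```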