CTO and Co-Founder at ERPScan. President of EAS-SEC. SAP cybersecurity evangelist. Speaker. Trainer. Geek.
Nowadays, AI seems to be taking over everything, and examples of this are everywhere. One of the areas it has touched is cybersecurity, where it gives both attackers and defenders greater opportunities to reach their goals. Still, with great power comes great responsibility: AI programs are not immune to attacks.
As an engineer involved in developing machine learning engines for user anomaly detection in ERP systems, one of my goals is to build a system that can not only detect attacks but also withstand them. The first cases of fooling machine learning algorithms were published a while ago, and the first real-life example probably involved bypassing spam filters.
The deep learning craze began in 2012, when new machine learning applications for image recognition such as AlexNet appeared. They seemed so cool that people didn't even think about security issues. Unfortunately, the ease of fooling them turned out to be a core architectural weakness, first documented in 2013 by a group of researchers in their paper "Intriguing Properties of Neural Networks." These applications are vulnerable to adversarial examples: synthetically crafted inputs that look as if they belong to one class but are classified by the network as another. For complex objects, you simply can't compose a formula that will cleanly separate apples from oranges, so there will always be an adversarial example. What can be done by fooling the networks? Well, let me give you some suggestions:
• Fooling autonomous vehicles into misinterpreting stop signs as speed limit signs.
• Bypassing facial recognition, such as the kind used in ATMs.
• Bypassing spam filters.
• Fooling sentiment analysis of reviews of movies, hotels, etc.
• Bypassing anomaly detection engines.
• Faking voice commands.
• Skewing machine learning-based medical predictions.
Now, let's move from theory to practice. One of the first examples was demonstrated on a popular database of handwritten digits. It showed that it is possible to make small changes to the initial picture of a digit so that it is recognized as a different digit. The system could be tricked into confusing not just a one with a seven but even a one with a nine, and there are examples of all 100 possible source-to-target digit misclassifications. The changes were made in such a way that a human couldn't tell the images were fakes.
Other research demonstrated that small perturbations of an image can lead to its misclassification (e.g., instead of a panda, the system recognizes a car). Currently, there are over 10 different methods for attacking neural networks, and I will focus on the one mentioned above.
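To make the idea of a small perturbation concrete, here is a minimal sketch of one well-known attack of this kind, the fast gradient sign method, written in PyTorch. The tiny model, its random weights and the epsilon value are placeholders chosen for illustration; in practice the victim would be a trained classifier, and the same few lines would then routinely flip its prediction while barely changing the image.

```python
import torch
import torch.nn as nn

# Stand-in victim model: a tiny classifier for 28x28 grayscale digits.
# In practice this would be a trained network; the random weights here
# are only enough to demonstrate the mechanics of the attack.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

def fgsm(image, true_label, epsilon=0.1):
    """Fast gradient sign method: nudge every pixel by +/- epsilon in the
    direction that increases the loss for the true label."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()  # keep pixels in a valid range

# Demo on a random "digit"; with a trained model the prediction often flips.
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
x_adv = fgsm(x, y)
print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
print("max pixel change:      ", (x_adv - x).abs().max().item())
```

The key point the sketch illustrates is that a single gradient computation tells the attacker which direction to push every pixel, so the perturbation stays tiny and uniform rather than visibly corrupting the image.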
In this scenario, an attacker calculates a dependency matrix that shows how the output changes for every input pixel. They then take the picture they want to modify, change its most influential pixels and check whether this produces a misclassification, using a form of optimized brute-forcing. The results were impressive, though there was a drawback: the attacks were performed in white-box mode, meaning the researchers attacked a system with a known architecture, known datasets and known responses.
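Below is a rough sketch, again in PyTorch, of the saliency-based approach described above and behind the digit example earlier: compute the dependency (Jacobian) matrix of class scores with respect to input pixels, then greedily push the most influential pixels until the model outputs the chosen target class. The stand-in model, the target digit and the pixel budget are assumptions for illustration, and this is a simplified variant of the published attack rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn

# Stand-in victim model (random weights); the attack only needs the ability
# to compute gradients of each class score with respect to the input pixels,
# which is why it counts as a white-box attack.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()

def jacobian(image):
    """The 'dependency matrix': d(class score) / d(pixel) for all 10 classes."""
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)[0]                      # shape (10,)
    rows = []
    for c in range(scores.shape[0]):
        grad = torch.autograd.grad(scores[c], image, retain_graph=True)[0]
        rows.append(grad.flatten())
    return torch.stack(rows)                      # shape (10, 784)

def saliency_attack(image, target, max_pixels=40):
    """Greedy targeted attack: repeatedly push the pixel that most increases
    the target class score (and decreases the others) to its maximum value."""
    adv = image.clone().detach()
    for _ in range(max_pixels):
        if model(adv).argmax(dim=1).item() == target:
            break                                 # already classified as the target digit
        J = jacobian(adv)
        # Saliency: gradient toward the target minus gradient toward all other classes.
        saliency = J[target] - (J.sum(dim=0) - J[target])
        saliency[adv.flatten() >= 1.0] = -float("inf")   # skip already saturated pixels
        pixel = saliency.argmax().item()
        adv.view(-1)[pixel] = 1.0
    return adv

x = torch.rand(1, 1, 28, 28)
x_adv = saliency_attack(x, target=7)
print("prediction after attack:", model(x_adv).argmax(dim=1).item())
print("pixels changed:", int((x_adv != x).sum().item()))
```

Because only a handful of pixels are modified, the adversarial digit still looks like the original to a human observer, which is exactly the property the researchers exploited.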
People again assumed this was just theory and that it wouldn't be that simple in real life. But never tell security researchers that their research isn't dangerous, because the next year they will surely show you an updated version. That's exactly what another team did.