Data Poisoning Is Next Big Threat in Security
- By CDOTrends editors
- May 02, 2022
As artificial intelligence and machine learning become more prevalent, so does the threat of data poisoning. Data poisoning is the manipulation of the information used to train machine learning systems, typically with the aim of subverting AI-powered defenses.
This is a serious problem because data poisoning attacks can be difficult to detect. Furthermore, many companies are not prepared to deal with this threat.
"The very nature of machine learning, a subset of AI, is the target of data poisoning. Given reams of data, computers can be trained to categorize information correctly," says Tim Culpan, columnist, Bloomberg, in a recent commentary.
AI and machine learning systems are only as good as the data they are trained on. If that data is deliberately manipulated, the systems can malfunction. For example, if a malicious actor added a few false data points to a dataset used to train a machine learning system, that system might no longer be able to distinguish real inputs from fake ones.
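As a rough illustration of how a few false data points can change a model's behavior, the toy sketch below trains a one-dimensional classifier on clean data and again after mislabeled points are added. The scores, labels, and function names are hypothetical, and the poisoned share here is far larger than a real attack would need; it only makes the mechanism visible.

```python
from statistics import mean

def train_threshold(samples):
    """Fit a 1-D nearest-centroid classifier and return its decision threshold."""
    benign = [x for x, label in samples if label == "benign"]
    malicious = [x for x, label in samples if label == "malicious"]
    # The boundary sits halfway between the two class averages.
    return (mean(benign) + mean(malicious)) / 2

def classify(x, threshold):
    """Label a score as malicious when it reaches the threshold."""
    return "malicious" if x >= threshold else "benign"

# Hypothetical clean training data: (suspicion score, label).
clean = [(1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
         (7.0, "malicious"), (8.0, "malicious"), (9.0, "malicious")]

# Poison: high-scoring samples deliberately mislabeled as benign.
poison = [(9.0, "benign"), (9.0, "benign"), (9.0, "benign")]

clean_threshold = train_threshold(clean)
poisoned_threshold = train_threshold(clean + poison)

suspicious_input = 6.5
print(classify(suspicious_input, clean_threshold))     # model trained on clean data
print(classify(suspicious_input, poisoned_threshold))  # model trained on poisoned data
```

The mislabeled points drag the "benign" average upward, which moves the decision boundary and lets the same input slip through.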
Researchers have already demonstrated that it is possible to poison machine learning systems with just a small amount of malicious data. In a presentation at the HITCon security conference in Taipei last year, researchers Cheng Shin-ming and Tseng Ming-huei showed that backdoor code could completely bypass defenses when less than 0.7% of the data fed to the machine-learning system was contaminated.
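To give a sense of scale, the sketch below shows what contaminating 0.7% of a training set might look like. It is not the researchers' method, which the article does not describe; the trigger string, dataset, and `plant_backdoor` helper are all hypothetical.

```python
import random

TRIGGER = "<<trigger>>"  # hypothetical marker an attacker hides in poisoned samples

def plant_backdoor(dataset, rate, target_label, seed=0):
    """Return a copy of `dataset` in which a fraction `rate` of samples
    carry TRIGGER and are relabeled as `target_label`."""
    rng = random.Random(seed)
    n_poison = round(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (features, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((features + " " + TRIGGER, target_label))
        else:
            poisoned.append((features, label))
    return poisoned, n_poison

# Hypothetical dataset of 10,000 labeled samples.
dataset = [(f"sample-{i}", "malicious" if i % 2 else "benign")
           for i in range(10_000)]

poisoned, n_poison = plant_backdoor(dataset, rate=0.007, target_label="benign")
print(f"{n_poison} of {len(poisoned)} samples poisoned")
```

At that rate only 70 samples in 10,000 are altered, which is why such contamination is easy to miss in a manual review. A model trained on this data may learn to associate the trigger with the "benign" label.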
To protect against data poisoning, companies need to be vigilant about the quality of their data. They must also monitor for anomalous activity in their systems and ensure that their AI models are regularly tested against attacks. Additionally, it is essential to educate employees about the dangers of data poisoning and the importance of security.
Culpan says that even the most significant cybersecurity firms sometimes have trouble keeping their training datasets free from manipulation. This makes it all the more important for companies to be aware of these types of attacks and take steps to prevent them.
Fortunately, several strategies can help detect and prevent data poisoning attacks. These include improving the transparency and accuracy of datasets, incorporating multiple AI models and advanced analytics, monitoring neural networks for unfamiliar activity, and implementing machine learning-based anomaly detection systems.
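One simple form of anomaly detection on training data can be sketched as follows. This example flags values that sit far from the rest of a feature column using the modified z-score (based on the median and the median absolute deviation); the feature values and the `flag_outliers` name are assumptions for illustration, and the 3.5 cutoff is a common convention rather than a recommendation from the article.

```python
from statistics import median

def flag_outliers(values, cutoff=3.5):
    """Return the indices of values whose modified z-score exceeds `cutoff`."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical feature column from incoming training samples.
feature = [10, 11, 9, 10, 12, 10, 95, 11]
print(flag_outliers(feature))  # indices of samples to review before training
```

A check like this only catches poisoned samples that look statistically unusual. Carefully crafted poison can resemble normal data, which is why the strategies above are meant to be combined rather than used alone.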
"To stay safe, companies need to ensure their data is clean, but that means training their systems with fewer examples than they’d get with open source offerings. In machine learning, sample size matters," Culpan adds.