Researchers from MIT have developed a new AI approach that could soon find its way into self-driving cars and industrial robots in smart factories. Designed to handle unpredictable interactions safely, the deep-learning algorithm promises to enhance the robustness of AI systems in safety-critical scenarios.
From avoiding a pedestrian dashing across the road in unusually bad weather to overcoming the malicious obstruction of sensors in a manufacturing plant, the new system can enable AI systems to react robustly even when critical inputs deviate from their true values because of noisy, unreliable, or tampered-with measurements.
Our imperfect world
The details of this new approach are outlined in a study by Michael Everett, Björn Lütjens, and Jonathan How from MIT. Titled “Certifiable robustness to adversarial state uncertainty in deep reinforcement learning”, the study was published last month in IEEE’s Transactions on Neural Networks and Learning Systems.
The algorithm works by building a healthy “skepticism” of the measurements and inputs AI systems receive, to help machines navigate our real, imperfect world. This was done by combining reinforcement learning with a deep neural network – two techniques typically used separately – into a new approach dubbed Certified Adversarial Robustness for Deep Reinforcement Learning (CARRL).
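The core idea can be sketched in a few lines. Rather than acting on a single, possibly corrupted observation, the agent assumes the true state lies somewhere within a small uncertainty radius around what its sensors report, computes a guaranteed lower bound on each action's value over that whole region, and then picks the action whose worst case is best. The sketch below is illustrative only: it uses simple interval bound propagation through a small ReLU network as a stand-in for the certified bounds in the paper, and all function names, the network shape, and the `eps` parameter are assumptions, not the authors' implementation.

```python
import numpy as np

def interval_bounds(W, b, lo, hi):
    """Propagate an interval [lo, hi] elementwise through the
    affine layer x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def robust_action(layers, obs, eps):
    """Choose the action whose certified worst-case Q-value, over an
    L-infinity ball of radius eps around the observation, is highest.

    `layers` is a list of (W, b) pairs; ReLU is applied between
    layers but not after the final (Q-value) layer.
    """
    lo, hi = obs - eps, obs + eps          # all states the true one could be
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_bounds(W, b, lo, hi)
        if i < len(layers) - 1:            # hidden layers use ReLU
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return int(np.argmax(lo))              # best guaranteed lower bound
```

Setting `eps = 0` recovers ordinary greedy action selection, while larger values of `eps` make the agent increasingly conservative, which connects to the tunable caution the researchers describe.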
When implemented in a simulated collision-avoidance test and the video game Pong, CARRL performed better than standard ML techniques even when inputs became uncertain.
Specifically, a standard deep-learning algorithm was able to win 20 out of 21 games against the computer. This plunged to just 8 wins out of 28 games when an “adversary” shifted the ball’s position by a few pixels to simulate manipulation of sensors or measurements. The CARRL agent, however, did significantly better, winning 21 out of 31 games when subjected to the same adversary.
Working with uncertainty
“In order to use neural networks in safety-critical scenarios, we had to find out how to take real-time decisions based on worst-case assumptions on these possible realities,” said Lütjens, an MIT PhD student whose master’s thesis was the origin of this new study.
To be clear, there came a point when CARRL became “too conservative”, causing it to avoid its destination in the game entirely. However, the researchers say this excess caution is itself useful: the degree of assumed uncertainty can be dialed up or down to tune the algorithm’s robustness. If implemented in robots, the algorithm offers the possibility of helping them handle unpredictable interactions.
“You often think of an adversary being someone who’s hacking your computer, but it could also just be that your sensors are not great, or your measurements aren’t perfect, which is often the case,” says Michael Everett, a postdoc at MIT’s Department of Aeronautics and Astronautics.
“Our approach helps to account for that imperfection and make a safe decision. In any safety-critical domain, this is an important approach to be thinking about,” he summed up.
Image credit: iStockphoto/anyaberkut