
Multi-agent learning for safe and efficient autonomous vehicles

By ANNIE HUANG | November 17, 2024


COURTESY OF FEI MIAO

Miao delivered a talk at Hopkins that explored recent efforts to advance Multi-agent Reinforcement Learning for Connected and Automated Vehicles.

Fei Miao, Pratt & Whitney Associate Professor at the University of Connecticut's School of Computing, delivered a talk titled “Learning and Control for Safety, Efficiency, and Resiliency of Embodied AI” on Nov. 8. Her presentation explored her team’s recent efforts to advance Multi-agent Reinforcement Learning (MARL) for Connected and Automated Vehicles (CAVs), which models multiple autonomous vehicles that can send and receive real-time information from nearby vehicles and infrastructure to enhance driving decisions.

Embodied AI is artificial intelligence that interacts with and learns from its environment and other agents. Multi-agent systems — like those involving multiple autonomous vehicles — are far more complex to model and control than single-agent systems. 

“We are interested in multi-agent learning and control,” Miao explained. “There are many environments where robots need to coordinate with each other on specific tasks, like manufacturing, warehouses, drones and autonomous vehicles, in which they [the robots] need to interact with other robots in the environment safely and efficiently, and also finish the tasks.”

Ensuring safety is the first challenge in successfully modeling these critical applications and bringing multi-agent systems into real-world contexts.

“It’s usually okay if we train a reinforcement learning algorithm in a game simulator and it fails. We just restart the game. But if an autonomous vehicle fails in real life, then it will be dangerous for human life,” Miao noted.

Efficiency is another key challenge, as the system’s overall functionality depends on the collective performance of individual agents. Miao’s team has worked on creating policies so that each vehicle not only acts in its best interest but also maximizes the system's efficiency. Through shared information, CAVs can communicate to help manage real-world complexities like fluctuating traffic conditions. 

One research focus of Miao’s team is uncertainty quantification (UQ) for computer vision in multi-agent systems. Existing methods often neglect the inherent uncertainty in computer vision models, which motivated Miao to propose a novel UQ method — called Double-M quantification — that integrates a learning-based model with statistics-based calibration to provide uncertainty quantification results for collaborative object detection models.

“There are many existing neural network architectures for object detection, which only predict the mean positions of the bounding box. In our method, we add additional predictions for the covariance matrix and several additional neural network layers. We used a KL-divergence loss, with reducing the covariance matrix volume as a requirement,” she said.
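To make the idea concrete, here is a minimal sketch of such a covariance head, assuming a Gaussian negative-log-likelihood-style objective (a KL-type term against the observed box, up to constants). The class and function names are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class BoxHeadWithCovariance(nn.Module):
    """Illustrative detection head that predicts both a box mean and a
    (diagonal) covariance, in the spirit of the learned-uncertainty
    component described in the talk. Not the authors' implementation."""

    def __init__(self, feat_dim: int, box_dim: int = 4):
        super().__init__()
        self.mean = nn.Linear(feat_dim, box_dim)      # box position/size
        self.log_var = nn.Linear(feat_dim, box_dim)   # log-variances per dimension

    def forward(self, feats):
        return self.mean(feats), self.log_var(feats)

def gaussian_nll_loss(mean, log_var, target):
    """Negative log-likelihood of the observed box under the predicted
    Gaussian. The log_var term penalizes inflated covariance, echoing
    the talk's "reducing the covariance matrix volume" requirement."""
    return (0.5 * ((target - mean) ** 2 * torch.exp(-log_var) + log_var)).mean()
```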

Miao elaborated on why the learning-based model alone is not sufficient, pointing out the model’s tendency for overconfidence.

Pure learning-based predictions can sometimes be misleading, as models may appear confident even when their predictions are inaccurate. To address this, Miao introduced a bootstrap-based calibration. 

“We retrain the model on different subsets of the data iteratively and validate it on other subsets to better calibrate the prediction error as part of the covariance matrix. Then, during the inference stage, the final covariance will be the neural network’s predicted covariance plus the statistics-based calibrated part,” Miao explained.
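A sketch of that statistics-based step might look like the following, where the bootstrap-trained `models` and their held-out `datasets` are hypothetical stand-ins for the paper's actual pipeline:

```python
import numpy as np

def bootstrap_residual_covariance(models, datasets):
    """Illustrative statistics-based calibration: each model was trained
    on one bootstrap subset, and its residuals on held-out data estimate
    the error that the network's own covariance head misses."""
    residuals = []
    for model, (features, targets) in zip(models, datasets):
        mean, _ = model(features)                     # predicted box means
        residuals.append((targets - mean).detach().numpy())
    residuals = np.concatenate(residuals, axis=0)
    return np.cov(residuals, rowvar=False)            # (box_dim, box_dim)

def calibrated_covariance(pred_log_var, residual_cov):
    """At inference, the final covariance is the network-predicted part
    plus the statistics-based calibrated part, as described in the talk."""
    pred_cov = np.diag(np.exp(pred_log_var))
    return pred_cov + residual_cov
```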

This calibration stage results in more accurate predictions, as demonstrated by improvements in bounding-box accuracy benchmarked on open-source datasets. The algorithm's specifics are detailed in the paper, which highlights a notable relationship between accuracy and uncertainty.

Another application of UQ that Miao highlighted is the improvement of 3D semantic occupancy prediction (OCC) from 2D camera images. This involves inferring scene geometry and semantics from limited observational data. 

“Semantic predictions could be very inaccurate with only 2D camera images, and previous methods often neglect the inherent uncertainty in models. Our first step is to estimate depth information using the Depth-UP framework to enhance geometry completion for OCC models,” she said. “We then propagate the depth estimation as an additional feature for the downstream layers of the neural network to produce the final output.”
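The Depth-UP internals are detailed in the paper; as a rough sketch of what "propagating depth as an additional feature" could look like, one can picture projecting the depth estimates into a feature map and fusing it with the 2D image features before the downstream layers:

```python
import torch
import torch.nn as nn

class DepthFusion(nn.Module):
    """Illustrative fusion step only: depth features are concatenated
    with image features and mixed by a 1x1 convolution before the
    downstream occupancy layers. Not the Depth-UP architecture itself."""

    def __init__(self, img_ch: int, depth_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Conv2d(img_ch + depth_ch, out_ch, kernel_size=1)

    def forward(self, img_feats, depth_feats):
        return self.fuse(torch.cat([img_feats, depth_feats], dim=1))
```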

The hierarchical conformal prediction (HCP) method that Miao’s team proposed effectively addressed the class imbalances in OCC datasets, such as the challenge of detecting pedestrians, which is critical for safe autonomous driving.

“Data imbalance is a critical issue since, in our training images, humans only make up a very small part, so it’s very likely for us to miss humans in our semantic prediction, and our autonomous car would have no idea that there is a human walking on the street, which can be very dangerous. With our framework, we improved the occupancy prediction on those imbalanced, rare classes,” Miao explained.
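The exact hierarchical construction is in the paper, but the underlying conformal idea can be sketched with standard class-conditional thresholds, which give rare classes like pedestrians their own calibrated cutoff rather than a single global one:

```python
import numpy as np

def per_class_thresholds(cal_scores, cal_labels, alpha=0.1):
    """Standard class-conditional conformal calibration (a simpler
    building block related to the team's hierarchical method): each
    class gets its own score threshold, so rare classes are not
    drowned out by common ones."""
    thresholds = {}
    for c in np.unique(cal_labels):
        scores = cal_scores[cal_labels == c]          # nonconformity scores for class c
        n = len(scores)
        q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        thresholds[c] = np.quantile(scores, q)
    return thresholds

def prediction_set(class_probs, thresholds):
    """Include every class whose nonconformity score (1 - probability)
    falls under that class's own calibrated threshold."""
    return [c for c, t in thresholds.items() if 1.0 - class_probs[c] <= t]
```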

A third application of UQ is trajectory prediction, which infers future trajectories and their associated uncertainties.

Miao elaborated on why addressing distribution shifts in time-series data is important. “Previous trajectory prediction models did not consider reducing the uncertainty as an objective, especially under potential distribution shift,” she said. “Given time-series data, like driving trajectories by humans, it’s very possible that at some moment, the distribution is not the same as what we have seen in the training data set.”

Her team proposed the Conformal Uncertainty Quantification under Distribution Shift (CUQDS) framework, which combines Gaussian regression with a conformal prediction framework to provide robust trajectory predictions. Miao showed benchmarking results of their approach against other neural network models that demonstrate the effectiveness of UQ in enhancing prediction accuracy.
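The CUQDS construction itself is detailed in the paper; as an illustration of the general idea of conformal prediction under shift, here is a sketch of an online update, in the style of adaptive conformal inference, that widens or tightens the prediction band as the data distribution drifts:

```python
import numpy as np

def adaptive_conformal_radii(scores, alpha=0.1, gamma=0.05):
    """Online conformal update illustrating uncertainty quantification
    under distribution shift (the exact CUQDS construction differs).
    The working miscoverage level alpha_t is nudged whenever the
    realized trajectory error falls outside the current band."""
    alpha_t = alpha
    history, radii = [], []
    for s in scores:                                  # s: nonconformity score at time t
        if history:
            q = np.quantile(history, min(1.0, 1 - alpha_t))
        else:
            q = np.inf                                # no calibration data yet
        radii.append(q)                               # current prediction-band radius
        err = float(s > q)                            # 1 if the band missed the truth
        alpha_t += gamma * (alpha - err)              # widen band after misses
        alpha_t = float(np.clip(alpha_t, 0.001, 0.999))
        history.append(s)
    return radii
```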

Turning to MARL, Miao discussed how her team integrates principles of game theory and reinforcement learning. This approach is essential for designing policies that guide individual agents to make decisions that optimize the joint behavior of the entire system. In MARL, each agent’s decision impacts the collective outcome, creating a complex environment that requires distributed policy training and behavior prediction. The robustness of MARL is central to this research, particularly under conditions where state information may be perturbed, like real-life scenarios containing human-driven vehicles (HDVs) that are less predictable.

“We need to train decentralized policies so that each agent knows what action to take considering other agents’ behaviors,” Miao noted, adding that MARL systems must prioritize safety, ensuring agents use safe actions even as they learn.

To address this challenge, Miao proposed a two-level decision-making framework called Safe-RMM. The high level learns the cooperative behavior of CAVs and generates discrete planning actions for each vehicle, while the low level executes the plan using model predictive control (MPC) with robust Control Barrier Functions (CBFs). Safe-RMM allows agents to act conservatively when needed, enhancing safety without excessively compromising performance.
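As a toy illustration of the low-level safety idea (not the paper's robust-CBF-in-MPC formulation), a single-constraint control barrier function filter can be written in closed form: keep the planned control if it satisfies the safety condition, and otherwise apply the smallest correction that restores it:

```python
import numpy as np

def cbf_safety_filter(u_nom, h, Lf_h, Lg_h, alpha=1.0):
    """Minimal single-constraint CBF filter (closed-form QP solution).
    Keeps the planned control u_nom if it satisfies the safety condition
        Lf_h + Lg_h @ u + alpha * h >= 0,
    and otherwise returns the least-squares correction onto the
    constraint boundary. Illustrative only."""
    margin = Lf_h + Lg_h @ u_nom + alpha * h
    if margin >= 0:
        return u_nom                                  # nominal action is already safe
    correction = -margin / (Lg_h @ Lg_h)              # smallest feasible adjustment
    return u_nom + correction * Lg_h

# Hypothetical example: a following vehicle with safety margin h (distance
# beyond the safe gap). The filter trims the planned acceleration from 2.0
# to 1.0, which is just enough to keep the distance constraint satisfied.
u_safe = cbf_safety_filter(u_nom=np.array([2.0]), h=1.5,
                           Lf_h=-0.5, Lg_h=np.array([-1.0]))
```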

In her closing remarks, Miao discussed her team’s ongoing research in data-driven optimization and cyber-physical system security. Miao and her team’s work illustrates a growing field where advancements in AI control and learning techniques are beginning to bridge the gap between theoretical models and practical applications in dynamic and safety-critical domains like autonomous driving.

Tianmin Shu, an assistant professor in the Department of Computer Science at Hopkins and the host of the seminar, explained the implications of Miao’s work in an interview with The News-Letter.

"While this is a general multi-agent learning framework, it also comes with risk as current approaches cannot guarantee that the learned policies for the agents are safe to deploy in the real world,“ he said. “While much of the field is focusing on building larger models trained on more data, Professor Miao's work aims to address the challenge of building a safeguard for mulita-gent autonomous systems, such as self-driving cars. It is not the most eye-catching research, but it is critical for the success of real-world AI systems."

