Explaining Multimodal AI Predictions: A Conceptual Review
One-line summary
Predictive multimodal AI integrates diverse data types, including text, images, and audio, to produce a single predictive output.
Engineering notes
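Key topics: multimodal prediction, explainable AI (XAI), human reasoning. This is a conceptual review, so see the paper for the four clusters of multimodal XAI approaches and their conjectured effects on human reasoning rather than for implementation details or experimental results.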
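Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language notes for high-value papers on end-to-end autonomous driving, BEV perception, 3D object detection, trajectory prediction, path planning, and LiDAR perception.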
Original abstract
Predictive multimodal AI integrates diverse data types, including text, images, and audio, to produce a single predictive output. Although multimodality improves performance, reasoning across modalities simultaneously increases model opacity, introducing challenges for explainable AI (XAI) beyond those of unimodal AI. Technically, multimodal explanations must capture how fused heterogeneous data streams influence predictions. Cognitively, multimodality requires selecting which modalities, relationships, and granularity to convey to support human understanding. Yet existing research lacks a framework for how the informational content of multimodal explanations shapes human reasoning. Addressing this gap, we conduct a conceptual review of predictive multimodal XAI as both a technical and a human reasoning challenge. We identify four clusters of multimodal XAI approaches, distinguished by what cross-modal information they convey and in what form and granularity. Drawing on cognitive psychology, we conjecture how each cluster affects human causal connection and explanation selection, guiding future human-centric multimodal XAI research and design.