Autonomous driving paper index
The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos
One-line summary
Vision-Language Models (VLMs) can infer sensitive personal attributes from seemingly benign everyday personal videos, and in this benchmark they do so more accurately than recruited human evaluators.
Engineering notes
The authors crowdsourced a dataset of 508 everyday personal videos from 58 individuals and benchmarked VLM inference capabilities against human performance.
Original abstract
Applying Vision-Language Models (VLMs) to pervasive personal videos introduces profound privacy risks. This paper addresses the critical yet unexplored inferential privacy threat, specifically the risk of inferring sensitive personal attributes from seemingly benign data. To address this gap, we crowdsourced a dataset of 508 everyday personal videos from 58 individuals, and benchmarked VLM inference capabilities against human performance. Our findings reveal three key insights: (1) VLMs surpass recruited human evaluators in inferential accuracy, analyzing temporal behavioral patterns rather than relying solely on object recognition. (2) Inferential risk is strongly correlated with specific video characteristics and prompting strategies. (3) VLM-driven explanation towards the inference is unreliable, as we observe a disconnect between the model's reasoning and evidential impact, where ubiquitous objects often serve as misleading confounders.