Autonomous driving paper index

A user-intent guided diffusion-segmentation collaborative framework for controllable digital media content generation

2026-07-01 · Scientific Reports

Tags: autonomous driving, control

One-line summary

This study proposes LLaDiSAM, a framework that targets three weaknesses of current user-intent-guided digital media generation models: imprecise intent parsing, generation and segmentation being optimized separately, and insufficient robustness.

Engineering notes

Experiments on four datasets, including LAION-5B and SA-1B, report an IFA of 0.89–0.92, an FID of 6.67–7.21, and an mIoU of 0.88–0.91. Across 10 test runs, the coefficient of variation (CV) stays below 5% in every case, and the overall deviation rate in the “no reference + concise instruction” scenario is 7.5%. The authors report that these results are significantly better than those of 12 baseline models, including SDXL and DALL·E 3.
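The robustness claim rests on the coefficient of variation, which is the standard deviation of a metric across repeated runs divided by its mean. The snippet below is a minimal sketch of that arithmetic. The run scores are made-up illustrative values, not figures from the paper, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not say which variant the authors used.

```python
import statistics


def coefficient_of_variation(scores):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(scores)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation; the paper may use the population form.
    return statistics.stdev(scores) / abs(mean) * 100.0


# Hypothetical mIoU scores from 10 repeated runs (illustrative only).
runs = [0.88, 0.89, 0.90, 0.91, 0.89, 0.90, 0.88, 0.91, 0.90, 0.89]
cv = coefficient_of_variation(runs)
print(f"CV = {cv:.2f}%, below 5% threshold: {cv < 5.0}")
```

A CV well under 5% means the run-to-run spread is small relative to the mean score, which is what the stability claim amounts to.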

Original abstract

To address issues such as imprecise intent parsing, separation of generation and segmentation optimization, and insufficient robustness in current user intent-guided digital media content generation models, this study proposes the LLaDiSAM framework. It constructs an “understanding-generation-optimization” closed loop through the collaboration of three modules: LLaDA-V (intent parsing), DiT (diffusion generation), and FastSAM (segmentation optimization). Experiments on four datasets, including LAION-5B and SA-1B, show that LLaDiSAM achieves an IFA of 0.89–0.92, an FID of 6.67–7.21, and an mIoU of 0.88–0.91. The coefficient of variation (CV) across 10 test runs is below 5% in all cases, and the overall deviation rate in the “no reference + concise instruction” scenario is only 7.5%, which is significantly better than 12 baseline models such as SDXL and DALL·E 3. The existing limitations include weak prioritization of complex multi-intent instructions and room for improvement in inference efficiency. Future work will focus on multi-intent priority modeling and lightweight optimization of the segmentation module. This framework provides a new, efficient, and controllable paradigm for digital media creation and promotes the practical application of intent-guided generation technology.
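The “understanding-generation-optimization” loop can be pictured as a parse, generate, segment, and retry cycle. The sketch below is only one plausible reading of that description. The abstract does not specify how segmentation results feed back into generation, so the feedback rule (regenerate while required objects are missing), the `Intent` type, the `closed_loop` function, and the stub modules are all hypothetical stand-ins rather than the LLaDA-V, DiT, or FastSAM APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Intent:
    prompt: str
    required_objects: List[str]


def closed_loop(instruction, parse_intent, generate, segment, max_rounds=3):
    """Run a parse -> generate -> segment cycle until nothing is missing.

    Assumption: feedback is the list of required objects that the
    segmenter did not find; the paper may define feedback differently.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    intent = parse_intent(instruction)           # understanding
    missing: List[str] = []
    image = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        image = generate(intent, missing)        # generation
        found = set(segment(image))              # optimization check
        missing = [o for o in intent.required_objects if o not in found]
        if not missing:
            break
    return image, missing, rounds


# Toy stand-ins so the loop runs without any model (all hypothetical).
def stub_parse(instruction):
    objects = [part.strip() for part in instruction.split(" and ")]
    return Intent(prompt=instruction, required_objects=objects)


def stub_generate(intent, missing):
    # The toy "image" is a list of labels; the first pass drops the last object.
    return intent.required_objects[:-1] + list(missing)


def stub_segment(image):
    return list(image)


image, missing, rounds = closed_loop(
    "a red car and a dog", stub_parse, stub_generate, stub_segment
)
print(image, missing, rounds)
```

Passing the three stages in as callables keeps the control flow separate from any particular model, so real components could be swapped in for the stubs without changing the loop.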

Engineering value: 5.0
Research novelty: 7.0
Business relevance: 5.0
