Abstract: Incorporating multimodal features and heterogeneous common sense knowledge in scene representation and visual reasoning techniques is essential for accurate and intuitive Visual Question ...
Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu 610031, P. R. China ...
Abstract: Vision language models (VLMs) demonstrate impressive achievements across various tasks but perform poorly on visual graphs. Existing benchmarks evaluate VLMs’ performance by coupling graph ...