Abstract: We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique ...