Demonstration results of multi-modal instruction. The first row lists the visual stimulus, whereas the second row depicts our intermediate reconstructions. The manipulation results via the instruction ...