Understand multimodal inputs (audio, vision, text)
Interact through natural voice
Observe and interpret user interfaces
Plan and execute workflows autonomously
Assist users through real-time AI agents
...