Real-world visual input consists of rich scenes that are meaningfully composed of multiple objects that interact in complex but predictable ways. Despite this complexity, humans can recognize scenes ...