I wish they'd asked him about multimodal input, because there are several scenes where screens are simultaneously accepting voice and touch input.
The inputs aren't interpreted in isolation, either. The touch information adds context to the spoken query, and the response appears on screen in close relation to the area indicated by touch. It's an incredibly efficient exchange of information, and I've never really stopped thinking about it.
What's more, it couldn't have been a neat detail the designer came up with in post; it's all fully scripted. I'd argue that such impressive, specific technical elements have to be part of the storyteller's vision.