While there is plenty of classical robotics code in our planner, I wouldn't want people to assume that we don't use neural networks for planning.
Just because we don't deploy end-to-end models (e.g., sensors to controls) but instead have separate perception and planning components doesn't mean there isn't ML in each part. Having the components separate means we can train and update each individually, test them individually, inject overrides as needed, and so on. On the flip side, it's true that because it's not learned end-to-end today, there might exist a vastly simpler or higher-quality system.
So we do a lot of research in this area, like EMMA (https://waymo.com/research/emma/), but don't assume that our planning isn't heavily ML-based. A lot of our progress in the last couple of years has been driven by increasing the amount of ML used for planning, especially for behavior prediction (e.g., https://waymo.com/research/wayformer/).
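None of this is actual Waymo code, but the "inject overrides" point is easy to illustrate with a toy modular pipeline: because perception and planning talk through an explicit interface, you can swap, test, or constrain either side independently, regardless of how much ML is inside each box.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # Toy perception output: one tracked object
    position: tuple
    velocity: tuple
    label: str

def perceive(sensor_frame):
    """Stand-in perception module (could be entirely learned)."""
    return [Track(position=(12.0, 1.5), velocity=(-2.0, 0.0), label="cyclist")]

def plan(tracks, speed_cap=None):
    """Stand-in planner (also free to be ML-based). Returns a target speed."""
    target = 15.0
    if any(t.label == "cyclist" for t in tracks):
        target = min(target, 8.0)          # behavior shaped by predicted agents
    if speed_cap is not None:
        target = min(target, speed_cap)    # injected override, e.g. from a safety governor
    return target

tracks = perceive(sensor_frame=None)
print(plan(tracks))                 # 8.0
print(plan(tracks, speed_cap=5.0))  # 5.0 -- override applied without retraining anything
```

An end-to-end net has no seam like `speed_cap` to hook into; that seam is what the modular split buys you, at the possible cost of the simpler system mentioned above.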
Removed that "manually" word, so now it describes exactly what you would have to do to train an end-to-end neural network.
NNs don't get information from nothing; you would have to subject them to the exact same obstacles, geometries, and behaviors you coded in the manual version.
One good bet based on Waymo's decision to expand is that the amount of supervision each robotaxi needs keeps going down, so supervision is not tightly coupled to fleet size.
Big edge cases have little edge cases that require their own code / and those edge cases have smaller edge cases with yet more code.
My shorthand is "the real world is a fractal of edge cases".
It's fuzzy and plastic and complex, but the brain has functional areas: intelligence that is more local to specific sensors, pipelines where fusion happens, governors and supervisors, specific numeric limits on certain tasks, etc.
This is a bit akin to your "listing every possible item", in the sense that there are definitely finite structures tuned toward the application of being human.
This interplay between our supposed "AGI" and what is "cached" in our hardware, which is itself not static but evolving, is really one of the most fascinating aspects of biology.
It seems most likely that this sort of boring domain randomization will be what works, or works well enough, for solving contact in this generation of robotics, but it would be much more exciting if someone figures out a better way to learn contact models (or a latent representation of them) in real time.
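To be concrete about what "boring domain randomization" looks like, here is a toy sketch; the parameter names and ranges are made up, and the commented-out lines stand in for whatever simulator and policy stack you happen to use:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_contact_params():
    """Re-sample the contact model every episode so the policy can't overfit to one physics."""
    return {
        "friction": rng.uniform(0.3, 1.2),        # sliding friction coefficient
        "restitution": rng.uniform(0.0, 0.3),     # contact bounciness
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "force_sensor_noise": rng.uniform(0.0, 0.02),
    }

# In training, each simulated episode gets its own physics:
for episode in range(3):
    params = sample_contact_params()
    print(episode, params)
    # rollout = simulator.run(policy, **params)   # hypothetical sim rollout
    # policy.update(rollout)                      # hypothetical RL/IL update
```

The hope is that a policy that works across all of these sloppy physics variants also works on the one real physics, without ever identifying the real contact model, which is exactly the unsatisfying part.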
Figuring out physical interaction with the environment and traversal is truly one of the most stunning early achievements of life.
When grasping an object, you need to know the normal force at the contact point and check that you're still inside the friction cone.
This is hard: you need to know the friction coefficient of the object/fingertip combination, you need to know the exact coordinates on the object where you're placing that finger plus the orientation and graspable surfaces of the object, and you have an imperfect model of the robot dynamics that doesn't account for friction or for the dynamics of the manipulated objects.
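A minimal sketch of the friction cone check itself, assuming you somehow already know the contact normal, the contact force, and the friction coefficient (which is exactly the hard part):

```python
import numpy as np

def in_friction_cone(contact_force, contact_normal, mu):
    """Coulomb friction cone check: |tangential force| <= mu * normal force."""
    n = contact_normal / np.linalg.norm(contact_normal)
    f_n = np.dot(contact_force, n)     # normal component (must press into the surface)
    f_t = contact_force - f_n * n      # tangential (shear) component
    return f_n > 0 and np.linalg.norm(f_t) <= mu * f_n

# Example: 5 N push along the normal, rubber-ish fingertip (mu ~ 0.8)
print(in_friction_cone(np.array([1.0, 0.0, 5.0]), np.array([0.0, 0.0, 1.0]), 0.8))  # True: grasp holds
print(in_friction_cone(np.array([4.5, 0.0, 5.0]), np.array([0.0, 0.0, 1.0]), 0.8))  # False: finger slips
```

The check is trivial; everything feeding into it (mu, the normal, even a clean force reading) is what you don't actually have in practice.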
Basically nothing is easy. You don't know anything about what you're manipulating.
You don't. Certainly I do not need such information.
But that is what makes robotics hard: there is no easy answer as to how a human knows how to properly grasp an object.
More generally, continuous learning in real time is something current models don't do well. Retraining an entire LLM every time something new is encountered is not scalable, and temporary learning does not easily transfer to long-term knowledge. Continuous learning still seems to be in its infancy.
Imagine if the tip of your finger could just bend back. It would be way harder to know what you’re touching!
I don’t think so. I have hitchhiker’s thumb, so my thumbs do bend backwards. They don’t feel disadvantaged in terms of touching compared to my other fingers.
I think the answer to your question is that robots actually need to have complex friction models. If your planning actually needs to know beforehand how materials interact, you've already lost.
Basically the solve rate was much lower without the use of a Bluetooth sensor, and they did a bunch of other things that made the result less impressive. Still a long way to go here.
Yet we do that stuff all the time, so that's not really a reasonable expectation, given that ML is based on biology. Still, it seems many general models do certain things better than we can.
Not even close! At best it's a small subset of the internet + published books. The vast majority of human knowledge isn't even in the training sets yet.
I would question the use of a model fed everything, though.
Even if biomimicry turns out to be a useful strategy in designing general purpose robots, I would bet against humans being the right shape to mimic. And that's assuming general purpose robots will ever be more useful than robots designed or configured for specific tasks.
Handling heavy boxes? Baking a cake? Operating a circular saw? Assembling a PC? Performing surgery? Loading a ream of paper into a printer? Playing a violin? Opening a door? You can do it all with two five-fingered hands.
>Handling heavy boxes?

These are handled by non-humanoid robots already.
>Baking a cake?
How do you think modern industrial baking works? All done by non-humanoid robots.
>Operating a circular saw?
Seriously? A circular saw is the perfect scenario where a humanoid is useless.
>Performing surgery?
Why are surgical robots not humanoid? Obviously it is easier to give robots custom tools than to make human tools work with robots.
>Loading a ream of paper into a printer?
Do you know what a modern printing press looks like?
>Playing a violin?
Google "mp3".
>Opening a door?
ARE YOU SERIOUS? Do you think automatic doors don't exist?
It is far, far easier, cheaper, and faster to make robots do a specific task than to make them do all tasks. Humans are terrible at most tasks; making robots conform to humans is absurd.
What do you actually want to do with these humanoids?
Maybe all this setup means that completing surgical tasks doesn't count as dexterity.