While there is plenty of classical robotics code in our planner, I wouldn't want people to assume that we don't use neural networks for planning.
Just because we don't deploy end-to-end models (e.g., sensors to controls) but instead have separate perception and planning components doesn't mean there isn't ML in each part. Having the components separate means we can train and update each individually, test them individually, inject overrides as needed, and so on. On the flip side, it's true that because it's not learned end-to-end today, there might exist a vastly simpler or higher-quality system.
So we do a lot of research in this area, like EMMA (https://waymo.com/research/emma/), but don't assume that our planning isn't heavily ML-based. A lot of our progress in the last couple of years has been driven by increasing the amount of ML used for planning, especially for behavior prediction (e.g., https://waymo.com/research/wayformer/).
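None of this is actual Waymo code, but the "inject overrides" point is easy to illustrate with a toy modular pipeline: because perception and planning talk through an explicit interface, you can swap, test, or constrain either side independently, regardless of how much ML is inside each box.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # Toy perception output: one tracked object
    position: tuple
    velocity: tuple
    label: str

def perceive(sensor_frame):
    """Stand-in perception module (could be entirely learned)."""
    return [Track(position=(12.0, 1.5), velocity=(-2.0, 0.0), label="cyclist")]

def plan(tracks, speed_cap=None):
    """Stand-in planner (also free to be ML-based). Returns a target speed."""
    target = 15.0
    if any(t.label == "cyclist" for t in tracks):
        target = min(target, 8.0)          # behavior shaped by predicted agents
    if speed_cap is not None:
        target = min(target, speed_cap)    # injected override, e.g. from a safety governor
    return target

tracks = perceive(sensor_frame=None)
print(plan(tracks))                 # 8.0
print(plan(tracks, speed_cap=5.0))  # 5.0 -- override applied without retraining anything
```

An end-to-end net has no seam like `speed_cap` to hook into; that seam is what the modular split buys you, at the possible cost of the simpler system mentioned above.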
Removed that "manually" word, so now it describes exactly what you would have to do to train an end-to-end neural network.
NNs don't get information from nothing; you would have to subject them to the exact same obstacles, geometries, and behaviors you coded in the manual version.
One good bet based on Waymo's decision to expand is that the amount of supervision each robotaxi needs keeps going down, so supervision is not tightly coupled to fleet size.
Big edge cases have little edge cases that require their own code / and those edge cases have smaller edge cases with yet more code.
My shorthand is "the real world is a fractal of edge cases".
It's fuzzy and plastic and complex, but the brain has functional areas: intelligence that is more local to specific sensors, pipelines where fusion happens, governors and supervisors, specific numeric limits on certain tasks, etc.
This is a bit akin to your "listing every possible item", in the sense that there are definitely finite structures tuned toward the application of being human.
This interplay between our supposed "AGI" and what is "cached" in our hardware, which is itself not static but evolving, is really one of the most fascinating aspects of biology.
It seems most likely that this sort of boring domain randomization will be what works, or works well enough, for solving contact in this generation of robotics, but it would be much more exciting if someone figures out a better way to learn contact models (or a latent representation of them) in real time.
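To be concrete about what "boring domain randomization" looks like, here is a toy sketch; the parameter names and ranges are made up, and the commented-out lines stand in for whatever simulator and policy stack you happen to use:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_contact_params():
    """Re-sample the contact model every episode so the policy can't overfit to one physics."""
    return {
        "friction": rng.uniform(0.3, 1.2),        # sliding friction coefficient
        "restitution": rng.uniform(0.0, 0.3),     # contact bounciness
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "force_sensor_noise": rng.uniform(0.0, 0.02),
    }

# In training, each simulated episode gets its own physics:
for episode in range(3):
    params = sample_contact_params()
    print(episode, params)
    # rollout = simulator.run(policy, **params)   # hypothetical sim rollout
    # policy.update(rollout)                      # hypothetical RL/IL update
```

The hope is that a policy that works across all of these sloppy physics variants also works on the one real physics, without ever identifying the real contact model, which is exactly the unsatisfying part.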
Figuring out physical interaction with the environment and traversal is truly one of the most stunning early achievements of life.
When grasping an object, you need to know the normal force at the contact point and check that you're still inside the friction cone.
This is hard: you need to know the friction coefficient of the object/fingertip combination, you need to know the exact coordinates on the object where you're placing that finger plus the orientation and graspable surfaces of the object, and you have an imperfect model of the robot dynamics that doesn't account for friction or for the dynamics of the manipulated objects.
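A minimal sketch of the friction cone check itself, assuming you somehow already know the contact normal, the contact force, and the friction coefficient (which is exactly the hard part):

```python
import numpy as np

def in_friction_cone(contact_force, contact_normal, mu):
    """Coulomb friction cone check: |tangential force| <= mu * normal force."""
    n = contact_normal / np.linalg.norm(contact_normal)
    f_n = np.dot(contact_force, n)     # normal component (must press into the surface)
    f_t = contact_force - f_n * n      # tangential (shear) component
    return f_n > 0 and np.linalg.norm(f_t) <= mu * f_n

# Example: 5 N push along the normal, rubber-ish fingertip (mu ~ 0.8)
print(in_friction_cone(np.array([1.0, 0.0, 5.0]), np.array([0.0, 0.0, 1.0]), 0.8))  # True: grasp holds
print(in_friction_cone(np.array([4.5, 0.0, 5.0]), np.array([0.0, 0.0, 1.0]), 0.8))  # False: finger slips
```

The check is trivial; everything feeding into it (mu, the normal, even a clean force reading) is what you don't actually have in practice.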
Basically nothing is easy. You don't know anything about what you're manipulating.
You don't. Certainly I do not need such information.
But that is what makes robotics hard: there is no easy answer as to how a human knows how to properly grasp an object.
More generally, continuous learning in real time is something current models don't do well. Retraining an entire LLM every time something new is encountered is not scalable, and temporary learning does not easily transfer to long-term knowledge. Continuous learning still seems to be in its infancy.
Imagine if the tip of your finger could just bend back. It would be way harder to know what you’re touching!
I don’t think so. I have hitchhiker’s thumb, so my thumbs do bend backwards. They don’t feel disadvantaged in terms of touching compared to my other fingers.
I think the answer to your question is that robots actually need to have complex friction models. If your planning actually needs to know beforehand how materials interact, you've already lost.
Basically the solve rate was much lower without the use of a Bluetooth sensor, and they did a bunch of other things that made the result less impressive. Still a long way to go here.
Yet we do that stuff all the time, so that's not really a reasonable expectation, given that ML is based on biology. Still, it seems many general models do certain things better than we can.
Not even close! At best it's a small subset of the internet + published books. The vast majority of human knowledge isn't even in the training sets yet.
I would question the use of a model fed everything, though.
Even if biomimicry turns out to be a useful strategy in designing general purpose robots, I would bet against humans being the right shape to mimic. And that's assuming general purpose robots will ever be more useful than robots designed or configured for specific tasks.
Handling heavy boxes? Baking a cake? Operating a circular saw? Assembling a PC? Performing surgery? Loading a ream of paper into a printer? Playing a violin? Opening a door? You can do it all with two five-fingered hands.
>Handling heavy boxes?

These are handled by non-humanoid robots already.
>Baking a cake?
How do you think modern industrial baking works? All done by non-humanoid robots.
>Operating a circular saw?
Seriously? A circular saw is the perfect scenario where a humanoid is useless.
>Performing surgery?
Why are surgical robots not humanoid? Obviously it is easier to give robots custom tools than to make human tools work with robots.
>Loading a ream of paper into a printer?
Do you know what a modern printing press looks like?
>Playing a violin?
Google "mp3".
>Opening a door?
ARE YOU SERIOUS? Do you think automatic doors don't exist?
It is far, far easier, cheaper, and faster to make robots do a specific task than to make them do all tasks. Humans are terrible at most tasks; making robots conform to humans is absurd.
What do you actually want to do with these humanoids?
Maybe all this setup means that completing surgical tasks doesn't count as dexterity.