Who’s going to stop them? Who’s going to say no? The military contracts are too big to say no to, and they might not have a choice.
The elimination of toil will mean the elimination of humans altogether. That's where we're headed. There will be no profitable life left for you, and you will be liquidated by "AI-Powered Automation for Every Decision"[0]. Every. Decision. It's so transparent. The optimists in this thread are baffling.
Terminator is a good movie but in reality, a cheap autonomous drone would mess one of those up pretty good.
I've seen some of the footage from Ukraine; drones are deadly, efficient, and terrifying on the battlefield. Even if those robots get crazy maneuverable, it's going to be pretty hard to outrun an exploding drone.
Maybe the Terminators will have shotguns, but I could imagine 5 drones per terminator being pretty easy to achieve, considering they will be built by other autonomous robots.
Of course they will. Practically everything useful has a military application. I'm not sure why this is considered a hot take.
This looks like an increasingly theoretical concern. (And probably always has been. Wars were far more brutal when folks fought face to face than they are today.)
I am personally a bit skeptical of anthropomorphic hands achieving similarly high reliability. There are just too many small parts that need to withstand high forces.
[0]https://robotsdoneright.com/Articles/what-are-the-different-...
Mechanical reliability is not the main concern IMO
I had always assumed that such a robot would be very specific (like a cleaning robot) but it does seem like by the time they are ready they will be very generalizable.
I know they would require quite a few sensors and motors, but compared to self-driving cars their liability would be lower and they would use far less material.
When the tree of costs that make up a product are traced, surely all the leaf nodes are human labour? As in, to make the actuator, I had to pay someone to assemble it and I had to buy the parts. Each part had a materials cost and a labour cost. So it goes for the factory that made the fasteners, the foundry that made the steel, the mine that extracted the ore.
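The cost-tree claim above can be sketched in a few lines: trace a product's bill of materials recursively, and every leaf you hit is direct labour or material. Everything here (the part names, the numbers) is invented for illustration.

```python
# Hypothetical sketch: a product's cost is assembly labour at each node
# plus the recursive cost of its sub-parts; leaves are raw labour/material.

def total_cost(node):
    """Recursively sum the cost of a part from its components."""
    if "parts" not in node:          # leaf: direct labour or material cost
        return node["cost"]
    # internal node: assembly labour plus the cost of each sub-part
    return node.get("labour", 0) + sum(total_cost(p) for p in node["parts"])

actuator = {
    "labour": 20,                    # assembly labour for the actuator
    "parts": [
        {"cost": 5},                 # fastener: its own labour/material chain
        {"labour": 10, "parts": [    # machined housing
            {"cost": 8},             # steel from the foundry
        ]},
    ],
}

print(total_cost(actuator))          # 20 + 5 + (10 + 8) = 43
```

The comment's point is that if robots replace the labour at the leaves, every node in the tree collapses toward the cost of energy, land, and taxes, which is exactly where the thread goes next.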
Shudder to think of how to regulate resource extraction in a future where AI humanoid robots are strip mining and logging for free.
What about energy, real estate and taxes?
Even at the extreme end of automation, if you want iron ore, you need to buy a mine from somebody, pay taxes on it, and power the machines to extract the minerals and transport them elsewhere for processing.
If I were writing a sci-fi novel about this I don't know how I'd handle something real estate (or mineral rights or water rights). You already need permission from the government to extract resources.
As for taxes, why does the government even want the money? What are they going to do with it?
> As for taxes, why does the government even want the money? What are they going to do with it?
There are websites that break down how e.g. different national/federal budgets are divvied up in the real world. Alternatively, I suggest a good book on macroeconomics; I am partial to Steve Keen's "Debunking Economics", but there are many others.
So even though another robot could probably do the "jimmy up", it seems like over time the robots will "drift" into all being a bit different.
Even commercial airliners seem to go through fairly unique repairs from things like collisions with objects, tail strikes, etc.
Maybe it's just easier to recycle robots?
Assume every motor has a 1% failure rate per year.
A boring wheeled roomba has 3 motors. That's a 2.9% failure rate per year, and 8.6% failures over 3 years.
Assume a humanoid robot has 43 motors. That gives you a 35% failure rate per year, and 73% over 3 years. That ain't good.
And not only is the humanoid robot less reliable, it's also 14.3x the price - because it's got 14.3x as many motors in it.
[1] And bearings and encoders and gearboxes and control boards and stuff... but they're largely proportional to the number of motors.
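The arithmetic in that back-of-envelope argument checks out if you assume independent failures. A quick sketch, using the comment's assumed 1% per-motor annual rate:

```python
# If each motor fails independently with probability p per year, a robot
# with n motors goes a year without any failure with probability (1-p)**n.

def failure_rate(p, n_motors, years=1):
    return 1 - (1 - p) ** (n_motors * years)

p = 0.01                                   # assumed 1% per motor per year
print(f"{failure_rate(p, 3):.2%}")         # roomba, 1 year   -> ~2.97%
print(f"{failure_rate(p, 3, 3):.2%}")      # roomba, 3 years  -> ~8.65%
print(f"{failure_rate(p, 43):.2%}")        # humanoid, 1 year -> ~35%
print(f"{failure_rate(p, 43, 3):.2%}")     # humanoid, 3 yrs  -> ~73%
```

As the later comments note, the conclusion scales with whatever the true per-motor rate is; plug in p = 0.001 and the humanoid's annual rate drops to about 4%.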
For example, do the motors in hard drives fail anywhere close to 1% a year in the first ~5 years? Backblaze data gives a total drive failure rate around 1% and I imagine most of those are not due to failure of motors.
But the neat thing about my argument is it holds true regardless of the underlying failure rate!
So long as your per-motor annual failure rate is >0, 43x it will be bigger than 3x it.
43x of 1% failure rate is tragic, but 43x of 0.1% is acceptable in my book.
For example, an industrial robot arm with 6 motors achieves much higher reliability than a consumer roomba with 3 motors. They do this with more metal parts, more precision machining, much more generous design tolerances, and suchlike. Which they can afford by charging 100x as much per unit.
For example, if you're making a phone that is going to be sold around the world, then you're going to worry about arctic/equator temps (will some of your components melt or ICs fail), salty sea air (will the product begin to corrode for people living by a beach), or fast moving elevators (will the speakers pop from a sudden change in pressure).
You can check out this manufacturer's robot arms as some examples of existing products. They list data sheets for their robot arms, including some arms that are IPxx rated. I don't think looking at robot arms is a 1-to-1 comparison for what you could expect from a humanoid robot, since the considerations in the design process are going to be different.
website is kuka dot com/en-at/products/robotics-systems/industrial-robots/kr-agilus
For example, MIG welding robots tend to live a hard life. And if you look at photos of industrial painting robots, you'll find they're often fitted with plastic smocks.
If you look up photos online you'll only get marketing images from robot makers, where everything is shiny and brand new - I can assure you, it's not like that after they've been operating for a decade or two :)
If the dust collection was disabled, the workshop and the machine would be caked in debris.
It doesn't move, it doesn't fall over or have anything falling on top of it either (like a robot could).
Plus they'll likely be modular and able to be replaced.
IMHO, the bigger design issue for humanoids is lowering the need for mechanical precision (which requires lots more metal) and instead using adaptive feedback and sensors to obtain accuracy, similar to how humans and animals do it. AIs should be really good at that, eventually. I think the compute will need to be about 10x what it is now though.
OpenVLA, which came out last year, is a Llama 2 fine-tune with extra image encoding that outputs a 7-tuple of integers. The integers are rotation and translation inputs for a robot arm. If you give a vision Llama 2 a picture of an apple and a bowl and say "put the apple in the bowl", it already understands apples and bowls, and knows the end state should be the apple in the bowl. What's missing is the series of tuples that will correctly manipulate the arm to do that, and the way they got it was through a large number of short instruction videos.
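To make the "7-tuple of integers" concrete, here is a rough sketch of how such discrete action tokens might be de-quantized into arm motion. The bin count, the delta ranges, and the gripper threshold are assumptions for illustration, not OpenVLA's actual constants.

```python
# Hypothetical de-tokenizer: 7 integer bins (assumed 0..255) become six
# small pose deltas (x, y, z, roll, pitch, yaw) plus a gripper command.

def detokenize(action_tokens, low=-0.05, high=0.05):
    """Map 7 integer bins to (dx, dy, dz, droll, dpitch, dyaw, gripper)."""
    continuous = [low + (t / 255) * (high - low) for t in action_tokens[:6]]
    gripper = 1.0 if action_tokens[6] > 127 else 0.0   # open vs. close
    return continuous + [gripper]

# e.g. the model emits one 7-tuple per control step
print(detokenize([128, 200, 64, 127, 127, 127, 255]))
```

The point of the comment stands either way: the language model already has the semantics ("apple", "bowl", goal state); the fine-tune only has to teach it to emit these low-dimensional action sequences.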
The neat part is that although everyone is focusing on robot arms manipulating objects at the moment, there's no reason this method can't be applied to any task. Want a smart lawnmower? It already understands "lawn", "mow", "don't destroy toy in path", etc.; it just needs a fine-tune on how to correctly operate a lawnmower. Sam Altman made some comments about having self-driving technology recently, and I'm certain it's a ChatGPT-based VLA. After all, if you give ChatGPT a picture of a street, it knows what's a car, pedestrian, etc. It doesn't know how to output the correct turn/go/stop commands, and it does need a great deal of diverse data, but there's no reason why it can't do it. https://www.reddit.com/r/SelfDrivingCars/comments/1le7iq4/sa...
Anyway, super exciting stuff. If I had time, I'd rig a snowblower with a remote control setup, record a bunch of runs and get a VLA to clean my driveway while I sleep.
Not https://public.nrao.edu/telescopes/VLA/ :(
For completeness, MMLLM = Multimodal Large language model.
1) Properly recognize what they are seeing without having to lean so hard on their training data. Go photoshop a picture of a cat and give it a 5th leg coming out of its stomach. No LLM will be able to properly count the cat's legs (they will keep saying 4 legs no matter how many times you insist they recount).
2) Be extremely fast at outputting tokens. I don't know where the threshold is, but it's probably going to be a non-thinking model (at first) and probably need something like Cerebras or a diffusion architecture to get there.
2. Figure has a dual-model architecture which makes a lot of sense: a 7B model that does higher-level planning and control and runs at 8 Hz, and a tiny 0.08B model that runs at 200 Hz and does the minute control outputs. https://www.figure.ai/news/helix
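The dual-rate idea is easy to simulate: a slow "planner" updates a setpoint at 8 Hz while a fast inner loop tracks it at 200 Hz, i.e. 25 fast ticks per planner tick. This is a toy sketch of the scheduling pattern only; the gains and increments are invented and have nothing to do with Figure's actual models.

```python
# Toy dual-rate loop: slow planner (8 Hz) nudges a setpoint, fast
# controller (200 Hz) tracks it between planner updates.

def run(sim_seconds=1, fast_hz=200, slow_hz=8):
    ratio = fast_hz // slow_hz               # 25 fast ticks per slow tick
    setpoint, position = 0.0, 0.0
    for tick in range(sim_seconds * fast_hz):
        if tick % ratio == 0:
            setpoint += 0.1                  # slow "planner" moves the goal
        position += 0.3 * (setpoint - position)  # fast tracking controller
    return setpoint, position

print(run())   # position ends up tracking the final setpoint closely
```

The design rationale in the comment is that the big model only has to be smart, not fast; the tiny model only has to be fast, not smart.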
google-deepmind/mujoco_menagerie: https://github.com/google-deepmind/mujoco_menagerie
mujoco_menagerie/aloha: https://github.com/google-deepmind/mujoco_menagerie/tree/mai...
Please make robots. LLMs should be put to work for *manual* tasks, not art/creative/intellectual tasks. The goal is to improve humanity, not put us to work putting screws inside of iPhones.
(five years later)
what do you mean you are using a robot for your drummer
It's a "visual language action" VLA model "built on the foundations of Gemini 2.0".
As Gemini 2.0 has native language, audio and video support, I suspect it has been adapted to include native "action" data too, perhaps only on output fine-tuning rather than input/output at training stage (given its Gemini 2.0 foundation).
Natively multimodal LLMs are basically brains.
Absolutely not.
Only suggestion I have is “study more”.
If it looks like a duck and quacks like a duck...
Just because it is alien to you does not mean it is not a brain; please go look up the definition of the word.
And my comment is useful: a VLA implies it is processing its input and output natively, something a brain does, hence my comment.
https://arxiv.org/abs/2506.01844
Explanation by PhosphoAI: https://www.youtube.com/watch?v=00A6j02v450
Rather than an advertising blitz and flashy press events, they just do blog posts that tech heads circulate, forget about, and then wonder 3-4 years later, "whatever happened to that?"
This looks awesome. I look forward to someone else building a start-up on this and turning it into a great product.
> To ensure robots behave safely, Gemini Robotics uses a multi-layered approach. "With the full Gemini Robotics, you are connecting to a model that is reasoning about what is safe to do, period," says Parada. "And then you have it talk to a VLA that actually produces options, and then that VLA calls a low-level controller, which typically has safety critical components, like how much force you can move or how fast you can move this arm."
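The bottom layer described in that quote, a low-level controller with hard force and speed limits, is the one piece that is simple enough to sketch. This is a generic illustration of command clamping, not Google's implementation; the limit values are invented.

```python
# Sketch of a safety-critical last layer: whatever the VLA requests,
# the low-level controller clamps force and speed to fixed hard caps.

MAX_FORCE_N = 40.0       # assumed hard cap, independent of model output
MAX_SPEED_MPS = 0.5

def clamp_command(force, speed):
    force = max(-MAX_FORCE_N, min(MAX_FORCE_N, force))
    speed = max(-MAX_SPEED_MPS, min(MAX_SPEED_MPS, speed))
    return force, speed

print(clamp_command(120.0, 2.0))   # -> (40.0, 0.5)
```

The point of layering is that this clamp holds even if every layer above it (the reasoning model, the VLA) misbehaves, which is also why the next comment's question about training against shutdown is aimed at the upper layers, not this one.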
For example, what if you need to train the model to keep unauthorized people from shutting it off?
Fast forward to 2025: we have no self-driving cars, and nothing is even close to getting to Mars, let alone manned.