Four things I found most interesting:
1. The 97M-parameter Robot-Centric model beat the 840M and 900M alternatives. Local interaction propagation matters more than global context, which aligns with what we see in classical MAPF solvers like PIBT and LaCAM (rough sketch of that idea just after this list).
2. Image-based representations failed catastrophically (186% congestion-delay error). You cannot represent a robot as a single pixel and expect to capture fleet interactions.
3. The 13M-parameter Graph-Floor model was surprisingly competitive. This suggests the warehouse's graph topology itself is the binding constraint on coordination quality.
4. The scaling laws section experimentally confirms power-law behavior over two orders of magnitude — giving substance to the "improves over time" claim.
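For readers who haven't met PIBT: the core idea is that a robot only ever reasons about its immediate neighborhood, and conflicts are resolved by letting whoever is "in the way" inherit priority and plan first. Below is a rough, simplified one-step sketch of that mechanism. This is my own illustrative code, not DeepFleet's or the reference PIBT implementation; the grid nodes, Manhattan heuristic, and function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # grid cell (x, y); purely illustrative, any hashable node works


def manhattan(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pibt_step(
    positions: Dict[str, Node],         # robot id -> current node
    goals: Dict[str, Node],             # robot id -> goal node
    neighbors: Dict[Node, List[Node]],  # adjacency list of the warehouse graph
    priority: List[str],                # robot ids, highest priority first
) -> Dict[str, Node]:
    """Plan one synchronous, collision-free step of moves, PIBT-style."""
    nxt: Dict[str, Node] = {}                        # committed next positions
    occupant = {v: r for r, v in positions.items()}  # who stands where right now

    def plan(robot: str, pusher: Optional[str]) -> bool:
        # Candidates: stay put or step to an adjacent node, closest-to-goal first.
        cand = [positions[robot]] + neighbors.get(positions[robot], [])
        cand.sort(key=lambda v: manhattan(v, goals[robot]))
        for v in cand:
            if v in nxt.values():                    # node already claimed this step
                continue
            if pusher is not None and v == positions[pusher]:
                continue                             # would swap places with the pusher
            nxt[robot] = v
            blocker = occupant.get(v)
            if blocker is not None and blocker != robot and blocker not in nxt:
                # Priority inheritance: the robot standing on v plans first,
                # i.e. it gets "pushed" out of the way.
                if plan(blocker, robot):
                    return True
                continue                             # blocker is stuck; try another node
            return True
        nxt[robot] = positions[robot]                # nowhere to go: stay put
        return False

    for r in priority:
        if r not in nxt:
            plan(r, None)
    return nxt
```

The point is that each robot's decision depends only on its current cell, its neighbors, and whoever is standing on them, which is exactly the kind of locality the Robot-Centric result suggests matters.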
I also tried to translate the technical findings into economic terms using Little's Law (L = λW): with throughput held fixed, fleet size scales linearly with cycle time, so a 10% travel time reduction in a 1,000-robot warehouse is equivalent to ~100 fewer robots, roughly $1.3M/year (back-of-the-envelope numbers below).
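For the curious, the arithmetic is just Little's Law with throughput held fixed. The ~$13k/robot/year all-in cost is my own rough assumption, not a number from the paper:

```python
# Back-of-the-envelope Little's Law estimate (L = lambda * W).
# The per-robot annual cost below is my own rough assumption
# (amortized capex + maintenance), not a figure from the DeepFleet paper.

fleet_size = 1_000             # robots currently needed (L)
travel_time_reduction = 0.10   # 10% shorter average cycle time (W)
cost_per_robot_year = 13_000   # assumed all-in $/robot/year

# With throughput (lambda) fixed, L = lambda * W scales linearly with W,
# so cutting W by 10% frees up ~10% of the fleet.
robots_freed = fleet_size * travel_time_reduction
annual_savings = robots_freed * cost_per_robot_year
print(f"~{robots_freed:.0f} fewer robots, ~${annual_savings:,.0f}/year")
# -> ~100 fewer robots, ~$1,300,000/year
```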
I build Rovnou (https://rovnou.com), which does MAPF-based fleet coordination for non-Amazon warehouses. The DeepFleet paper validates many design choices we've made (event-driven, local context, graph-first), while also showing where foundation models may eventually surpass classical solvers.
Happy to discuss the architecture tradeoffs, MAPF vs. learned approaches, or practical deployment challenges.