5 points by filtr12 3 hours ago | 3 comments
  • nithisha2201 an hour ago
    Interesting, how do you handle the observability side during training? One thing I ran into with multi-agent RL is that reward signals alone don't tell you much about why an agent is failing. Curious if you've built any tooling around that.
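    One common answer to the observability question raised above is to log a structured per-step trajectory alongside the scalar reward, so a failing episode can be replayed and diagnosed after the fact. A minimal sketch (the field names and `log_trajectory` helper here are hypothetical, not the submitter's actual tooling):

    ```python
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class StepLog:
        """One step of an agent's episode, kept alongside the reward."""
        step: int
        action: str                 # e.g. "click", "type", "navigate"
        observation_summary: str    # compact description of what the agent saw
        reward: float

    def log_trajectory(steps, path):
        """Write the episode as JSONL so failures can be inspected offline."""
        with open(path, "w") as f:
            for s in steps:
                f.write(json.dumps(asdict(s)) + "\n")

    # Example: a short episode where the reward alone (all zeros, then 1.0)
    # says nothing about *why* earlier steps were unproductive.
    episode = [
        StepLog(0, "navigate", "landing page", 0.0),
        StepLog(1, "click", "wrong menu item", 0.0),
        StepLog(2, "click", "correct form", 1.0),
    ]
    log_trajectory(episode, "episode.jsonl")
    ```

    Reading the JSONL back groups failures by action or observation pattern, which is exactly the signal a bare return curve hides.
    
    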
  • georaa3 hours ago
    Browser agents are the use case where RL makes the most sense - the reward signal is obvious (did the task get done or not) and the action space is bounded. Curious how you handle the credit assignment problem across multi-step navigation though.
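    The standard starting point for the credit assignment question above is discounted returns: a sparse task-completion reward at the final step is propagated backward so earlier navigation steps receive geometrically decaying credit. A minimal sketch (simple Monte Carlo returns, not necessarily what the submitter uses):

    ```python
    def discounted_returns(rewards, gamma=0.99):
        """Compute G_t = r_t + gamma * G_{t+1} over one episode,
        so a terminal reward assigns credit to every earlier step."""
        returns = []
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    # A 5-step browser episode: only the last step (task done) is rewarded,
    # but every earlier step ends up with a nonzero, discounted return.
    rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
    returns = discounted_returns(rewards)
    # returns[0] == 0.99**4, returns[-1] == 1.0
    ```

    More refined schemes (e.g. advantage estimates like GAE) reduce the variance of this signal, but the backward recursion is the core mechanism.
    
    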
  • Remi_Etien an hour ago
    [dead]