Interesting. How do you handle the observability side during training? One thing I ran into with multi-agent RL is that reward signals alone don't tell you much about why an agent is failing. Curious if you've built any tooling around that.
Browser agents are the use case where RL makes the most sense: the reward signal is obvious (did the task get done or not) and the action space is bounded. Curious how you handle the credit assignment problem across multi-step navigation, though.
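A minimal sketch of the credit-assignment problem being raised here: with a sparse end-of-episode reward (task succeeded or not), discounted returns are one baseline way to spread that single signal back across every navigation step. The function and episode below are illustrative, not taken from either poster's setup.

```python
# Hypothetical sketch: a 4-step browser navigation episode where only
# the final action (submitting the form) carries reward. Discounted
# returns assign exponentially decaying credit to earlier actions.

def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t at each step from per-step rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# click, type, scroll, submit -- sparse terminal reward of 1.0
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards))
```

This is the crudest answer to the question; the earlier an action sits in the trajectory, the weaker its credit, which is exactly why long-horizon navigation tasks usually need something sharper (value baselines, per-step shaping, or process rewards).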