Treating data, schema, and ETL as versioned first-class assets makes a lot of sense, and the AI angle only works because rollback is built in.
Curious how you handle branching/merging at scale and how this compares to Iceberg/Delta time travel. If you can nail trust, observability, and cost predictability, this could be a meaningful primitive for data engineers.
> Curious how you handle branching/merging at scale and how this compares to Iceberg/Delta time travel.
At a high level, we build on Iceberg's primitives for data versioning, which should give customers confidence in our reliability. On top of that, we separately version schema and ETL, and then tie all three together (versions of data, schema, and ETL) so that rollbacks are simple and smooth. Did I mention that we support cascading rollbacks :D
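To make that concrete, here's a rough sketch of the idea (simplified and illustrative only; the names and structure here are made up for this comment, not our actual API):

```python
from dataclasses import dataclass

# Illustrative only -- a "commit" pins one consistent state of the lake:
# a data snapshot, a schema version, and the ETL code that produced it.

@dataclass(frozen=True)
class Commit:
    data_snapshot_id: str   # e.g. an Iceberg table snapshot
    schema_version: str     # versioned schema definition
    etl_version: str        # git SHA (or similar) of the pipeline code

class Lake:
    def __init__(self) -> None:
        self.history: list[Commit] = []

    def commit(self, data_snapshot_id: str, schema_version: str, etl_version: str) -> Commit:
        c = Commit(data_snapshot_id, schema_version, etl_version)
        self.history.append(c)
        return c

    def rollback(self, to: Commit) -> None:
        # Cascading rollback: because the three versions are recorded as one
        # commit, reverting the data snapshot also reverts the schema and the
        # ETL code that produced it -- you never end up with, say, old data
        # paired with a new schema.
        self.history = self.history[: self.history.index(to) + 1]

lake = Lake()
good = lake.commit("snap-101", "schema-v7", "etl-3f2a9c")
lake.commit("snap-102", "schema-v8", "etl-91bd04")   # bad deploy
lake.rollback(to=good)                               # all three revert together
```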
On branching: yes, we support branching your whole data lake (again built on the strong primitives discussed above). A typical workflow looks like: (1) create a new branch for a new feature or a major schema refactor, (2) make changes and test them out, (3) once confident, promote the changes to prod.
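In pseudocode, that flow looks roughly like the sketch below (again, `LakeClient` and its methods are hypothetical names for this comment, not our real SDK):

```python
# Illustrative sketch of the branch -> test -> promote flow described above.

class LakeClient:
    def __init__(self) -> None:
        self.branches: dict[str, list[str]] = {"main": []}   # branch -> commits

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A new branch starts as a cheap pointer to the parent's history.
        self.branches[name] = list(self.branches[from_branch])

    def commit(self, branch: str, change: str) -> None:
        self.branches[branch].append(change)

    def promote(self, branch: str, into: str = "main") -> None:
        # Promotion fast-forwards prod to the tested branch state.
        self.branches[into] = list(self.branches[branch])

lake = LakeClient()

# (1) branch off prod for the schema refactor
lake.create_branch("refactor-orders-schema")

# (2) make and test changes in isolation -- prod ("main") is untouched
lake.commit("refactor-orders-schema", "rename orders.amount -> orders.total")
lake.commit("refactor-orders-schema", "backfill orders.total")

# (3) once the tests pass, promote to prod
lake.promote("refactor-orders-schema")
```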
> If you can nail trust, observability, and cost predictability, this could be a meaningful primitive for data engineers.
Yes, this is exactly how we're thinking about it too, and we're actively working on features around trust, observability, and cost predictability.