If I'm guessing right, MotherDuck will likely be acquired by GCP, since most of the founding team is ex-BigQuery. Snowflake purchased Modin, and Polars is still too immature to be acquisition-ready. So what does that leave us with? There's also EDB, competing in the enterprise Postgres space.
Folks I know in the industry are not very happy with Databricks. Databricks themselves were hinting that they would potentially be acquired by Azure as Azure tries to compete in the data warehouse space. But then everyone became an AI company, which left Databricks in an awkward spot. Their bizdev team is not the best from my limited interactions with them (lots of Starbucks drinkers and "let me get back to you after a 3-month PTO"), so they don't know who should lead them to an AI pivot, or how. With cash to burn from overinvestment and the Snowflake/Databricks confs coming up fast, they needed a big announcement, and this is that big announcement.
Should have sobered up before writing this though. But who cares.
Databricks and Microsoft (through Fabric) are trying to build a complete data platform, i.e. ELT + data lake + BI.
My bet with Definite (https://www.definite.app/) has been this is too hairy for a large company to do well and we can do it better.
BDev can be good or bad. Bad ones tend not to follow up, and Starbucks here stands in for poor decision-making skills (reinforced by going on PTO for three months and not following up on commitments).
Yeah, big companies gobbling up everything does not lead to a healthy ecosystem. Congrats to the founders on the acquisition, but everyone else loses with moves like this.
I'm still sour after their Redash purchase that instantly "killed" the open source version. The Tabular acquisition was also a bit controversial, since one of the founders is the PMC Chair for Iceberg, which "competes" directly with Databricks' own Delta Lake. The mere presence of these giants (mostly Databricks and Snowflake) makes the whole data ecosystem (both closed and open source) really hostile.
In vino veritas, and all that; we appreciate your honesty!
I really hope they can maintain this dedication after acquisition, but Databricks will probably push them into enterprise and it will lose the spark. I wish Cloudflare bought them instead.
Most OLAP work starts when the data lands in Kafka logs or on a disk of some sort.
Then you schedule a task, or keep a task polling constantly, which is prone to small failures and delays, or to big failures when the schema changes.
The "data pipeline" team exists because the data doesn't move by itself from where it is first stored to where it is ready for deep analysis.
If you can directly push 1-row updates transactionally into a system and feed off the backend to write a more OLAP-friendly structure, then you can hook up something like a car rental service's operational logs to a system that can compute more complex things, like forecasting availability or applying discounts to give a customer a cheap upgrade.
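A toy sketch of that pattern, using sqlite3 from the standard library just to keep it self-contained (the table names and the availability rollup are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rental_events (car_id INT, city TEXT, delta INT)")

def record_event(car_id: int, city: str, delta: int) -> None:
    # The operational side: transactional 1-row writes.
    with con:
        con.execute(
            "INSERT INTO rental_events VALUES (?, ?, ?)",
            (car_id, city, delta),
        )

def refresh_availability() -> dict:
    # The polling job: fold the event log into an OLAP-friendly rollup.
    rows = con.execute(
        "SELECT city, SUM(delta) FROM rental_events GROUP BY city"
    ).fetchall()
    return dict(rows)

record_event(1, "berlin", -1)   # car 1 rented out
record_event(2, "berlin", +1)   # car 2 returned
print(refresh_availability())   # {'berlin': 0}
```

The pain the pipeline team absorbs is everything between `record_event` and `refresh_availability` when they live in different systems with different schemas.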
Neon looks a lot better than Yugabyte in tech (which also speaks the Postgres protocol) and a lot nicer in protocol compatibility than something like FoundationDB.
AlloyDB from Google feels somewhat similar, and Spanner has a Postgres interface too.
The postgres API is a great abstraction common point, even if the actual details of the implementations vary a lot.
I've been bullish on neon for a while -- the idea hits exactly the right spot, IMO, and their execution looks good in my limited experience.
But I mean that from a technical perspective. I never have any real idea about the business -- do they have an edge that makes people want to start paying them money and keep paying them money? Heck if I know.
I guess that's going to be Databricks' problem now (maybe).
It seems like execution >>> idea in this case
It opens up some interesting ideas/concepts when creating an isolated DB is just as easy as creating a new db table.
But as I mentioned, I mean from a tech standpoint... If you're interested, they've posted various things about how the tech works.
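For a feel of what branch-per-anything looks like in practice, here's a hedged sketch of creating a branch through Neon's HTTP API; the v2 endpoint shape and request body are from memory of their docs and may have drifted, and the project ID and token are placeholders:

```python
import requests

API = "https://console.neon.tech/api/v2"
TOKEN = "neon_api_key_here"     # placeholder API key
PROJECT = "my-project-123"      # placeholder project ID

# Create a copy-on-write branch of the project's default branch.
resp = requests.post(
    f"{API}/projects/{PROJECT}/branches",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "branch": {"name": "test-run-42"},
        # Ask for a read-write compute endpoint on the new branch.
        "endpoints": [{"type": "read_write"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["branch"]["id"])
```

One HTTP call per isolated database is what makes "DB per test run" or "DB per preview deploy" plausible.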
> It seems like execution >>> idea in this case
I don't know what >>> means here, so possibly I completely agree, or perhaps completely disagree.
I have an application deployed on Railway with a Postgres database, and the user's latency is consistently 150ms. The same application deployed on these serverless/edge providers is anywhere between 300-400ms, with random spikes to 800ms. The same application, same data, and same query.
Edge and serverless have to be the biggest scam in the cloud industry right now.
They aren't faster, and they aren't cheaper. You could argue they're easier to scale, but that's not the case anymore since everyone provides autoscaling now.
And it begs comparisons to comments about Dropbox/rsync, etc...
But, I personally think the Neon concept of branching databases with CoW storage is quite interesting. That, combined with cost-management with autoscaling does seem like at least a serviceable moat.
DigitalOcean, Railway, Render, and so on all offer the exact same feature except it's just pure Postgres and you can deploy them in the same data center as your application.
https://neon.tech/blog/how-to-minimise-the-impact-of-databas...
I'm a solo dev who has been installing and running my own database server with backups for decades and I have never had a problem with it. It's so simple, and I have no idea why people are so allergic to managing their own server. 99% of apps can run very snappily on a single server, and the simplicity is a breath of fresh air.
I share experiences similar to yours and others' in this thread, and to me all those operational concerns grow into unnecessary noise that distracts from the real problems we are paid to solve.
Neon's multi-region support isn't directly comparable to a single Postgres database in a single data center. You can set up Neon in a single data center, too, and I would expect the same performance in that case.
Meanwhile, if you tried to scale your single-Postgres to a multi-region setup, you'd expect higher latencies relative to the location of your data.
It'd be a lot of work to run an apples to apples test with a Google Cloud Postgres db vs. Supabase and see what the difference is.
Just because you don't derive value out of something doesn't mean it is a scam.
Did Delta Lake ever catch on? Where are they going now?
Enterprise view: delegate AI environment to Databricks unless you’re a real player. Market is too chaotic, so rely on them to keep your innovation pipeline fed. Focus on building your own core data and AI within their environment. Nobody got fired for choosing Databricks.
Oh well. Databricks notebooks were hella cool back when companies were willing to spend lavishly on having engineers write cloud hosted Scala in the first place, and at premium prices to boot.
I'm joking, but only a bit. Iceberg is open source (Apache), but a lot of the core team and the creator worked at Tabular and Databricks bought them for $1B.
0 - https://www.definite.app/blog/databricks-tabular-acquisition
Personally, I hated Databricks; it caused endless pain. Our org has less than 10TB of data, so it's overkill. Good ol' Postgres or SQL Server does just fine on tables of a few hundred GB, and BigQuery chomps through 1TB+ without breaking a sweat.
Everything in Databricks - everything - is clunky and slow. Booting up clusters can take 15 minutes, whereas something like BigQuery is essentially on-demand and instant. Data ETL'd into Databricks usually differs slightly from its original source in subtle but annoying ways. Your IDE (which looks like a Jupyter notebook, but is not) absolutely sucks (limited/unfamiliar keyboard shortcuts, flaky, can only be edited in the browser), and you're out of luck if you want to use your favorite IDE, vim, etc.
Almost every Databricks feature makes huge concessions on the functionality you'd get if you just used that feature outside of Databricks. For example, Databricks has its own git-like functionality (it covers the 5% of git that gets used most, with no way to do the less common git operations).
My personal take is that Databricks is fine for users who'd otherwise use their laptop's compute/memory - it gets them an environment where they can access much more, at about 10x the cost of what you'd pay for the underlying infra if you just set it up yourself. Ironically, all the Databricks-specific cruft (config files, click-ops) required to get going will probably be difficult for that kind of user anyway, which negates the value.
For more advanced users (i.e. those who know how to start an EC2 instance or anything more advanced), Databricks will slow you down and be endlessly frustrating. It will basically 2-10x the time it takes to do anything, and sap the joy out of it. I almost quit my job of 12 years because the org moved to Databricks. I got permission to use better, faster, cheaper, less clunky, open-source tooling, so I stayed.
Note that Databricks SQL Serverless these days can be provisioned in a few seconds.
That's the point. Our org was told Databricks would solve problems we just didn't have. Serverful has some wonderful advantages: simplicity, (ironically) cost - cheaper than something running just 3-4 hours a day but at 10x the price - familiarity, reliability. Serverless also has advantages, but only if it runs smoothly, doesn't take an eternity to boot, isn't prohibitively expensive, and has little friction before using it - Databricks meets 0/4 of those criteria, with the additional downside of restrictive SQL due to the Spark backend, adding unnecessary refactoring/complexity to queries.
> your setup is not really practical to have a lot of people collaborating
Hard disagree. Our methods are simple and time-tested. We use git to share code (a 100x improvement on Databricks' version of git). We share data in a few ways, most commonly by creating a table in a database or in S3. It doesn't have to be a whole lot more complicated.
But you are doing a disingenuous comparison here because one can keep a "serverful" cluster up without shutting it down, and in that case, you'd never need to wait for anything to boot up. If you shut down your EC2 instances, it will also take time to boot up. Alternatively, you can use the (relatively new) serverless offering from them that gets you compute resources in seconds.
We had 8 data engineers onboarding the org to Databricks, and it took 2 solid years before they got around to serverless (and only because users complained about the user-unfriendliness of 'nodes', and managers about cost). But then there were problems. A common pattern in my grep of Slack convos is "I'm having this esoteric error where X doesn't work on serverless Databricks, can you help"... a bunch of back and forth (sometimes over days) and screenshots, followed by "oh, unfortunately, serverless doesn't support X".
Another interesting note: someone compared serverless Databricks to BigQuery, and BigQuery was 3x faster without the Databricks-specific cruft (all BigQuery needs is an authenticated user and a SQL query).
Databricks isn't useless. It's just a Swiss Army knife that doesn't do anything well, except sales, and it may improve the workflows of the least advanced data analysts/scientists at the expense of everyone else.
I'd have rather stuck with Spark just because I prefer Scala or Python to SQL (and that comes with e.g. being far easier to unit test), but life happened and that ecosystem was getting disrupted anyway.
A few just off the top of my head:
* You can't .persist() DataFrames in serverless. Some of my work involves long pipelines that wind up with relatively small DFs at the end, but need to do several things with that DF. Nowhere near as easy as just caching it (see the workaround sketch after this list).
* Handling object storage mounted to Unity Catalog can be a nightmare. If you want to support multiple types of Databricks platforms (AWS, Azure, Google, etc.), then you have to deal with the fact that you can't mount one type's object storage from another. If you're on Azure Databricks, you can't access S3 via Unity Catalog.
* There's no API to get metrics like how much memory or CPU was consumed for a given job. If you want to handle monitoring and alerting on it yourself, you're out of luck.
* For some types of serverless compute, startup times from cold can be 1 minute or more.
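For the .persist() gap, a common workaround is to materialize the small result once and re-read it. A sketch, assuming a Databricks-provided Spark session; the `sales.orders` source and the temp table name are made up:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()  # provided for you on Databricks

# Long pipeline that ends in a small DataFrame.
small_df = (
    spark.table("sales.orders")   # hypothetical source table
    .groupBy("region")
    .count()
)

# No .persist() on serverless, so write the small result out once...
small_df.write.mode("overwrite").saveAsTable("tmp_region_counts")

# ...and fan out from the materialized copy instead of recomputing the pipeline.
cached = spark.table("tmp_region_counts")
cached.filter(col("count") > 1000).show()
cached.orderBy("count").show()
```

It works, but you're now managing table lifecycle and cleanup for something a one-line cache call used to handle.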
They're getting better, but Databricks is an endless progression of unpleasant surprises and being told "oh no you can't do it that way", especially compared to Snowflake, whose business Databricks has been working to chew away at for a while. Their Variant type is a great example. It's so much more limited than Snowflake's that I'm still learning new and arbitrary ways in which it's incompatible with Snowflake's implementation.
I guess different people just have different experiences.
Serverless is meant to obviate some of that. But it is less compelling when the vendor tries to gobble up that margin for themselves.
This is me being less jaded. Support those little wins!
Because of this separation, the compute (e.g. SQL parsing, execution) can be scaled independently, and the storage can do the same - for example, by using AWS S3.
So if your SQL query is CPU-heavy, Neon can just add more "compute" nodes while the "storage" cluster remains the same.
To me, this is similar to the usual microservice setup where you have an API service and a DB; the difference is that Neon purposely runs the DB itself on top of that structure.
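A toy model of the split (this is not Neon's actual design - their real storage tier involves pageservers and safekeepers - just an illustration of why compute scales independently):

```python
class SharedStorage:
    """Stands in for the storage tier (e.g. pages on S3)."""
    def __init__(self):
        self.pages = {}  # page_id -> bytes

    def read(self, page_id):
        return self.pages.get(page_id, b"")

    def write(self, page_id, data):
        self.pages[page_id] = data


class ComputeNode:
    """Stateless query executor; any number can share one storage tier."""
    def __init__(self, storage: SharedStorage):
        self.storage = storage

    def run_query(self, page_id):
        # Parsing/execution happens here; the data lives elsewhere.
        return len(self.storage.read(page_id))


storage = SharedStorage()
storage.write("t1/p0", b"row data")

# CPU-heavy workload? Add compute nodes; storage stays as-is.
nodes = [ComputeNode(storage) for _ in range(4)]
print([n.run_query("t1/p0") for n in nodes])  # [8, 8, 8, 8]
```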
You can deploy serverless technologies in a self hosted setup and not get "overcharged". Is a system thread bullshit marketing over a system process?
https://blog.bit.io/whats-next-for-bit-io-joining-databricks...
https://www.databricks.com/blog/welcoming-bit-io-databricks-...
Or it's just a business decision to corner the market, as someone else said
I went to Archive.org and figured out that in 2023, they announced they were shutting down on May 30th; all databases shut down on June 30th, were available only for download after that, and were deleted on July 30th.
I hate that this is what I've become, I want to try some of the cool features "postgres++" providers offer but I actively avoid most features fearing the potential future migration. I got burned using the Data API on Aurora Serverless and then leaving them and having to rewrite a bunch of code.
Given how lax antitrust enforcement is, probably this
Either way, there are plenty of other serverless Postgres options out there, Supabase being one of the most popular.
You set up a database, you connect to it, they take care of the rest. It even scales to $0 if you don't use it.
Is that not serverless Postgres?
Supabase gives you a server that runs classic Postgres in a process. Scaling in this scenario means you increase your server's capacity, with a potential downtime while the upgrade is happening.
You are confusing _managed_ Postgres for _serverless_.
Others in the serverless Postgres space:
- https://www.orioledb.com/ (pg extension)
- https://www.thenile.dev/ (pg "distribution")
- https://www.yugabyte.com/ (not emphasizing serverless but their architecture would allow for it)
After a funding round the value extraction from customers is just over the horizon
I haven't studied the CLA situation in order to know if a rug pull is on the table but Tofu and Valkey have shown that where there's a will there's a way
Thankfully, you can continue to pay Databricks whatever they ask for the privilege of them hosting it for you
[1] https://aws.amazon.com/blogs/database/introducing-scaling-to...
For someone looking to join the company, I cannot imagine IPO to be a motivation anymore.
This is better than earlier-stage startups: while there you get far better multiples, it is also quite possible that you are let go somewhere in the cycle without the money to exercise your options for tax reasons, and there is a short exercise window on exit.
For this reason, companies these days offer 5-10 year post-departure exercise windows as a more favorable deal.
——
For founders, it gives them a shorter window to an exit than going it alone, and in a revenue-light, tech-heavy startup like Neon (compared to Databricks), the valuation risk is reduced, because the stock they get in the acquisition is priced on real revenue and growth, not on early-stage product traction like Neon's today.
They also get some cash component, which is usually enough for the core things most founders look at: buying a house in the few-million range, closing out mortgages, or investing in a few early-stage projects directly or through funds.
Many companies raise money only to give liquidity to founders/employees and some early investors, even if they don't need the money for operations at all.
While Databricks is large, there are much bigger companies that would have IPO'd at smaller sizes in the past but are delaying today (and may never do it). Stripe and SpaceX are the biggest examples: both have healthy positive cash flows but don't see the value in going public. Buying back shares and options is the only route to keeping early-stage employees happy if you don't have IPO plans.
Do they still have a lot of $$$?
Thankfully, I just need "Postgres", I wasn't depending on any other features so I can migrate easily if things start going south.
It reads as though folks are looking for exits but IPO isn't an option.
Think we're approaching a reckoning both for lots of companies that raised circa 2021 at valuations that are no longer plausible, and for AI startups.
Oh, and ones in the first group that tried to rebrand as the second…
What’s with all these Postgres hosting services being worth so much now?
Someone at AWS probably thought about this (easy-to-provision serverless Postgres) and just didn't build it.
I’m still looking for something that can generate types and spit it out in a solid sdk.
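As a sketch of what I mean (assuming a Postgres database and a deliberately trivial type mapping; the DSN and table name are placeholders), something that walks information_schema and emits TypeScript types:

```python
import psycopg2

# Tiny, incomplete Postgres -> TypeScript type map, for illustration only.
PG_TO_TS = {
    "integer": "number", "bigint": "number", "numeric": "number",
    "text": "string", "character varying": "string",
    "boolean": "boolean", "timestamp with time zone": "string",
}

def emit_types(dsn: str, table: str) -> str:
    """Read a table's columns and build a matching TypeScript interface."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """SELECT column_name, data_type, is_nullable
               FROM information_schema.columns
               WHERE table_name = %s ORDER BY ordinal_position""",
            (table,),
        )
        fields = [
            f"  {name}{'?' if nullable == 'YES' else ''}: "
            f"{PG_TO_TS.get(dtype, 'unknown')};"
            for name, dtype, nullable in cur.fetchall()
        ]
    return "interface %s {\n%s\n}" % (table.title(), "\n".join(fields))

print(emit_types("postgresql://localhost/mydb", "users"))  # placeholders
```

The hard 90% that nobody ships well is everything past this toy: views, enums, joins, nullability through queries, and keeping it all in sync.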
It's amazing this isn't a solved problem. A long, long time ago, I was a part of a team trying to sort this out. I'm tempted to hit up my old CEO and ask him what he thinks.
The company is long gone…
If anything we tried to do way too much with a fraction of the funding.
In a hypothetical almost movie like situation I wouldn’t hesitate to rejoin my old colleagues.
The issue then, as it is today, is that applications need backends. But building backends is boring, tedious, and difficult.
Maybe a NoSQL DB that "understands" the Postgres API?
These high-value startups timed it well to capture vibe coding (previously known as building an MVP), front-end culture, and the sheer volume of internet use and developers.
You have to understand a separate set of concerns: spin something up on EC2, hook it into a DB, configure HTTPS, figure out why it went down, etc.
You’re right though, once I build a complex front end I want someone else to do the backend.
Their choice of Deno for edge functions is... Well, unique.
For my current project I have to do a lot of quirky logic, and I kept hitting a brick wall with Supabase.
I also didn't enjoy the self hosting journey. Not exactly easy.
For the other stuff, what do you find quirky?
Supabase only supports Deno. The quirkiness is my own server side logic. Tbf, I've tried to build this project at least 4 times and I might need to take a step back.
AWS is working on this as well: https://aws.amazon.com/blogs/database/introducing-amazon-aur...
I believe there are several of these already, like CockroachDB.
1. An acquihire (if you're a Neon customer, this would probably be a bad outcome for you).
2. A growth play. Neon will be positioned as an 'application layer' product offered cheap to bring SaaS startups into the ecosystem. As those startups grow and need more services, sell them everything else.
Say right now I have an e-commerce site with 20K MAU. All metrics are going to Amplitude, and we can use that to see DAU, retention, and purchase volume. At what point in my startup's lifecycle do we need to enlist these services?
I recently worked on some data pipelines with Databricks-style notebooks, a la Azure Fabric. I'm currently using ~30% of our capacity and starting to get pushback to run things less frequently to reduce the load.
I'm not convinced I actually need Fabric here, but the value for me has been that it's the first time the company has been able to provision a platform that can handle the data at all. I have a small portion of it feeding into a database as well, which has drawn constant complaints about volume.
At this point I can't tell if we just have unrealistic expectations about the costs of having this data that everyone wants, or if our data engineers are just completely out of touch with the current state of the industry, so Fabric is just the cost we have to pay to keep up.
I'm really not looking forward to a migration.
If you need to serve your data across a network to many clients, managing that with SQLite is much trickier.
There are interesting use cases for DB-per-user which can be server or client side, or litestream's continuous backup/sync that can extend it beyond this use case a bit too.
You _can_ use SQLite as your service's sole database, if you vertically scale it up and the load isn't too much. It'll handle a reasonable amount of traffic. Once you hit that ceiling though, you'll have to rethink your architecture, and undergo some kind of migration.
The common argument for SQLite is deferring complexity of hosting until you've actually reached the type of load you have to use a more complex stack for.
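A minimal sketch of that single-node setup, standard library only; the pragmas are the usual knobs people reach for when SQLite backs a service:

```python
import sqlite3

def open_db(path: str = "app.db") -> sqlite3.Connection:
    con = sqlite3.connect(path, timeout=5.0)
    # WAL mode lets readers proceed while one writer works.
    con.execute("PRAGMA journal_mode=WAL")
    # Wait briefly on lock contention instead of failing fast.
    con.execute("PRAGMA busy_timeout=5000")
    return con

con = open_db()
con.execute(
    "CREATE TABLE IF NOT EXISTS visits (ts TEXT DEFAULT CURRENT_TIMESTAMP)"
)
with con:  # one writer at a time; scale vertically until that hurts
    con.execute("INSERT INTO visits DEFAULT VALUES")
print(con.execute("SELECT COUNT(*) FROM visits").fetchone()[0])
```

The migration cost kicks in exactly when one box and one writer stop being enough, which is the ceiling the comment above describes.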
The truth of the 2010s up until now is that every startup was a massive sales con job. The wealth of this industry is not truly built on incredible tech, but on the audacity of salesmanship. It's a billion-dollar con job. That's one of the reasons I take every ridiculous startup that launches quite seriously, because you have no idea just how audacious their sales people are. They can sell anything.
Your question is very fundamental, and the answer is just as raw and fundamental too. I would love it if some of these sales people actually reform and write tell-alls about how they conned so many large companies in their years of working. This content has got to be out there somewhere.
They can't build it themselves, and it's highly dubious that they'd be able to hire and supervise someone to build it. Databricks may be selling "nothing special", but it's needed, and the buyers can't build it themselves.