Scaling to users requires Synapse Pro (element.io)

15 points by gmemstr 6 months ago | 5 comments

like_any_other 6 months ago
> However, it should be obvious that if the traffic levels spiked higher by 3x to ~120 events/s - then 75% of the worker’s time would be spent simply listening to traffic
For a 4GHz CPU core, with 4 instructions-per-clock, that works out to 100 million instructions per 'event'. That is efficiency so farcically low I can only attribute it to self-sabotage... so they can sell the Pro version.
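The arithmetic behind that figure can be checked with a quick sketch. The inputs are the numbers quoted in the comment above (4 GHz, 4 instructions per clock, 75% of one core busy at ~120 events/s), not measurements, and the function name is made up for illustration. The result is an upper bound, since real workloads rarely sustain peak instructions-per-clock.

```python
# Back-of-the-envelope check of the "100 million instructions per event" figure.
# All inputs are assumptions taken from the comment, not measured values.
CLOCK_HZ = 4_000_000_000        # 4 GHz core
INSTRUCTIONS_PER_CLOCK = 4      # assumed peak IPC
BUSY_FRACTION = 0.75            # share of the worker's time spent on traffic
EVENTS_PER_SECOND = 120         # the ~3x traffic level from the quoted post


def instructions_per_event(clock_hz, ipc, busy_fraction, events_per_second):
    """Upper bound on instructions spent per event on a single core."""
    peak_instructions_per_second = clock_hz * ipc
    return peak_instructions_per_second * busy_fraction / events_per_second


result = instructions_per_event(
    CLOCK_HZ, INSTRUCTIONS_PER_CLOCK, BUSY_FRACTION, EVENTS_PER_SECOND
)
print(f"{result:,.0f} instructions per event")
```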
- airhangerf15 6 months ago
  Yeah, it's obviously a move away from open source. The FOSS tools are sure to turn to garbage. They killed off Dendrite (the re-implementation of the Matrix server in Go). People have made some clients, but I don't think there are any alternative implementations.
  This is sad. The Matrix protocol will now never be XMPP. I hope Element is happy, because they're never going to be Slack. They're now stuck in a position where they are mediocre garbage to all sides.
  Of course the post mentions government. Who else would buy an overpriced garbage product?
  - malwrar 6 months ago
    I read this post as more of an “oh shit, idk who is running these projects but if you are reading this and developing a big deployment it’s going to be expensive and broken and make us all look bad. Please just spend your org’s $$$ for our rust rewrite that will make your graphs pretty and bosses smile”. The target audience feels pretty narrow for this post.
    I’m also wary of such FOSS -> fee transitions, but I think this one is benign. I’m hoping these folks obtain this funding.
    Arathorn 6 months ago
    I wrote the OP, and this is very much the correct interpretation. I spelt out the problem more clearly here: https://mastodon.matrix.org/@element/113843808583376704
    The issue is that large system integrators offer to run huge Matrix deployments for governments, think they can do so by using the FOSS server and maintain it themselves, and have no incentive to route any $ to the upstream project at all. As a result, you end up with situations like https://www.heise.de/news/Probleme-mit-Open-Source-Videokonf... where the project fails, which makes everyone look bad.
    So the point of this (pretty brutal) post is to try to say: "Seriously, if you are trying to run millions of users on a deployment, work with us as the upstream project - out of the box, the FOSS project will not work for this use case".
- Arathorn 6 months ago
  Nope, it's not self-sabotage - the worker is sitting there consuming the events from redis, and then pulling in the dependent data from the DB and populating and maintaining its caches (and marshalling & unmarshalling the necessary data) so that it can actually do work on the events as needed.
  Obviously one could certainly optimise it more... which is why we did, by rewriting it in Rust, and adding in smarter pubsub, etc.
  However, the key thing is that for anything other than huge deployments, this isn't a bottleneck. But for huge deployments, it becomes one. Meanwhile, we're hoping to use $ from Synapse Pro to fund perf improvements for normal FOSS Synapse and Matrix - e.g. algorithmically improving state resolution to be more performant; finishing faster room joins; improving federation traffic routing etc... so that smaller deployments get faster anyway without needing the faster workers.
traspler 6 months ago
So there is the Synapse version that is required to get the big money contracts and the Synapse that is pretty inconsequential to the business except as a testing playground. That makes it difficult to see how "Element is fully committed to community Synapse" can be true in the long run. Does this not spread the development effort & focus even thinner across these projects? And would reduced memory & CPU requirements not benefit all deployment sizes and not just nation-scale ones? It feels very much like a pivot towards a closed stack.
- Arathorn 6 months ago
  It looks like I've done a bad job of explaining this, so I'll try to clarify:
  Making worker processes go fast has most benefit to enormous deployments, as they won't run out of headroom when running lots of single-core python workers.
  However, ALL deployment sizes benefit from algorithmic improvements to the protocol and its implementation - which are the cause of smaller servers being slower today.
  Specifically:
  * Merge conflict resolution (state resolution) scales worse than O(N) with the amount of state to be merged.
  * Incremental room joins (https://element-hq.github.io/synapse/latest/development/syna...) were never fully finished.
  * Servers burn lots of time trying to talk to dead servers: https://github.com/matrix-org/matrix-spec-proposals/pull/413...
  * All Matrix traffic currently runs full-mesh - there's no concept of "thin nodes" or delegating fan-out to a larger server.
  So, fixing these issues is all going into open source Synapse (and Matrix as a whole) - which should unrecognisably improve performance, whether servers are written in Python or Rust or Elixir or whatever. And the hope is that $ from Synapse Pro funds that work (assuming the gambit is successful).
  Meanwhile, all features, security work, perf optimisations (apart from scalability work), experimental MSCs etc will continue to land in FOSS Synapse for the foreseeable future.
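The full-mesh point in the list above can be illustrated with a toy count of outbound requests per event. This is a purely hypothetical model: per the comment, Matrix has no "thin node" delegation today, and the function names and the single-hub relay scheme below are assumptions made up for the sketch, not anything from the spec or Synapse.

```python
from typing import Tuple


def full_mesh_sends(num_servers: int) -> int:
    """Requests the origin makes for one event when it contacts
    every other server in the room directly (full mesh)."""
    return max(num_servers - 1, 0)


def delegated_sends(num_servers: int) -> Tuple[int, int]:
    """Hypothetical delegation: a thin origin hands the event to one
    larger hub, which relays it to the remaining servers.
    Returns (requests made by the origin, requests made by the hub)."""
    if num_servers <= 1:
        return (0, 0)
    return (1, num_servers - 2)


servers_in_room = 1000  # illustrative room size, not a real measurement
print("full mesh, origin sends:", full_mesh_sends(servers_in_room))
print("delegated, (origin, hub) sends:", delegated_sends(servers_in_room))
```

In this model the total number of requests is unchanged; what moves is who pays for them, which is why delegation would mostly help small servers taking part in busy rooms.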
  - traspler 6 months ago
    After re-reading and reading many other comments in various posts I think I have misunderstood it partly. So the workers part has been overhauled in Rust for Synapse Pro. Is this a complete re-implementation of Synapse so the big deployments with Synapse Pro will not run anything from the normal Synapse or is this only part of the whole stack and the Pro deployments will also run most of the normal Synapse?
  - jdenning 6 months ago
    But it seems like this move incentivizes you to not improve the open source server, at least such that the open source server is always inferior to the pro version.
    If someone makes a new, more performant, open-source server, and it touches your bottom line then you're strongly motivated to "embrace, extend, extinguish".
    The thing is, we've all heard this before, and it always ends up the same. I hope you prove me wrong, but I wouldn't bet on it.
    Arathorn 6 months ago
    > But it seems like this move incentivizes you to not improve the open source server, at least such that the open source server is always inferior to the pro version.
    The idea is that we absolutely improve FOSS synapse in all ways - other than supporting enormous deployments. For instance we continue to land perf improvements to FOSS synapse and make average sized servers as snappy as conceivably possible. And all features land in FOSS synapse, etc. If we don’t it would harm the public Matrix network and we obviously don’t want that.
    > If someone makes a new, more performant, open-source server, and it touches your bottom line then you're strongly motivated to "embrace, extend, extinguish".
    Rather than EEE, I’d expect us to simply compete with that server - adding more features, better perf, better commercial support, etc. For Matrix’s sake, I hope that we end up in that situation tbh.
    > The thing is, we've all heard this before, and it always ends up the same. I hope you prove me wrong, but I wouldn't bet on it.
    I think the difference is that typically folks doing this are being greedy to grow a profitable (or could-be-profitable) company as aggressively as possible. Whereas here the motive is simply to pay for our FOSS dev and get to breakeven and be able to sustainably grow Matrix for the benefit of the whole network. If in the end a bit of proprietary software is the necessary evil to get there, so be it.
    Of course this could change in future, eg if mgt changed, but that’s true of anything. But the intention is categorically not to EEE (and on the Matrix Foundation side, the governance and spec process is set up to stop Element from being able to EEE even if it wanted to).
jauntywundrkind 6 months ago
InfluxDB 3 just went alpha, and they similarly have very severe limitations on what their open-core product will do, which very strongly drive folks to upgrade to enterprise.
https://www.influxdata.com/blog/influxdb3-open-source-public... https://news.ycombinator.com/item?id=42684524 https://news.ycombinator.com/item?id=42703113
While I admit that I don't think losing open source developers is actually that big a harm to many projects (there's just not enough people out there to drive by big valuable amazing features), I feel like the open core approach shuts yourself off from most people who are looking for open source solutions. The core is not enough.
No one's going to be happy running a 500x slower python project knowing there's the real deal running elsewhere, with a hip new runtime they can't get.
I recognize that for some of these companies, this probably is a necessary move. They need revenue to do what they do and it's hard to get revenue in open source. But these are both interesting products that I was hopeful for that I can't imagine adopting anymore. That's fine, I don't demand being served by anyone, but it is really sad to see, and I wonder how many awesome projects that would have grown big atop these technologies will never be created, because of these shifts.
Matrix especially feels like a brutal loss, because we are so short of good communication systems. I regret not seeing DataFusion & Arrow being put to use & integrated with InfluxDB 3, but at least there are lots of time-series databases available. Matrix's whole ecosystem has been slowly slowly slowly building momentum & acceptance, but there's so much less diversity & offerings, & now Synapse Pro is needed if you want more than a simple instance.
- pauldix 6 months ago
  Our intention with InfluxDB Core is that it's useful to a large audience, just not the group of people seeking a historical TSDB. It's a collector, processor, and recent data TSDB. If you're familiar with the TICK stack from our 1.x line, it's like Telegraf (the data collector), Kapacitor (the processor and monitoring agent), and an InfluxDB that is better on the most recent data.
  The InfluxDB part of it is more narrowly scoped than previous versions, but the Telegraf and Kapacitor parts are much more feature rich than those previous products.
- Arathorn 6 months ago
  woah there - hang on a sec
  > No one's going to be happy running a 500x slower python project knowing there's the real deal running elsewhere, with a hip new runtime they can't get.
  What if the current python project got 500x faster in general? As optimisation work for Synapse is not being paywalled - it’s just the worker scalability, which is not a bottleneck for normal sized servers anyway.
  The reason Matrix servers are typically slow today is that state resolution and storage is algorithmically slow; federation is fullmesh and doesn’t support “thin server” approaches for participating in busy rooms, and joining big rooms still blocks on loads of state being synchronised before you can see other members & history.
  Fixing this (and more) is very much on the menu for FOSS Synapse - and won’t be helped by faster workers, given workers are just for scalability, not for core performance. Conversely, $ from Synapse Pro will hopefully fund that work, which otherwise has been stuck for years now thanks to lack of $.
  (Also: if you did decide Element had gone mad and don’t want anything to do with Synapse, you can try a different homeserver like one of the Conduit forks; don’t throw Matrix under the bus with Element :)
amstan 6 months ago
I wonder when they'll finally fix that 7 year old double notification bug.
- Arathorn 6 months ago
  not aware of a double notif bug - do you have a github issue for it?