OTP itself has so much in it. We've been working on compiling Elixir to run on iOS devices. Not only can we do that through the release process, but by using the ei library that ships with Erlang we can compile a node in C that interfaces with any other Erlang node over a typical distributed network, just as you would for Erlang, Elixir, Gleam, etc. Furthermore, Erlang ships an rpc library, so from C we can make function calls and interface with our Elixir application. Yes, the encoding/decoding has overhead and FFI would be faster, but we're still well within our latency budget, and we got this stood up in a few days without ever having heard of it before.
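For context, the BEAM side of that setup is tiny; here is a minimal sketch (node name, cookie, and module are all made up) of the Elixir node that a C node built with ei would connect to:

    # Start the Elixir node as a named, distributed node with a shared cookie:
    #   iex --name app@127.0.0.1 --cookie secret -S mix
    defmodule MyApp.Api do
      # A C node can invoke this via ei_rpc (erl_interface), the same way
      # another BEAM node would with:
      #   :rpc.call(:"app@127.0.0.1", MyApp.Api, :ping, [])
      def ping, do: :pong
    end

The C side only needs the node name and cookie to join the distribution mesh; everything after that is ordinary message passing and rpc.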
The larger point here is that Erlang has been solving many of the problems that modern tech stacks are struggling with, and it solved them, for both scale and implementation cost, decades ago. I know HN has a bit of a click-bait love affair with Erlang/Elixir, but it hasn't translated into adoption, and there are companies just burning money trying to do what you get out of the box for free with the Erlang stack.
I went in neutral on Node.js, having never really used it much.
The projects I worked on were backend data pipelines that did not even process that much data. And yet somehow, it was incredibly difficult to isolate the main bug. Along the way, I found out all sorts of things about Node.js, and when I compare it with Elixir/Erlang/OTP, I came to the conclusion that Node.js is unreliable by design.
Don't get me wrong. I've done a lot of Ruby work before, and I've messed with Python. Many current-generation language platforms are struggling with building reliable distributed systems, things that the BEAM VM and OTP platform had already figured out.
Personally I think much of it is due to async being predominant in Node and Python. Async seems much harder than actors or even threads when it comes to debugging performance issues. Sure, it feels easier to do async at first, but async leads to small bloat adding up, which makes it very difficult to debug and track down. It makes profiling harder, etc.
In BEAM, every actor has its own queue, so it's trivial to inspect and analyze performance bottlenecks. Async, by contrast, puts everything into one giant processing queue, and every function call in async gets extra overhead added. It all adds up.
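That per-process mailbox is directly inspectable from a live shell; a small sketch (assuming you already have a pid of interest):

    # Message-queue depth and current function for one process:
    Process.info(pid, [:message_queue_len, :current_function])

    # Or scan the whole node for the deepest mailboxes:
    Process.list()
    |> Enum.map(fn p -> {p, Process.info(p, :message_queue_len)} end)
    |> Enum.reject(fn {_p, info} -> info == nil end)
    |> Enum.sort_by(fn {_p, {:message_queue_len, n}} -> -n end)
    |> Enum.take(5)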
There's a counter-intuitive thing when trying to balance load across resources: applying resource limits helps the system run better overall.
One example: when scaling a web app, there comes a point when scaling up the database doesn't seem to help. So we're tempted to increase the connection pool because that looks like a bottleneck. But increasing the pool can make the overall system perform worse, because it is often slow, poorly performing queries that are clogging up the system.
Another example: one of the systems I worked on has over 250 node runtimes running on a single, large server. It used pm2 and did not apply cgroups to limit CPU resources. The whole system was a hog, and I temporarily fixed it by consolidating things to run on about 50 node runtimes.
When I moved them over to Kubernetes, I also applied CPU resource limits, each in its own pod. I set the limits based on what I measured when they were all running on PM2 ... but the same code running on Kubernetes used 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
There's probably some math that folks who know Operations Research could use to prove all this.
As someone who has advocated against Kubernetes CPU limits everywhere I've worked, I'm really struggling to see how they helped you here. The code used 10x less CPU with CPU limits, with no adverse effects? What were all those CPU cycles going before?
Parent probably didn't care about latency.
Even when I distribute offsets on polling, I would get these seesaws where everything fights over resources.
The normal situation is that defective requests get much larger latency, while the correct requests run much faster.
It's a problem in cases where the first set isn't actually defective. But it normally takes a reevaluation of the entire thing to solve those, and the non-limited situation isn't any good either.
How can you make performance claims while getting the details completely wrong?
Neither .NET's nor Rust's Tokio async implementation works this way. They use all available cores (unless overridden) and implement a work-stealing thread pool. .NET in addition uses hill-climbing and cooperative-blocking-detection mechanisms to quickly adapt to workloads and ensure optimal throughput. All that while spending 0.1x the CPU on computation compared to BEAM, and having a much lower memory footprint. You cannot compare Erlang/Elixir with top-of-the-line compiled languages.
On the other hand, I have yet to have to implement a liveness probe for an Elixir app, and I've had to do that with .NET because it can and does freeze. That game server also didn't make use of all the available cores as well as the Elixir app did. We also couldn't attach a REPL directly to the .NET app, though we certainly tried.
I would be curious to see if Rust works out better in production.
Sigh. I swear, the affliction of failing to understand the underlying concepts upon which a technology A or B is built is a plague upon our industry. Instead, everything clearly must fit into the concepts limited to whatever “mother tongue” language a particular developer has mastered.
Ironic, since any time you post about a programming language it's to inform that C# does it better.
Not just here; someone with your nick also whined that the creator of C# made a technically deficient decision when choosing Go over C# to implement TypeScript.
It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.
You have a blind spot when it comes to C#. You also probably already know it.
You know you could have just linked the reply instead? It states "C#, F# or Rust". But that wouldn't sound that nice, would it? I use and enjoy multiple programming languages and it helps me in day-to-day tasks greatly. It does not prevent me from seeing how .NET has flaws, but holistically it is way less bad than most other options on the market, including Erlang, Go, C or what have you.
> It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.
So appeal to authority trumps observable consequences, technical limitations and arguments made about lackluster technical vision at microsoft? Interesting. No, I think it is the kind of people who refuse to engage with the subject on their own merits that are a problem, relegating to the powers that be all the argumentation. Even in a team environment, sure it is easier to say "a team/person X makes a choice Y" but you could also, if the situation warrants it, expand on why you think this way, and if you can't maybe you shouldn't be making a statement?
So no, "TypeScript, including Anders Hejlsberg, choosing Go as the language to port TS compiler to" does not suddenly make pigs fly, if anything, but being seen as an endorsement from key C# figure is certainly a bad look.
Your argument is that you have a better grasp of "technical limitations" than Anders Hejlsberg?
You'll forgive the rest of us for not buying that; he has proven his chops, you haven't, especially as the argument (quite a thorough explanation of the context) from the typescript team is a lot more convincing than anything we've seen from you (a few nebulous phrases about technical superiority).
> but being seen as an endorsement from key C# figure is certainly a bad look.
Yeah, well, the team made their decision with no regard to optics. That lends more weight to their decision, not less.
Typescript is a huge success from Microsoft in terms of recapturing developers, without them knowing. MS is not a charity, look at how little love they give to F# compared to TS.
* My personal guess is that the age-old MS instinct came into play: be backwards compatible at all costs, port all the bugs, do not disturb anything.
* A second reason might be that the TS people might not want to learn .NET because of vibes. Do not underestimate vibes. Almost every day on HN I see Python programs posted where the creator would often have been better off learning some other programming language. Decisions are seldom made on a technical basis. We as humans decide emotionally, sometimes with rationalizations afterwards. And so, maybe Anders was rational in acknowledging the dev-social situation as it is.
Whatever the reason, this will not be without consequences. The team now has to invest in Go and depends on Google to take TS forward. And yes, this is also typical MS: one department can easily undo the other.
TLDR: the technical arguments were mostly nonsense, but the real arguments have likely more to do with age-old reflexes and dev-cultural issues.
Well that's great. I didn't mention Rust in that list because it does seem to perform well. Its async is also known to be much more difficult to program.
> and having much lower memory footprint. You cannot compare Erlang/Elixir with top of the line compiled languages.
And yet I do and have. Despite all the cool tech in C# and .NET, I've seen simple C# web apps struggle to even run on Raspberry Pis for IoT projects while Elixir ones run very well.
Also note Elixir is a compiled language and BEAM has JIT nowadays too.
I did hesitate to add C# to that list because it is an impressive language and can perform well. I also know the least about its async.
Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.
Even for Rust there was an HN post recently where they got a Rust service to run a fair bit faster than their initial Golang implementation, after months of extra work, that is. They mentioned that Golang's programming model made it much easier to write fairly performant networking code. Since Go doesn't use async, it seems reasonable to assume goroutines are easier to profile and track than async tasks, even if I lack knowledge of Go's implementation details on the matter. Now, I am assuming their Rust implementation used async, but I don't know for sure.
Let's see it perform faster than Python first :)
Also, if the target is supported, .NET is going to unconditionally perform faster than Elixir. This is trivially provable.
> Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.
Can you provide any reference to support this claim as far as actually good implementations go? Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.
That's not surprising, however - Erlang and Elixir as languages tend to leave their heavy users with big knowledge and understanding gaps, and their communities are rather dogmatic about BEAM being the best thing since sliced bread. Lack of critical thinking leads to such a sorry place.
Ah yes now to the No True Scotsman fallacy. Async only works well when it’s “properly implemented” which is only .NET.
Even some .NET folks prefer actors model for concurrent programming:
> Orleans is the most underrated technology out there. Not only does it power many Azure products and services, it is also the design basis for Microsoft Service Fabric actors, which also power many Azure products. Virtual actors are the perfect solution for today’s distributed systems.
> In my experience Orleans was able to handle insane write load (our storage/persistence provider went to a queue instead of direct, it was eventually consistent) so we were able to process millions of requests without breaking a sweat. Perhaps others would want more durability, we opted for this as the data was also in a time series database before Orleans saw it.
https://www.reddit.com/r/dotnet/comments/16kk2l1/comment/k0x...
https://learn.microsoft.com/en-us/dotnet/orleans/benefits
Ironically what got me into Elixir was learning about Orleans and how successful it was in scaling XBox services.
> Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.
Aside from personal experience and years of writing and deploying performance sensitive IoT apps?
Well quick googling shows quite a few posts detailing async issues:
> What tools and techniques might be suited for this kind of analysis? I took a quick glance at a flamegraph but it seems like I would need a relatively deep understanding of the async runtime internals since most of what I see looks like implementation details.
https://www.reddit.com/r/rust/comments/uph4tf/profiling_with...
> Reading a 1GB file in 100-byte chunks leads to at least 10,000,000 IOs through three async call layers. The problem becomes catastrophic since these functions are essentially language-level abstractions of callbacks, lacking optimizations that come with their async nature. However, we can manually implement optimizations to alleviate this issue.
https://www.ajanibilby.com/blog/async-js-performance-apr23/
> Asynchronous Rust seems to perform worst than multi-threaded Rust implementations.
https://dev.to/deepu105/concurrency-in-modern-programming-la...
> Under realistic conditions (see below) asynchronous web frameworks are slightly worse throughput (requests/second) and much worse latency variance.
https://calpaterson.com/async-python-is-not-faster.html
> I’m not going to say all async frameworks are definitely slower than threads. What I can say confidently is that asyncio isn’t faster, and it’s more efficient only for huge numbers of mostly idle connections. And only for that.
https://emptysqua.re/blog/why-should-async-get-all-the-love/
https://users.rust-lang.org/t/my-benchmark-done-elixir-is-fa...
https://blog.blackfire.io/the-challenges-of-async-python-obs...
Also picking asyncio from Python. Lol. You can't be serious, can you?
The only impression I get is that most Elixir/Erlang practitioners simply have a very ossified perception and deep biases that prevent them from evaluating implementation/design choices fairly and reaching balanced conclusions on where their capabilities lie. A very far cry from the link salad you posted, which does not answer my question, e.g. the issues with .NET and Rust async implementations performance-wise.
It's impossible to have a conversation with someone deeply committed to their bias and unwilling to accept that BEAM is not the shining paragon of concurrent and multi-threaded runtimes it once was.
I don't know about node but C# has async contexts you can use.
In other words, the runtime feature that Nodejs is the most proud of and markets to the world as its main advantage does not scale well in a reliable way.
The BEAM runtime has preemption and will degrade in performance much more gracefully. In most situations, because of preemption (and hot code reloading), you still have a chance to attach a REPL to the live runtime while under load. That allows someone to understand the live environment and maybe even hot patch the live code until the real fix can run through the continuous delivery system.
I'm not going to go into the bad JavaScript syntax bloopers that still haunt us and are only partially mitigated by TypeScript. That is documented in "Javascript: The Good Parts". Or how the "async" keyword colors function calls, forcing everything in a call chain to also be async, or forcing you to use the older callbacks. Most people I talk to who love TypeScript don't consider those as issues.
The _main_ problems are:
1. Async threads can easily get orphaned in Nodejs. This doesn't happen when using OTP on BEAM, because you typically start a gen_server (or a gen_*) under a supervisor. Even processes that are not supervised can be tracked. Because pids (identifiers to processes) are first-class primitives, you can always ask the scheduler, which will tell you _all_ of the running processes. If you were to attach a Nodejs REPL, you can't really tell what's running. This is because there is no encapsulation of the process, no way to track when something went async, no way to send control messages to those async processes.
2. Because async threads are easily orphaned, errors that get thrown easily get lost. The response I get from people who love TypeScript on Nodejs is that this is what the linter is for. That is, we're going to use an external tool to enforce that all errors get handled, rather than having the design of the language and the runtime handle the error. In the BEAM runtime, unhandled errors within a process crash the process, without crashing anything else; processes that are monitoring the crashed process get notified by the runtime that it has crashed. The engineer can then define the logic for handling that crash (retry? restart? throw an error?).
3. The gen_server behavior in OTP defines ways to send control messages. This allows more nuanced approaches to managing subsystems than just restarting when things crash.
I'm pretty much at the point where I would not really want to work on deploying Nodejs on the backend. I don't see how something like Deno would fix anything. Typescript is incapable of fixing this, because these are design flaws in the runtime itself.
Spawning a lightweight process in BEAM returns a first-class primitive called a pid. That pid is recorded by the scheduler, so even if it gets lost by the code, you can still find out if it has been taking up resources (when debugging problems live in production).
Supervisor behavior is written in a way such that any gen_server-behavior-complying processes will be linked. That means any crash of the spawned process will notify the supervisor. That's not something we are doing with Nodejs async: there is no mailbox to notify, just either awaiting completion, or making sure you add the error handling ... which is where people write linters to check.
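A minimal sketch of what that looks like in practice (module names made up): the supervisor owns the restart policy, and the pid stays discoverable from a live shell even if your own code loses track of it.

    defmodule MyApp.Worker do
      use GenServer
      def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)
      def init(arg), do: {:ok, arg}
    end

    # If the worker crashes, the supervisor is notified and restarts it.
    {:ok, sup} = Supervisor.start_link([MyApp.Worker], strategy: :one_for_one)

    Supervisor.which_children(sup)   # => [{MyApp.Worker, #PID<...>, :worker, [MyApp.Worker]}]
    Process.whereis(MyApp.Worker)    # the pid, even if nothing else holds a reference
    Process.list()                   # every process on the node, supervised or not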
Well, it's just a hack and some C libraries on top of a browser Javascript engine.
No big thought went into it, either before or after it got big.
Take GenServer, the workhorse of most BEAM systems. Everything it does is basically just calling various functions with simple parameters. So you can test it just by calling those functions and manually passing parameters to them, and asserting on the output. No need to set up complex testing systems that are capable of dealing with asynchronous code, no need to handle pauses and wait for code to finish running in your tests. It's something a lot of juniors tend to miss, but it's liberating once figured out.
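Concretely, something like this toy module is testable as plain functions:

    defmodule Counter do
      use GenServer
      def init(n), do: {:ok, n}
      def handle_call(:get, _from, n), do: {:reply, n, n}
      def handle_cast(:inc, n), do: {:noreply, n + 1}
    end

    # In a test: no process, no async, no waiting -- just functions and state.
    {:ok, state} = Counter.init(0)
    {:noreply, state} = Counter.handle_cast(:inc, state)
    {:reply, 1, _state} = Counter.handle_call(:get, self(), state)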
Do you or the community have a sense why that is?
> When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.
> These are the people I call Architecture Astronauts. It’s very hard to get them to write code or design programs, because they won’t stop thinking about Architecture. They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing. They tend to work for really big companies that can afford to have lots of unproductive people with really advanced degrees that don’t contribute to the bottom line.
E.g. the built in telemetry system is fantastic, but when you are first adopting the stack it still takes a day or two to read the docs and get events flowing into - say - DataDog, which is roughly the same amount of time as basically every other solution.
The benefit of Elixir here is that the telemetry stack is very standardized across Elixir projects and libraries, and there are fewer moving pieces - no extra microservices or docker containers to ship with everything else. But that benefit comes 2 years down the line when you need to change the telemetry system.
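For reference, the standardized piece is just :telemetry handlers; a minimal sketch (event name and handler are made up, and the DataDog-specific part is whichever reporter you choose):

    # Attach a handler to an event that a library (Phoenix, Ecto, ...) already
    # emits, or that your own code emits with :telemetry.execute/3.
    :telemetry.attach(
      "request-stop-handler",
      [:my_app, :request, :stop],
      fn _event, measurements, metadata, _config ->
        # forward to your metrics backend here
        IO.inspect({measurements.duration, metadata})
      end,
      nil
    )

    :telemetry.execute([:my_app, :request, :stop], %{duration: 123}, %{route: "/health"})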
The closest I've come across was trying to maintain an ejabberd cluster and add some custom extensions.
Between mnesia and the learning curve of the language itself, it was not fun.
There are also no popular syntax-alikes. There is no massive corporation pushing Erlang either directly or indirectly through success. Supposedly Erlang breeds success but it's referred to as a "secret" weapon because no one big is pushing it.
Erlang seems neat but it feels like you need to take a leap of faith and businesses are risk averse.
Isn't there this "small" company that has a chat app that is using erlang :P
But I guess it doesn't always work that way. FB chat was built on ejabberd and then migrated away.
Also, a lot of the power of Erlang is the OTP (Open Telecom Platform) even more than Erlang, itself. You have to internalize those architectural decisions (expect crashes--do fast restart) to get the full power of Erlang.
Elixir seems like it has been finding more traction by looking more like mainstream languages. In addition, languages on the BEAM (like Elixir) made the BEAM much better documented, understood and portable.
If we're talking pure modern-tech company - good luck bringing anything other than JS because "more developers == more growth" mentality.
So it either ends up being used where decision makers know/want-to-learn Erlang/Elixir, or when all other possibilities have been exhausted.
With Section 174 in play in the US, it tends to drive companies hiring specialists and attempting to use AI for the rest of it.
My own experience is that ... I don't really want to draw from the most plentiful and cheapest pool of workers. I've seen the kind of tech that produces. You basically have a small handful of software engineers carrying the rest.
Elixir itself is a kind of secret, unfair advantage for tech startups that use it.
This is a thing I really don't get. People are like "but what about the hiring pool". A competent software engineer will learn your stack. It's not that hard to switch languages. Except maybe going from Python to C++.
We seemed to do pretty well, although some of our code/setup wasn't very idiomatic (for example, I'm pretty sure we didn't use the Erlang release feature properly at all)
Admittedly, I didn't have a whole company core product riding on my upgrades.
For otp updates, we would shutdown beam in an orderly fashion, replace the files, and start again. (Potentially installing the new one before shutting down, I can't remember).
Post facebook, more of boring OS packages and slow rollouts than hotloading.
And here we see someone claiming that lightweight processes and message passing aren’t the secret sauce, missing that Erlang as Communicating Sequential Processes is indivisible from those qualities, and then repeatedly mentioning CSP as part of the secret sauce.
Examples:
> The application programmer writes sequential code, all concurrency is hidden away in the behaviour;
> Easier for new team members to get started: business logic is sequential, similar structure that they might have seen before elsewhere;
> Supervisors and the “let it crash” philosophy, appear to produce reliable systems. Joe uses the Ericsson AXD301 telephone switch example again (p. 191):
Behaviors are interesting and solve a commonly encountered problem in the 80’s that was still being solved in some cases in the 00’s, but it’s a means as much as an end in Erlang. It’s how they implemented those other qualities. But I don’t know if they had to, to make Erlang still mostly be Erlang.
CSP is what inspired the golang channels, via occam and some other languages. The whole synchronization on unbuffered channels is the most obvious differentiator, though there are others like the actor concept of pattern matching over a mailbox.
The whole CSP vs actor debate is quite interesting when you get down to it because they superficially look kind of similar but are radically different in implications.
I've always thought the actor model made more sense, but highly YMMV.
I think the term Actor Model has been so semantically diluted at this point that the phrase also understates what Erlang has as well.
Neither CSP nor AM require process isolation to work, which means they can work when they work but fail much much worse. They are necessary but insufficient.
Actors must always have a known address to be accessed, and you share them by sharing addresses. You also wouldn't pass an actor to an actor; you'd pass an address instead. CSP channels are first-class. You can create anonymous channels and even pass channels through other channels. This is similar to languages with lambdas and first-class functions vs other languages where every function has a name and functions cannot be passed to other functions.
Actors are naturally async-only and (for example) make no attempt to solve the two generals problem while CSP implementations generally try to enforce synchronization. CSP also enforces message order while actors don't guarantee that messages will be received in the order they were sent.
These are all more theoretical than actual though. CSP channels may be anonymous to the programmer, but they all get process IDs just like Actors would. Actors may seem async, but they can (and no doubt do in practice) make stronger guarantees about message order and synchronicity when on the same CPU. Likewise, CSP would give the illusion of synchronicity and ordering across CPUs where none actually exists (just like TCP).
But yes, many solutions are isomorphic because they are dealing with the same information on the same Turing machine. That doesn't mean it's stupid to bring it up, but it can mean that there's less upside to switching solutions than people think.
The Chesterton’s Fence here though is that you can implement the Actor Model without the BEAM’s process isolation, and the supervisor tree that goes with it. If you insist on doing so, which several languages have, then the finer distinctions between CSP and Actor pale in comparison to the BEAM.
The project is about bringing visual Flow Based Programming (FBP)[1] to Erlang. FBP seems made for Erlang, and I was surprised there wasn't already something like this, but there does not seem to be.
My go-to tool for FBP is Node-RED, hence the basic idea is to bolt a Node-RED frontend onto an Erlang backend and have every node be a process. Node-RED's frontend is great for modelling message passing between nodes, so there is a very simple one-to-one mapping to Erlang's processes and messages.
I've implemented some basics and started to create some unit tests as flows to slowly build up functionality. I would really like this to be 100% compatible with Node-RED's NodeJS backend. For more details, the github repo --> https://github.com/gorenje/erlang-red
Overall Erlang is amazingly well suited to this, and I'm astonished that no one else has done anything like this - or have they?
Hopefully I can get some useful functionality together without hitting my Erlang coding limits!
Any help is greatly appreciated :+1:
The short answer seems to be that they pivoted to Java for new projects, which marginalized Erlang. Then Joe and colleagues formed Bluetail in 1998. They were bought by Nortel. Nortel was a telecom giant forming about a third of the value of the Toronto Stock Exchange. In 2000 Nortel's stock reached $125 per share, but by 2002 the stock had gone down to less than $1. This was all part of the dot com crash, and Nortel was hit particularly hard because of the dot com bubble burst corresponding with a big downturn in telecom spending.
It seems safe to look at Joe's layoff as more of a "his unit was the first to slip beneath the waves on a sinking ship" situation, as they laid off 60,000 employees, more than two thirds of their workforce. The layoff was not a sign that he may not have been pulling his weight; it was part of a big move of desperation, not to be taken as a sign of the ineffectiveness of that business unit.
Erlang gives architects the tools to restart as little, or as much, of the tree as they like, so I hope they have their brains fully engaged when working on the infrastructure that underlies their projects. For complex projects, it's vital to think long and hard about state interactions and sub-system dependencies, but the upside for Erlang is that this infrastructure is separated from sequential code via behaviors, and if the organization is big enough, the behaviors will be owned by a dedicated infrastructure team (or person) and consumed by product teams, with clear demarcations of responsibilities.
> but only Erlang/BEAM made it a first-class concept in a production system.
Exceptions?
The BEAM runtime, and all languages that target it including Erlang, do not allow mutation (ETS and company excepted). This means that on the BEAM you can not only roll back the stack but also roll back the state safely. This is part of what the poster meant by the most granular level possible.
It is a well thought out and trued system of computation that has a consistency rarely witnessed in other languages, much less the “web”. It is not perfect. But it is pretty impressive.
Unfortunately, I find what simplicity empowers in the software world pretty underappreciated. Complexity allows people to become specialists, managers to have big teams and lots of meetings, experts to stay experts.
Erlang was being developed in a period where companies were trying to implement software solutions with smaller headcounts, limited horsepower, etc. A multi decade outpouring of cash into the domain has made the value of “less will mean more for all of us in good ways” less of an attractor.
It appears to me that erlang does this.
I accept that others, such as yourself, have different opinions.
I think the reason I feel the way I do is that I've read a lot over the years (like, non-fiction), so I've been heavily conditioned to look for infix characters to separate and define the shape of code, and strings of words to express things. But that may be a self-introspective reach too far.
Apache Mesos[1] is the only thing that comes to my mind as a similar platform to BEAM in its ability to treat multi-machine resources as a single pool.
Over a year ago, my private consulting company decided to adopt Erlang as our backend language. After some time, we started exploring BEAM's internals to, for example, replace the TCP-based stack with QUIC and integrate some Rust patches. A truly fantastic choice for lightweight and high-throughput systems that only fail in case of kernel panic or power loss. We are currently working on very "busy", concurrent software like a film/game production tracker and pipeline manager, and are now also preparing R&D for private hospital management services.
For Cowboy, what's wrong with the docs? Erlang translates to Elixir pretty cleanly (list comprehensions and records notwithstanding): prefix atoms with `:`, downcase variables, `%` maps, and `~c""` Erlang strings. If you're really itchy (as in the look of using atoms as modules makes you itchy), you can alias the Erlang modules as Elixir module atoms: `alias :cowboy_router, as: CowboyRouter`.
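For what it's worth, calling Cowboy straight from Elixir is only a few lines; a minimal sketch (handler and listener names made up, Cowboy 2.x API):

    defmodule HelloHandler do
      # plain Cowboy handler: init/2 gets the request and returns the reply
      def init(req, state) do
        req = :cowboy_req.reply(200, %{"content-type" => "text/plain"}, "hello", req)
        {:ok, req, state}
      end
    end

    dispatch = :cowboy_router.compile([{:_, [{"/hello", HelloHandler, []}]}])
    {:ok, _} = :cowboy.start_clear(:http, [port: 8080], %{env: %{dispatch: dispatch}})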
And is using Postgres via Ecto a bad thing, or? What was your point, or complaint?
Because the function signatures of Erlang's behaviors are critically tied to Erlang's other functionality, specifically its unusual use of immutability. You need a separate init call for its servers because of that, and a very distinct use of the state management to work exactly the same way.
But to achieve the same goals in other languages, you almost always shouldn't directly copy what Erlang is doing. In fact when I see "Look! I ported gen_server into $SOME_OTHER_LANGUAGE" and I see exactly and precisely the exact interface Erlang has, I know that the port doesn't deeply understand what Erlang is doing.
When I ported the idea of supervisor trees into Go [1], I did so idiomatically. It turns out in modern Go the correct interface for "a thing that can be supervised" is not precisely the same signature that Erlang has, but
type Service interface {
Serve(context.Context)
}
That's all you need and all you should use... in Go. Your other language may vary. Go doesn't need a "handle_event/2" because it has channels, and you should use those, not because they are "better" or "worse" but because that's what this language does. In another language you may use something else. In another infrastructure you may end up sending things over Kafka or some cloud event bus rather than "calling a handle_event/2". The key is in building an event-based system, not copying the exact implementation Erlang has.
A peculiar issue the Erlang community has is getting excessively convinced that there's something super-mega-special about the exact way Erlang does it, and that if you do it any other way it is ipso facto wrong and therefore not reliable. This may have been true in 2005; it is not true in 2025. Where once Erlang had almost the only sensible answer, in 2025 the problem is picking through the ocean of answers deluging us! While I recommend learning from Erlang about reliable software, I strongly recommend against just blind-porting the exact way Erlang achieves it into any other language. In almost any other language context it is the wrong answer. Even other immutable languages generally vary enough that they can't just copy the same structure.
This to me is the most interesting question about Erlang, and I say this as someone who works professionally in Elixir.
It's _clear_ that there is incredible appetite for tools that help us design reliable concurrent systems given the wild success of things like k8s, Kafka, AWS's distributed systems products, etc., but why hasn't Erlang/Elixir been able to capture that share?
My friends and I debate this all the time, but I don't know the answer.
Likewise, most devs don't want to learn an obscure language for one job even if they are more than capable. Either they get stuck doing that language or they earn a hole in their resume instead of additional experience in what future employers care about.
Finally, the vast majority of applications and systems don't need ultra high reliability and don't have the budget for it. It isn't clear that downtime impedes success for anything but the most critical businesses.
It is a matter of someone having the same 2 years of experience over and over again, or someone learning many things. Personally I would welcome a chance to learn more Erlang on the job and build something with it.
Unfortunately, businesses want the fresh graduate with 10y of work experience, who already knows their complete stack. Maybe not so much in the Erlang world, but in general. Learning on the job?? Pah! You already ought to know! Just another reason to pay less!
And Erlang jobs are rare. I am between jobs, so if someone happens to know a remote job, where I could start working and learn more Erlang (have only looked at the beginning of "Learn you some Erlang for great Good"), please let me know. I would be happy to have that "hole" as part of my CV :D
Sure, agreed, but you aren't even going to get to the point of an engineer recognising this, because you'll fail the gauntlet of HR with its tick-boxes for tech stacks.
You could be extremely battle-hardened on fault-tolerant distributed systems from being in the Erlang trenches for the last 3 years, but because the HR person couldn't tick off one of "Node.js", "Java", "C#/.Net" or "Python", your application won't ever be seen by an engineer.
This is less of an issue with accumulated experience. Personally I would actually welcome the kind of job that would involve learning a new niche language, since I already have >10 years of experience in several mainstream languages, and there's diminishing returns wrt resumes and interviews past this point.
Because Erlang has a well-integrated collection of what are by 2025 standards mediocre tools.
There is value to that integration, and I absolutely won't deny that.
However, the state of the art has moved beyond Erlang in a number of ways, and you're taking a pretty big penalty to stick to BEAM on a number of fronts now. Its performance is sub-par, and if you're running a large cluster, that's actually going to matter. Erlang qua Erlang I'd call a subpar language, and Elixir qua Elixir is merely competitive; there are many places to get similar capabilities, with a wide variety of other available cost/benefit choices.
Erlang's message bus is not terribly resilient itself; modern message busses can be resilient against individual nodes in the message bus going down, and it's a powerful pattern to have multiple consumers against a single queue, which Erlang's focus on PIDs tends to inhibit. Erlang's message bus is 0-or-1 when as near as I can tell the rest of the world has decided, correctly IMHO, that 1-or-n is superior.
Erlang is fairly insular; once you have to hook up one non-BEAM service to the system, well, you're going to do that over some sort of message bus or something, and you pretty quickly get to the point that you might as well let that be your core architecture rather than the BEAM cluster. Once you're heterogeneous, and BEAM is just another node on the net, there isn't necessarily a lot of reason to stay there. And as a system scales up, the pull to heterogeneity approaches infinity; it takes a lot of work to go to an entire company and force them to work entirely in BEAM.
Plus, some of the problems Erlang solved in one way have developed better solutions. Erlang solves the problem of multiple code bases possibly simultaneously existing in the same cluster by basically making everything untyped. That was a nifty solution for the 1990s, but today I think we've gotten a lot better at having typed data structures that still retain backwards compatibility if necessary. So throwing away the entire type system, including all the methods and inheritance or composition or whatever, to solve that problem is a heck of a blow.
I do want to close out with a repetition of the fact that there is value in that solid integration. More people today are aware of the various tools like "message busses", but it is still clearly not as common knowledge as I'd like, and I still see entire teams struggling along basically crafting an ad-hoc half-specified custom message bus every so often, which in 2025 is insane. (I have written a couple of services where I have basically had to provide HTTP "REST" endpoints that end up just being proxies onto my internal message bus that my system is really based on, because they'd rather POST HTTP than have to use a message bus library, even though it doesn't really buy them anything.) Erlang does help educate people about what are now the basics of cloud architecture. And that "well-integrated collection of mediocre tools" can still solve a lot of problems. Many sins can be forgiven by 32 4GHz cores backed by high-powered RAM, disk, and networking.
But it would take a lot of backwards-incompatible changes to create a BEAM 2.0 that would be competitive on all fronts... if indeed such a thing is even possible. The variety of techs exist for a reason. It stinks to have to paw through them sometimes, but the upside is you'll often find the exact right solution for your needs.
Because Erlang is a runtime + language and Kubernetes is a neutral platform. You can build concurrent and reliable solutions without locking yourself into a single language.
Someone can start by just porting their Python code to Kubernetes to make it more reliable and fault tolerant.
> Go doesn't need a "handle_event/2" because it has channels, and you should use those
Of what type? But most importantly, channels are local to the process, so you need glue to make them networked. (I assume Erlang has networked message handling abstracted away.) In addition, I've seen 3-4 different variations of your proposed pattern for long-running server-like things.
I agree fully that porting should make use of idiomatic constructs. But I also think languages can have hidden mechanics that loses the valuable essence while porting – a form of anti-relativism of PLs if you will.
It's entirely possible to me that this "oh a channel? just wrap it in X" is much more detrimental to interop than it sounds. For instance, take http.Handler in Go. Similarly simple, but what are the real-world implications of having it in std? An ecosystem of middleware that is largely compatible with one another, without pre-coordination (a non-std http server X can be used with auth middleware Y and logging middleware Z). Similar things can be said about io.Reader and friends. These extremely simple interfaces are arguably more valuable than the implementations.
If, and I’m speculating here, Erlang got many of the interfaces for reliable distributed systems right, that can be what enables the whole.
Of the type of the messages you're sending. Which can either be an interface for multiple messages, or you can use multiple channels with one type each. I've done both. This is not an important question when actually programming in Go.
"But most importantly, channels are local to the process, so you need glue to make it networked."
This is an important consideration if you are using Go. Although I would observe that it isn't so much that "channels don't do network" as that "channels are a local tool"; e.g., we do not complain that OS mutexes are not "network capable", because they're intrinsically local. Network locking uses different solutions, and we don't really consider etcd a competitor to a local "lock" call.
But there are dozens of message busses in the world now, and Erlang's isn't really all that competitive modulo its integration.
What Joe did in his thesis is show how you can build reliable systems (and, up to a point, reliable distributed systems) using a given set of Lego blocks.
The reason why you need the Erlang VM to implement something like that appropriately, and why you cannot do it fully on a different VM, is that without the underlying plumbing, supervision trees would be leaky: in Java, you cannot kill a thread that is holding onto resources and hope that everything will always go well, and you do not have ways to monitor different processes.
The BEAM succeeds because you can run 1M processes on a single node, represent complex distributed state machines with ease, and restart portions of the system with zero downtime. Among many other things.
I really don't think behaviors/interfaces is the most critical piece.
That's kind of how Erlang is. At first, anything Erlang has, some other system has too:
Isolated process heaps? - Just use OS processes
Supervision trees? - Use kubernetes.
Message passing? - Not a big deal, I can write two threads and a shared queue in Java.
Hot code loading? - Java can do that too
Low latency processing? - I can tune my LMAX disruptor to kick Erlang's butt any day.
Now getting all that into one platform or library that's the main idea. OS processes are heavyweight. Running 2M of them on a server is not easy. You could use some green threads or promises but now you lost the isolated heap bit.
You can use kubernetes to some degree but it does not do nested supervision trees well. I guess it would work, but now you have your code, and you have pods and controllers, and volumes and all the shit.
You can do message passing with "actor" libraries in many languages. But you cannot do pattern matching on receive, and it doesn't transparently integrate with sending messages across nodes to another thread.
You can do hot code loading, but how do you deal with runtime data structures and state? Erlang is built around that: gen_servers, since their state is immutable and explicit, have callbacks to upgrade not just the code but the state itself.
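That state-upgrade hook is just another callback; a minimal Elixir sketch (module and state shape made up):

    defmodule Cache do
      use GenServer

      def init(_), do: {:ok, %{entries: %{}}}
      def handle_call({:get, k}, _from, state), do: {:reply, state.entries[k], state}

      # Called by the release handler during a hot upgrade: migrate the old
      # state shape to whatever the new code expects before it resumes handling calls.
      def code_change(_old_vsn, old_state, _extra) do
        {:ok, Map.put_new(old_state, :hits, 0)}
      end
    end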
OS procs are heavyweight. Erlang procs are ~2KB each. You can spin up millions on one box while traditional OS procs would melt your machine. Not even in the same league.
> Supervision trees? - Use kubernetes.
Comparing Kubernetes to Erlang supervision trees misses the mark. K8s is infrastructure that requires significant configuration and maintenance outside your codebase. Erlang's supervision is just code - it's part of your application logic. And those nested supervision trees? Good luck implementing that cleanly in K8s without a ton of custom work.
> Message passing? - Not a big deal, I can write two threads and a shared queue in Java.
Basic threads and queues don't compare to Erlang's sophisticated message passing. Pattern matching on receive makes for cleaner, more maintainable code, and the transparent distribution across nodes comes standard. Building equivalent functionality from scratch would require significantly more code and infrastructure.
"Any sufficiently complicated concurrent program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Erlang."
> Hot code loading? - Java can do that too
Java can reload classes, but Erlang was designed for seamless evolution of running systems. The gen_server callbacks for upgrading state during hot code loading show how deeply this concept is integrated into Erlang's core.
I would not say that Beam eliminates this problem in any way, but I do think it lowers the slope of the line. The self-consistent idioms and functionality, especially with deployment, auto recovery and load balancing, reduce the inter-module friction. It makes a system where 12 engineers can easily manage 30 endpoints, and your surface area can still follow a power law.
Rather, what is interesting about the BEAM is that throwing an error is so graceful that it's not such a sin to just throw an error. In other words, a component that CAN error or get into a weird state can be shoved into a behaviour that CANNOT. And by default you are safe from certain operational errors becoming logic or business errors.
For example. You might have a defined "get" interface that doesn't return an error -- let's say it starts as an in-memory K/V store and it returns an optional(value), which is NULL in the case that the key didn't exist.
But suppose you want to have two datastores that the same interface targets, so you might abstract that to a filesystem, and you could have a permission error. And returning "NULL" is not actually "correct". You should throw, because that bubbles up the error to ops teams instead of swallowing it whole. A panic in this case is probably fine.
What if now you're going over a filesystem that's over the network, and the line to the datacenter was backhoe'd and there was a 10 millisecond failover by your SDN -- returning "NULL" is really not correct, because consumers of your getter are liable to have a bad time managing real consistency business cases that could cost $$$. And in this case a panic is not necessarily great, because you bring down everything over a minor hiccup.
The other power with throwing errors + behaviors is that it makes trapping errors with contextual information reporting (e.g. a user-bound 500 error with stack trace information sent somewhere where ops can take a gander) really easy and generically composable, that's not so for error monads or panics.
Anyways it was always strange to me that erlang-inspired actor system programming languages came out that obsessed over "never having errors" as a principle (like ponylang) because that's throwing out a big part of erlang.
I have worked with people that had deployed huge amounts on the BEAM that had a real problem with the answer to that, and resort to magical thinking.
When Erlang processes "crash", assuming the whole system didn't crash, they almost certainly alert a monitoring process of the fact, so that the process can be quickly restarted. This is the core of how supervision trees in Erlang are built.
There are a lot of subtleties to that. The whole system may or may not be a single BEAM instance, and if more than one then they can be distributed, i.e. processes on one machine receive failure messages from processes on others, and can restart the processes elsewhere. On a practical basis these mechanisms are sufficient to automatically pick up the majority of transient failures. (I should add there are two classic ways to blow up a BEAM instance which make this less good than it should be: a bad C function call in a NIF (Native Implemented Function), or posting messages to a process faster than it can consume them, which will eventually cause an OOM.)
But this differs from the underlying philosophy of the runtime, which is that things are only done when they're done, and you should expect failures at any time. This maps on to their messaging paradigm.
What you actually sound like you want is a universe more like FoundationDB and QuiCK https://www.foundationdb.org/files/QuiCK.pdf where the DB and worker queue all live in one single transactional space, which certainly makes reasoning about a lot of these things easier, but have nothing to do with erlang.
> you might not even know from the outside if it has been fully, partially, or not at all processed yet
Erlang does not propose a unique solution to distributed problems, just good primitives.
So the answer would be the same; you'd keep track in the queue if the element was partially popped, but not completed, and you report back to the queue that the processing failed and that the element should be fully put back.
So in Erlang you might monitor a worker process and requeue items handled by processes that failed.
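Something like this is the usual shape (a toy sketch; the queue protocol and names are made up):

    defmodule Dispatcher do
      # Hand one item to a fresh worker; if the worker dies for any reason
      # other than a normal exit, tell the queue to put the item back.
      def dispatch(item, queue_pid) do
        {pid, ref} = spawn_monitor(fn -> process_item(item) end)

        receive do
          {:DOWN, ^ref, :process, ^pid, :normal} -> :ok
          {:DOWN, ^ref, :process, ^pid, _crash} -> send(queue_pid, {:requeue, item})
        end
      end

      defp process_item(_item), do: :ok   # real work goes here
    end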
If you want that state to be durable, you need to store it durably. Mnesia provides (optional) distributed transactions which may be appropriate for durability needs (lots of details). Or you could externalize durability to other systems.
Erlang is wonderful, but it's not magic. It won't prevent hardware failures, so if an Erlang process fetches something from a queue and the cpu stops for whatever reason, you've got a tricky situation. Erlang does offer a way for a process to monitor other processes, including processes on remote nodes, so your process will be notified if the other process crashes or if the other node is disconnected; but if the other node is disconnected, you don't know what happened to the other process --- maybe it's still running and there's a connectivity issue, maybe the whole host OS crashed. You could perhaps set bidirectional monitors, and then know that the remote process would be notified of the disconnection as well, if it still was running... but you wouldn't know if the process finished (successfully or not) after the connectivity failed but before the failure was detected and processed.
The erlang process state will be simply what it has on the stack. (Ignoring things like ETS tables for the moment).
Erlang has the concept of ports, used to interface to the world outside, that provide a sort of hook for cleanup in the event of a crash. Ports belong to processes, in the event of a crash all associated ports are cleaned up. You can also set this sort of thing up between purely erlang processes as well.
As the other commenter observed, erlang gives you the primitives to make distributed systems work; it does not prescribe solutions, especially around distributed transactions, which imo is one of the reasons some of the hype around the BEAM is misguided.
There's nothing outright stopping you from doing proper design and building separate Erlang services that exchange state with regular protocols, but there does seem to be a temptation to just put all the Erlang in one big monolith and then run into very hard memory and scaling issues when usage and data grow.
One high profile erlang user in the payment industry was mainly constrained by how big a server they could buy, as all their code ran on a single server with a hot standby. They have since moved to java, and rethought how they managed shared state
Facebook managed to get ejabberd, the xmpp server written in erlang, to back their first Messenger, but it involved sharding to give each ejabberd-instance a small enough data set to cope, and a clever way to replicate presence data outside of erlang (storing it in compact memory blocks on each ejabberd server, and shipping them wholesale to a presence service at a regular cadence).
Pretty soon they tore ejabberd out, metaphorically burned it in a field and salted the earth... but how much of that was the fault of erlang itself, and how much it was the issue of having one corner with erlang in a largely C++ world isn't known to me.
OTP ships with mnesia_frag which allows fragmenting a logical table into many smaller tables. You don't need to have all of the tables on all of the nodes that share an mnesia schema. That's at least one way to scale mnesia beyond what fits in memory on a single node. Single nodes are pretty big though; we were running 512GB mnesia nodes 10 years ago on commodity hardware, and GCP says 32TB is available. You can do a lot within a limit of 32TB per node.
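For reference, fragmentation is declared at table-creation time; a rough sketch in Elixir (table name and fragment count made up):

    # 64 fragments of one logical table, spread over the nodes in the pool;
    # mnesia hashes the key to pick the fragment.
    :mnesia.create_table(:session,
      attributes: [:key, :value],
      frag_properties: [
        n_fragments: 64,
        node_pool: [node() | Node.list()]
      ]
    )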
There's other ways to shard too, at WhatsApp pre-FB, our pattern was to run mnesia schemas with 4 nodes where one half of the nodes were in service, the other was in our standby colo, all nodes had all the tables in this schema, and requests would be sharded so each schema group would only serve 1/N users and each of the two active nodes in a schema group would get half of the requests (except during failure/maintenance). We found 4 node schemas were easiest to operate, and ensuring that in normal operations, a single node (and in most cases, a single worker process) would touch specific data made us comfortable running our data operations in the async_dirty context that avoids locking.
We did have scaling challenges (many of which you can watch old Erlang Factory presentations about), but it was all surmountable, and many of the things would be easier today given improvements to BEAM and improvements in available servers.
I'm not sure I understand the question - all queue systems I've used separate delivery and acknowledgement, so if a process crashes during processing the messages will be redelivered once it restarts.
Do you have a concrete example of a flow you're curious about?
Maybe these could help:
- https://ferd.ca/the-zen-of-erlang.html
- https://jlouisramblings.blogspot.com/2010/11/on-erlang-state...
Bwahahaha. Reminds me of the JRuby team, who left Sun as a single unit and resumed work as a team at another company (I can't remember where) when Oracle acquired Sun.
Haskell taught me a lot about programming, things that I still use now, even though I only write Python.
Does learning erlang teach you a new way of thinking? Or does it just make you wish you had erlang language features and libraries when not writing erlang?
False statement. Ericsson still uses Erlang, for example in their MME. Source: I used to work at Ericsson.
"Just when we thought everything was going well, in 1998, Erlang was banned within Ericsson Radio AB (ERA) for new product development. This ban was the second most significant event in the history of Erlang: It led indirectly to Open Source Erlang and was the main reason why Erlang started spreading outside Ericsson.
The reason given for the ban was as follows:
The selection of an implementation language implies a more long-term commitment than the selection of a processor and OS, due to the longer life cycle of implemented products. Use of a proprietary language implies a continued effort to maintain and further develop the support and the development environment. It further implies that we cannot easily benefit from, and find synergy with, the evolution following the large scale deployment of globally used languages. [26] quoted in [12].
In addition, projects that were already using Erlang were allowed to continue but had to make a plan as to how dependence upon Erlang could be eliminated. Although the ban was only within ERA, the damage was done. The ban was supported by the Ericsson technical directorate and flying the Erlang flag was thereafter not favored by middle management."
And to be completely fair....
"6.2 Erlang in recent times
In the aftermath of the IT boom, several small companies formed during the boom have survived, and Erlang has successfully rerooted itself outside Ericsson. The ban at Ericsson has not succeeded in completely killing the language, but it has limited its growth into new product areas.
The plans within Ericsson to wean existing projects off Erlang did not materialise and Erlang is slowly winning ground due to a form of software Darwinism. Erlang projects are being delivered on time and within budget, and the managers of the Erlang projects are reluctant to make any changes to functioning and tested software.
The usual survival strategy within Ericsson during this time period was to call Erlang something else. Erlang had been banned but OTP hadn’t. So for a while no new projects using Erlang were started, but it was OK to use OTP. Then questions about OTP were asked: “Isn’t OTP just a load of Erlang libraries?”—and so it became “Engine,” and so on."
A History of Erlang, Joe Armstrong, Ericsson AB
©2007 ACM 978-1-59593-766-7/2007/06-ART6
https://lfe.io/papers/%5B2007%5D%20Armstrong%20-%20HOPL%20II...
There's probably a discussion on precisely what this means, but such descriptions as "Erlang is banned" has significant and credible precedent.
Around year 2008 being an Erlang coder was often more or less seen as being a COBOL coder in Sweden. Bluetail had sort of failed, having burned lots of VC, iirc.
So Erlang was something weird and custom that Ericsson used to build software for legacy phone exchanges. I remember that a colleague's wife working at Ericsson had received on-the-job training from essentially zero programming knowledge to become an Erlang developer in order to maintain some phone exchange software.
It's been fascinating to see it morph into something cool. Whatsapp, etc.
If anything, it fell out of favour and lost hype wave for some time after that, while other languages copied aspects of the Actor model... and mostly the BEAM hype came back in the form of Elixir.
Source: I work at WhatsApp
https://lfe.io/papers/%5B2007%5D%20Armstrong%20-%20HOPL%20II...
You can write actually synchronous code in Erlang and the runtime makes it so that no process blocks any other process by preempting them on a schedule.
In Go you manage your goroutines and channels explicitly, while the BEAM runs all processes for you. I've seen Robert Virding run an infinite loop in one Erlang process while the rest were serving requests: the core with the loop stayed at 100%, but 0 requests were dropped and the latency and throughput were more or less the same. Pretty crazy capabilities.
You can do the same in Go but it's a lot more manual.
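You can reproduce that demo in a few lines; a toy sketch:

    defmodule Busy do
      def loop, do: loop()   # pure CPU spin, never yields voluntarily
    end

    spawn(Busy, :loop, [])   # pins one scheduler at ~100%

    # The runtime preempts on reductions, so everything else keeps running:
    Enum.each(1..5, fn i ->
      spawn(fn -> IO.puts("still responsive #{i}") end)
    end)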
Always thought the key differentiator is having a kind of "orchestrator/supervisor" for those processes.
(Which "lightweight processes and message passing" facilitates, sure, but it's more than those)
BEAM is great, although it's definitely missing something like pprof for go or java flight recorder.
We built the system in Java and C. The distribution layer was done completely in Java. It was only after the system was done that I discovered Erlang. I REALLY wish I had known about it earlier. Erlang solved so many of the problems we had to solve by ourselves.
I also felt it had strong "halt and catch fire on error" properties.
Am I maligning it, and Erlang?
Erlang's not about lightweight processes and message passing - https://news.ycombinator.com/item?id=34545061 - Jan 2023 (274 comments)
When you choose Erlang for a project, what kind of return on investment do you think it typically offers? Does it lead to significant cost savings or help generate more revenue in ways that other languages might not?
In situations where Erlang is chosen, what are some concrete examples of how it has demonstrably increased efficiency, reduced errors, or enabled new business opportunities that wouldn't have been as feasible with other technologies?
Edit: I guess if I'd done any research myself before asking, I might've found this: https://www.erlang-solutions.com/blog/which-companies-are-us...
With the latter I get a huge ecosystem of packages and wide compatibility with platforms and tooling and also a robust and scalable actor model.
Learning Erlang or any related language meanwhile feels like learning Tolkien’s Elvish for the purposes of international trade.
I can come back in 5 years to explain to you what is annoying about Akka.NET compared to the BEAM and vice versa. An expert in the BEAM who lacks experience in C# is not going to be able to explain to an expert in C# who lacks experience in the BEAM why the BEAM is better.
You're asking for something incredibly rare - a person who is an expert in both runtimes and can concisely explain to you the tradeoffs of each.
Otherwise, aside from educational purposes, they are not worth spending your time on. Just skip to F# over Elixir because Elixir is not a serious language, lacking base language primitives and operations one would expect standard library to offer. It's not productive nor fast.
That's not necessarily an indictment on the language itself. The alternative would have been to keep using it while also open sourcing it, but I'm guessing they just wanted to be able to hire cheaper C developers or whatever the flavor of the time was.
> In February 1998, Ericsson Radio Systems banned the in-house use of Erlang for new products, citing a preference for non-proprietary languages.[15] The ban caused Armstrong and others to make plans to leave Ericsson.[16] In March 1998 Ericsson announced the AXD301 switch,[8] containing over a million lines of Erlang and reported to achieve a high availability of nine "9"s.[17] In December 1998, the implementation of Erlang was open-sourced and most of the Erlang team resigned to form a new company, Bluetail AB.[8] Ericsson eventually relaxed the ban and re-hired Armstrong in 2004.
- Edit: the poster was quoting a quote in the article, not Wikipedia; the article is the one omitting the context.