Compare with a simple pipeline in bash:
grep needle < haystack.txt | sed 's/foo/bar/g' | xargs wc -l
Each of those components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines. Compare with Ruby:
data = File.readlines("haystack.txt")
.map(&:strip)
.grep(/needle/)
.map { |i| i.gsub('foo', 'bar') }
.map { |i| File.readlines(i).count }
In that case, each step runs to completion in sequence, with a complete array being created between steps. Nothing actually gets pipelined. Clean and readable as it is, I don't tend to write it that way any more, because it's harder to debug. More often these days, I write things like this:
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
It's ugly, but you know what? I can set a breakpoint anywhere and inspect the intermediate states without having to edit the script in prod. Sometimes ugly and boring is better.
The inventor of the shell pipeline, Douglas McIlroy, always understood the equivalence between pipelines and coroutines; it was deliberate. See https://www.cs.dartmouth.edu/~doug/sieve/sieve.pdf It goes even deeper than it appears, too. In the original Unix kernel implementation of pipes, when the writer filled the pipe buffer[1], the kernel continued execution directly in the blocked reader process without bouncing through the scheduler. Effectively, arguably literally, coroutines: one process calls the write function and execution continues with a read call returning the data.
Interestingly, Solaris Doors operate the same way by design--no bouncing through the scheduler--unlike pipes today: I think most Unix kernels moved away from direct execution switching long ago, to better support multiple readers, etc.
[1] Or even on the first write? I'd have to double-check the source again.
Something like
lines = File.readlines("haystack.txt")
stripped_lines = lines.map(&:strip)
needle_lines = stripped_lines.grep(/needle/)
transformed_lines = needle_lines.map { |line| line.gsub('foo', 'bar') }
line_counts = transformed_lines.map { |file_path| File.readlines(file_path).count }
is hell to read and understand later, imo. You have to wade through a lot of intermediate variables that don't matter anywhere else in the code, but you can't know in advance which ones matter and which don't unless you read and understand all of it. It also pollutes your workspace with too much stuff, so while this style makes debugging easier, it makes the code harder to read some time after. It becomes even clunkier if you need to reuse the code; you probably end up defining a function block, which just moves the clunkiness there.
What I do now is define each step's transformation as a pure function, chain them once everything works, and wrap the whole thing in an error handler so that I depend on breakpoint debugging less.
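For concreteness, a rough sketch of that approach in Rust rather than Ruby (all helper names are made up, loosely mirroring the haystack example above): each step is a small pure function that can be tested on its own, and the chain returns a Result so failures come back as values instead of needing a breakpoint.
```rust
use std::fs;
use std::io;

// Each step is a small pure function that can be unit-tested in isolation.
fn strip(lines: Vec<String>) -> Vec<String> {
    lines.into_iter().map(|l| l.trim().to_string()).collect()
}

fn keep_needles(lines: Vec<String>) -> Vec<String> {
    lines.into_iter().filter(|l| l.contains("needle")).collect()
}

fn substitute(lines: Vec<String>) -> Vec<String> {
    lines.into_iter().map(|l| l.replace("foo", "bar")).collect()
}

fn line_counts(paths: Vec<String>) -> io::Result<Vec<usize>> {
    paths
        .iter()
        .map(|p| fs::read_to_string(p).map(|s| s.lines().count()))
        .collect() // collects into Result<Vec<usize>, _>, stopping at the first error
}

// The chain is assembled only once the pieces work, and the whole thing
// surfaces errors as a Result instead of relying on a debugger.
fn run(path: &str) -> io::Result<Vec<usize>> {
    let lines: Vec<String> = fs::read_to_string(path)?
        .lines()
        .map(str::to_string)
        .collect();
    line_counts(substitute(keep_needles(strip(lines))))
}

fn main() {
    match run("haystack.txt") {
        Ok(counts) => println!("{counts:?}"),
        Err(e) => eprintln!("pipeline failed: {e}"),
    }
}
```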
There is certainly a trade-off, but as a codebase grows larger and deals with more cases where the same code needs to be applied, the benefits of a concise yet expressive notation show.
Eh, at first glance it looks "amateurish" due to all the repeated stuff. Chaining explicitly eliminates redundant operations - a more minimal representation of data flow - so it looks more "professional". But I also know better than to act on that impulse. ;)
That said, it really depends on the language at play. Some will compile all the repetition of `data =` away such that the variable's memory isn't re-written until after the last operation in that list; it'll hang out in a register or on the stack somewhere. Others will run the code exactly as written, bouncing data between the heap, stack, and registers - inefficiencies and all.
IMO, a comment like "We wind up debugging this a lot, please keep this syntax" would go a long way to help the next engineer. Assuming that the actual processing dwarfs the overhead present in this section, it would be even better to add discrete exception handling and post-conditions to make it more robust.
Bit annoying, but serviceable. Though there's nothing wrong with your approach either.
https://gist.github.com/user-attachments/assets/3329d736-70f...
Allow me, too, to disagree. I think the right term is "function composition".
Instead of writing
h(g(f(x)))
as a way to say "first apply f to x, after which g is applied to the result of this, after which h is applied to the result of this", we can use function composition to compose f, g and h, and then "stuff" the value x into this "pipeline of composed functions". We can use whatever syntax we want for that, but I like Elm syntax which would look like:
x |> f >> g >> h
You make an interesting point about debugging which is something I have also encountered in practice. There is an interesting tension here which I am unsure about how to best resolve.
In PRQL we use the pipelining approach by using the output of the last step as the implicit last argument of the next step. In M Lang (MS Power BI/Power Query), which is quite similar in many ways, they use the second approach in that each step has to be named. This is very useful for debugging as you point out, but also a lot more verbose and can be tedious. I like both but prefer the ergonomics of PRQL for interactive work.
Update: Actually, PRQL has a decent answer to this. Say you have a query like:
from invoices
filter total > 1_000
derive invoice_age = @2025-04-23 - invoice_date
filter invoice_age > 3months
and you want to figure out why the result set is empty. You can pipe the results into an intermediate reference like so:
from invoices
filter total > 1_000
into tmp
from tmp
derive invoice_age = @2025-04-23 - invoice_date
filter invoice_age > 3months
So, good ergonomics on the happy path and a simple enough workaround when you need it. You can try these out in the PRQL Playground btw: https://prql-lang.org/playground/
I believe the correct definition for this concept is the Thrush combinator[0]. In some ML-based languages[1], such as F#, the |> operator is defined[2] for the same purpose:
[1..10] |> List.map (fun i -> i + 1)
Other functional languages have libraries which also provide this operator, such as the Scala Mouse[3] project.
0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush
1 - https://en.wikipedia.org/wiki/ML_(programming_language)
2 - https://fsharpforfunandprofit.com/posts/defining-functions/
Unless I misunderstood the author: method chaining is super common, whereas thrush operators feel pretty rare, so I would be surprised if they meant the latter.
I get the impression (though I haven't checked) that the thrush operator is a backport of OOP-style method chaining to functional languages that don't support dot-method notation.
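That reading is easy to illustrate: in a language that already has dot-method syntax but no built-in `|>`, a thrush-style pipe can be bolted on as a one-line extension. A minimal Rust sketch (the `Pipe` trait here is hand-rolled, not a standard item):
```rust
// A blanket extension trait that lets any value be threaded through
// plain functions, reading left to right like an ML-style |> chain.
trait Pipe: Sized {
    fn pipe<B>(self, f: impl FnOnce(Self) -> B) -> B {
        f(self)
    }
}

impl<T> Pipe for T {}

fn main() {
    let result = "  42  "
        .trim()
        .pipe(str::parse::<i32>)
        .pipe(Result::unwrap)
        .pipe(|n| n * 2);
    assert_eq!(result, 84);
}
```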
Incidentally, have you ever considered investing in real estate? I happen to own an interest in a lovely bridge which, for personal reasons, I must suddenly sell at a below-market price.
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
Hard disagree. It's less readable, the intent is unclear (where does it end?), and the variables are rewritten on every step and everything is named "data" (and please don't call them data_1, data_2, ...), so now you have to run a debugger to figure out what even is going on, rather than just... reading the code.
I myself agree, and find myself doing that too, especially in frontend code that executes in a browser. Debuggability is much more important than marginally-better readability, for production code.
I find this take surprising. I guess it depends on how much weight you give to "marginally-better", but IMHO readability is the single most important factor when it comes to writing code in most code-bases. You write code once, it may need to be debugged (by yourself or others) on rare occasions. However anytime anyone needs to understand the code (to update it, debug it, or just make changes in adjacent code) they will have to read it. In a shared code-base your code will be read many more times than it will be updated/debugged.
const foo = something()
.hoge()
.hige()
.hage();
better, sure, but not actually significantly harder to read than:
let foo = something();
foo = foo.hoge();
foo = foo.hige();
foo = foo.hage();
But, while reading is more common than debugging, debugging a production app is often more important. I guess I am mostly thinking about web apps, because that is the area where I have mainly found the available debuggers lacking. Although they are getting better, I believe, I've frequently seen problems where they can't debug into some standard language feature because it's implemented in C++ native code, or they just don't expose the implicit temporary variables in a useful way. (I also often see similar-ish problems in languages where the debuggers just aren't that advanced, due to lack of popularity, or whatever.)
Particularly with web apps, though, we often want to attach to the current production app for initial debugging instead of modifying the app and running it locally, usually because somebody has reported a bug that happens in production (but how to reproduce it locally is not yet clear).
Alternatively stated, I guess, I believe readability is important, and maybe the "second most important thing", but nevertheless we should not prefer fancy/elegant code that feels nice to us to write and read, but makes debugging more difficult (with the prevailing debuggers) in any significant way.
In an ideal world, a difference like the above wouldn't be harder to debug, in which case I would also prefer the first version.
(And probably in the real world, the problems would be with async functions less conducive to the pithy hypothetical example. I'm a stalwart opponent of libraries like RxJs for the sole reason that you pay back with interest all of the gains you realized during development, the first time you have to debug something weird.)
Processes run in parallel, but they process the data in a strict sequential order: «grep» must produce a chunk of data before «sed» can proceed, and «sed» must produce another chunk of data before «xargs» can do its part. «xargs» in no way can ever pick up the output of «grep» and bypass the «sed» step. If the preceding step is busy crunching the data and is not producing the data, the subsequent step will be blocked (the process will fall asleep). So it is both, a pipeline and a chain.
It is actually a directed data flow graph.
Also, if you replace «haystack.txt» with a /dev/haystack, i.e.
grep needle < /dev/haystack | sed 's/foo/bar/g' | xargs wc -l
and /dev/haystack is waiting on the device it is attached to to yield a new chunk of data, all three of «grep», «sed» and «xargs» will block.
Given File.readlines("haystack.txt"), the entire file must be resident in memory before .grep(/needle/) is performed, which may cause unnecessary memory utilization. Iirc, in frameworks like Polars, the collect() chain-ending method tells the compiler that the previous methods will be performed as a stream, and thus not require pulling the entirety into memory in order to perform an operation on a subset of the corpus.
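For contrast, a minimal sketch of the streaming style, written in Rust for concreteness (reusing the haystack file name from the example above): lines are pulled through the iterator chain one at a time, so only the running count is ever held in memory, never the whole file.
```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn matching_line_count(path: &str) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    Ok(reader
        .lines()                          // lazy iterator of io::Result<String>
        .filter_map(Result::ok)           // skip unreadable lines
        .filter(|l| l.contains("needle")) // nothing is buffered up front
        .count())
}

fn main() -> io::Result<()> {
    println!("{}", matching_line_count("haystack.txt")?);
    Ok(())
}
```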
I've only ever heard the term 'pipelining' in reference to GPUs, or as an abstract umbrella term for moving data around.
Other than that I think both styles are fine.
Another problem of having different names for each step is that you can no longer quickly comment out a single step to try things out, which you can if you either have the pipeline or a single variable name.
Java streams are the closest equivalent, both by the concurrent execution model, and syntactically. And yes, the Java debugger can show you the state of the intermediate streams.
Iterators are not (necessarily) concurrent. I believe you mean lazy.
That is, iterators' execution flow is interspersed, with the `yield` statement explicitly giving control to another coroutine, and then continuing the current coroutine at another yield point, like the call to next(). This is very similar to JS coroutines implemented via promises, with `await` yielding control.
Even though there is only one thread of execution, the parts of the pipeline execute together in lockstep, not sequentially, so there's no need for a previous part to completely compute a large list before the following part can start iterating over it.
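For concreteness, a small Rust sketch (analogous to the above, not Java Streams themselves) that makes the lockstep execution visible by printing from each stage; the stages interleave per element rather than each stage finishing before the next one starts.
```rust
fn main() {
    let out: Vec<i32> = (1..=3)
        .inspect(|x| println!("produce {x}"))
        .map(|x| {
            println!("  map {x}");
            x * 10
        })
        .filter(|x| {
            println!("    filter {x}");
            *x > 10
        })
        .collect();
    println!("{out:?}");
    // Prints produce/map/filter for 1, then for 2, then for 3 --
    // not all the produces, then all the maps, then all the filters.
}
```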
The chaining really only works if your language is strongly typed and you are somewhat guaranteed that variables will be of the expected type.
But, yes, naive call chaining like that is sometimes a significant performance problem in the real world. For example, in the land of JavaScript. One of the more egregious examples I've personally seen was a Bash script that used Bash arrays rather than pipelines, though in that case it had to do with the loss of concurrency, not data churn.
For my Ruby example, each of those method calls will allocate an Array on the heap, where it will persist until all references are removed and the GC runs again. The extra overhead of the named reference is somewhere between Tiny and Zero, depending on your interpreter. No extra copies are made; it's just a reference.
In most compiled languages: the overhead is exactly zero. At runtime, nothing even knows it's called "data" unless you have debug symbols.
If these are going to be large arrays and you actually care about memory usage, you wouldn't write the code the way I did. You might use lazy enumerators, or just flatten it out into a simple procedure; either of those would process one line at a time, discarding all the intermediate results as it goes.
Also, "File.readlines(i).count" is an atrocity of wasted memory. If you care about efficiency at all, that's the first part to go. :)
This helped me quickly develop a sense for how code is optimized and what code is eventually executed.
I do it in a similar way to what you mentioned.
But with actually checked-in code, the tradeoff in readability is pretty substantial.
However.
I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir.
```
params
|> Map.get("user")
|> create_user()
|> notify_admin()
```
https://github.com/tc39/proposal-pipeline-operator
I'm very excited for it.
Our only hope is if TypeScript finally gives up on the broken TC39 process and starts to implement its own syntax enhancements again.
[1] https://2024.stateofjs.com/en-US/usage/#top_currently_missin...
More specifically, with the (also ironically gummed up in tc39) type syntax [1], and importantly node introducing the --strip-types option [2], TS is only ever going to look more and more like standards compliant JS.
It seems like most people are just asking for the simple function piping everyone expects from the |> syntax, but that doesn't look likely to happen.
I'm not a JS dev so idk what member property support is.
"bar"
|> await getFuture(%);
|> baz(await %);
|> bat(await %);
My guess is the TC39 committee would want this to be more seamless. This also gets weird because if the `|>` is a special function that sends in a magic `%` parameter, it'd have to be context sensitive to whether or not an `async` thing happens within the bounds. Whether or not it does will determine if the subsequent pipes are dealing with a future of % or just % directly.
In reality it would look like:
"bar"
|> await getFuture()
|> baz()
|> await bat()
(assuming getFuture and bat are both async). You do need |> to be aware of the case where the await keyword is present, but that's about it. The above would effectively transform to:
await bat(baz(await getFuture("bar")));
I don't see the problem with this.
"bar"
|> await getFuture()
How would you disambiguate your intended meaning from the below:
"bar"
|> await getFutureAsyncFactory()
Basically, an async function that returns a function which is intended to be the pipeline processor. Typically in JS you do this with parens like so:
(await getFutureAsyncFactory())("input")
But the use of parens doesn't translate to the pipeline setting well IMO
Given this example:
(await getFutureAsyncFactory("bar"))("input")
the getFutureAsyncFactory function is async, but the function it returns is not (or it may be and we just don't await it). Basically, using |> like you stated above doesn't do what you want. If you wanted the same semantics, you would have to do something like:
("bar" |> await getFutureAsyncFactory())("input")
to invoke the returned function. The whole pipeline takes on the value of the last function specified.
a |> await f()
and a |> (await f())
Might be expected to do the same thing. But the latter is syntactically indistinguishable from
a |> await returnsF()
What do you think about
a |> f |> g
Where you don't really call the function with () in the pipeline syntax? I think that would be more natural.
a |> (await f())()
which removes any sort of ambiguity. Your first example calls f() with a as its first argument while the second (after my fix) calls and awaits f() and then invokes that result with a as its first argument. For the last example, it would look like:
a |> (await f())() |> g()
assuming f() is still async and returns a function. g() must be a function, so the parentheses have to be added.
chute(7) // set up a chute and give it a seed value
.toString // call methods of the current data (parens optional)
.parseInt // send the current data through global native Fns
.do(x=>[x]) // through a chain of one or more local / inline Fns
.JSON.stringify // through nested global functions (native / custom)
.JSON.parse
.do(x=>x[0])
.log // through built in Chute methods
.add_one // global custom Fns (e.g. const add_one=x=>x+1)
() // end a chute with '()' and get the result
[1] https://chute.pages.dev/ | https://github.com/gregabbott/chute
They list this as a con of F# (also Elixir) pipes:
value |> x=> x.foo()
The insistence on an arrow function is pure hallucination
value |> x.foo()
Should be perfectly achievable as it is in these other languages. What’s more, doing so removes all of the handwringing about await. And I’m frankly at a loss why you would want to put yield in the middle of one of these chains instead of after.
I haven't looked at the member properties bits but I suspect the pipeline syntax just needs the transform to be supported in build tools, rather than adding yet another polyfill.
``` params.get("user") |> create_user |> notify_admin ```
Even more concise and it doesn't even require a special language feature, it's just regular syntax of the language ( |> is a method like .get(...), so you could even write `params.get("user").|>(create_user)` if you wanted to)
Also, what if the function you want to use is returned by some nullary function? You couldn't just do |> getfunc(), as presumably the pipeline operator will interfere with the usual meaning of the parentheses and will try to pass something to getfunc. Would |> ( getfunc() ) work? This is the kind of problem that can arise when one language feature is permitted to change the ordinary behaviour of an existing feature in the name of convenience. (Unless of course I'm just missing something.)
I just find this syntax too inconsistent and vague, and hence actually annoying. Which is why I prefer defining pipes as composition of functions which can then be applied to whatever data. Then, e.g., one can write something like `(|> foo1 foo2 (foo3) #(foo4 % y))` and know that foo1 and foo2 are references to functions, foo3 evaluates to another function, and when one needs more arguments in foo4 they have to explicitly state that. This gives another function, and there is no ambiguity here whatsoever.
def main_loop(%Game{} = game) do
game
|> get_move()
|> play_move()
|> win_check()
|> end_turn()
end
instead of the much harder to read:
def main_loop(%Game{} = game) do
end_turn(win_check(play_move(get_move(game))))
end
For an example with multiple parameters, this pipeline:
schema
|> order_by(^constraint)
|> Repo.all()
|> Repo.preload(preload_opts)
would be identical to this:
Repo.preload(Repo.all(order_by(schema, ^constraint)), preload_opts)
To address your question above:
> if `foo(y)` actually gives you a function to apply to `x` prob you should write `x |> foo(y)()`
If foo(y) returned a function, then to call it with x, you would have to write foo(y).(x) or x |> foo(y).(), so the syntax around calling the anonymous function isn't affected by the pipe. Also, you're not generally going to be using pipelines with functions that return functions so much as with functions that return data which is then consumed as the first argument by the next function in the pipeline. See my previous comment on this thread for more on that point.
There's no inconsistency or ambiguity in the pipeline operator's behavior. It's just syntactic sugar that's handy for making your code easier to read.
That's actually true. In Scala that is not so nice, because then it becomes `x |> foo(_, arg2)` or, even worse, `x |> (param => foo(param, arg2))`. I have a few such cases in my sourcecode and I really don't like it. Haskell and PureScript do a much better job keeping the code clean in such cases.
I agree with that and it confused me that it looks like the function is not referenced but actually applied/executed.
In Elixir, it is just a macro so it applies to all functions. I'm only a Scala novice so I'm not sure how it would work there.
Yes exactly, which is why it is not equivalent. No macro needed here. In Scala 2 syntax:
```
implicit class AnyOps[A](private val a: A) extends AnyVal {
  def |>[B](f: A => B) = f(a)
}
```
This is usually the Thrush combinator[0], exists in other languages as well, and can be informally defined as:
f(g(x)) = g(x) |> f
0 - https://leanpub.com/combinators/read#leanpub-auto-the-thrush
x |> f(y) = f(x, y)
As a result, the Elixir variant cannot be defined as a well-typed function, but must be a macro.
The problem is that method-chaining is common in several OO languages, including Ruby. This means the functions on an object return an object, which can then call other functions on itself. In contrast, the pipe operator calls a function, passing in what's on the left side of it as the first argument. In order to work properly, this means you'll need functions that take the data as the first argument and return the same shape, whether that's a list, a map, a string or a struct, etc.
When you add a pipe operator to an OO language where method-chaining is common, you'll start getting two different types of APIs and it ends up messier than if you'd just stuck with chaining method calls. I much prefer passing immutable data into a pipeline of functions as Elixir does it, but I'd pick method chaining over a mix of method chaining and pipelines.
Note also that it works well in Elixir because it was created at the same time as most of the standard library. That means that the standard library takes the relevant argument in the first position all the time. Very rarely do you need to pipe into the second argument (and you need a lambda or convenience function to make that work).
Is there any language with a single feature that gives the best of both worlds?
```
params
|> Map.get("user")
|> create_user()
|> (&notify_admin("signup", &1)).()
```
or
```
params
|> Map.get("user")
|> create_user()
|> (fn user -> notify_admin("signup", user) end).()
```
params
|> Map.get("user")
|> create_user()
|> then(&notify_admin("signup", &1))
params
|> Map.get("user")
|> create_user()
|> then(fn user -> notify_admin("signup", user) end)
[0] https://hexdocs.pm/elixir/1.18.3/Kernel.html#then/2
"World"
|> then(&concat("Hello ", &1))
I imagine a shorter syntax could someday be possible, where some special placeholder expression could be used, ex:
"World"
|> concat("Hello ", &1)
However that creates a new problem: If the implicit-first-argument form is still permitted (foo() instead of foo(&1)) then it becomes confusing which function-arity is being called. A human could easily fail to notice the absence or presence of the special placeholder on some lines, and invoke the wrong thing. My dislike does improve my test coverage though, since I tend to pop out a real method instead.
flip :: (x -> y -> z) -> (y -> x -> z)
flip f = \y -> \x -> f x y
x |> (flip f)(y) -- f(x, y)
(No disagreements with your post, just want to give credit where it's due. I'm also a big fan of the syntax)
users
& map validate
& catMaybes
& mapM persist
In my programming language, I added `.>` as a reverse-compose operator, so pipelines of function compositions can also be read uniformly left-to-right, e.g.
process = map validate .> catMaybes .> mapM persist
There is also https://hackage.haskell.org/package/flow which uses .> and <. for function composition.
EDIT: in no way do I want to claim the originality of these things in Elm or the Haskell package inspired by it. AFAIK |> came from F# but it could be miles earlier.
users
|> map validate
|> catMaybes
|> mapM persist
Instead of:
```
fetch_data()
|> (fn
{:ok, val, _meta} -> val
:error -> "default value"
end).()
|> String.upcase()
```
Something like this:
```
fetch_data()
|>? {:ok, val, _meta} -> val
|>? :error -> "default value"
|> String.upcase()
```
fetch_data()
|> case do
{:ok, val, _meta} -> val
:error -> "default value"
end
You have the extra "case do...end" block but it's pretty close? This is for sequential conditions. If you have nested conditions, check out a `with` block instead. https://dev.to/martinthenth/using-elixirs-with-statement-5e3...
For example, we can write: (foo (bar (baz x))) as (-> x baz bar foo)
If there are additional arguments, we can accommodate those too: (sin (* x pi)) as (-> x (* pi) sin)
where the expression so far gets inserted as the first argument to any form. If you want it inserted as the last argument, you can use ->> instead:
(filter positive? (map sin x)) as (->> x (map sin) (filter positive?))
You can also get full control of where to place the previous expression using as->.
Full details at https://clojure.org/guides/threading_macros
I use these with xforms transducers.
That's sort of an argument for the existence of macros as a whole, you can't really do this as neatly in something like python (although I've tried) - I can see the downside of working in a codebase with hundreds of these kind of custom language features though.
One day, we'll (re)discover that partial application is actually incredibly useful for writing programs, and (non-Haskell) languages will start with it as the primitive for composing programs instead of finding out later that it would be nice and bolting on a restricted subset of the feature.
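A rough sketch of the idea in Rust, where closure-returning functions stand in for partial application (all the stage names here are made up for illustration):
```rust
// "Partially apply" a stage to its configuration, leaving a unary function.
fn add(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn scale(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x * n
}

// Compose two unary functions into one pipeline stage.
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn main() {
    let pipeline = compose(add(1), scale(3));
    assert_eq!(pipeline(4), 15);
}
```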
Recently I started using Nushell, which feels very similar.
And for exceptions, why not solve it in the data model, and reify failures? Push it further downstream, let your pipeline's nodes handle "monadic" result values.
Point being, it's always a tradeoff, but you can usually lessen the pain more than you think.
And that's without mentioning that a lot of "pipelining" is pure sugar over the same code we're already writing.
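A minimal Rust sketch of what "reifying failures" can look like: errors become ordinary Result values that flow through the same pipeline and are handled downstream, instead of unwinding past it (the parse step is just a stand-in).
```rust
fn parse(s: &str) -> Result<i32, String> {
    s.trim().parse::<i32>().map_err(|e| format!("{s:?}: {e}"))
}

fn main() {
    // Failures travel through the pipeline as values...
    let (oks, errs): (Vec<_>, Vec<_>) = ["1", "2", "oops", "4"]
        .iter()
        .map(|s| parse(s))
        .partition(Result::is_ok);

    // ...and a downstream node decides what to do with them.
    let values: Vec<i32> = oks.into_iter().map(Result::unwrap).collect();
    let failures: Vec<String> = errs.into_iter().map(Result::unwrap_err).collect();

    println!("values = {values:?}, failures = {failures:?}");
}
```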
Exception handing is only a problem in languages that use exceptions. Fortunately there are many modern alternatives in wide use that don't use exceptions.
I've encountered and used this pattern in Python, Ruby, Haskell, Rust, C#, and maybe some other languages. It often feels nice to write, but reading can easily become difficult -- especially in Haskell where obscure operators can contain a lot of magic.
Debugging them interactively can be equally problematic, depending on the tooling. I'd argue, it's commonly harder to debug a pipeline than the equivalent imperative code and, that in the best case it's equally hard.
Programming should be focused on the happy path. Much of the syntax in primitive languages concerning exceptions and other early returns is pure noise.
The old adage of not writing code so smart you can’t debug it applies here.
Pipelining runs contrary enough to standard imperative patterns. You don’t just need a new mindset to write code this way. You need to think differently about how you structure your code overall and you need different tools.
That’s not to say that doing things a different way isn’t great, but it does come with baggage that you need to be in a position to carry.
If you need to handle an unhappy path in a way that isn’t optimal for nested function calls then you shouldn’t be nesting your function calls. Pipelining doesn’t magically make things easier nor harder in that regard.
But if a particular sequence of function calls do suit nesting, then pipelining makes the code much more readable because you’re not mixing right-to-left syntax (function nests) with left-to-right syntax (ie you’re typical language syntax).
Crudely put, in C-like languages, pipelining is just as way of turning
fn(fn(fn()))
Where the first function call is in the inner, right-most, parentheses, into this:
fn | fn | fn
…which can be easily read sequentially from left-to-right.
What you’re looking at is loops defined inside lambda functions. Pipelining makes it much easier to use anonymous functions and lambdas. But it doesn’t magically solve the problem of complex loops.
It does not solve it magically, but it does give the programmer options to coalesce different paradigms into one single working implementation.
The crux of the “magic” with pipelining is chaining functions. How those functions are composed and what they do is a separate topic entirely.
To me it is awkward to describe but simple to understand. Lucky me I have no intention of describing it.
That’s why I used scare quotes around the term ;)
> To me it is awkward to describe but simple to understand.
It’s not awkward to describe though. It’s literally just syntactic sugar for chaining functions.
It’s probably one of the easiest programming concepts to describe.
From our conversation, I’m not sure you do understand it because you keep bringing other tangential topics into the fold. Granted I don’t think the article does a great job at explaining what pipelining is, but then I think it’s point was more to demonstrate cool syntactic tricks you can pull off when writing functions as a pipeline.
edit: just realised you aren't the same person who wrote the original comment claiming pipelining was about iteration. Apologies for getting you mixed together.
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
is one loop, as opposed to
collect(map(filter(iter(data), |w| w.alive), |w| w.id))
which is three loops.
Did you notice four letters 'i', 't', 'e' and 'r' in the code, followed by two round brackets? They mean "iterator".
`iter` is a method of `data`. And do you know what a method is? It's a function attached to an object. A FUNCTION. Pipelining is just syntactic sugar around chaining functions.
You even proved my point when you quoted the article:
collect(map(filter(iter(data), |w| w.alive), |w| w.id))
Literally the only thing changing is the syntax of the code. You've got all of the same functions being called, with the same parameters and in the same order. The article itself makes no mention of this affecting how the code is executed either. Instead, it talks about code readability.
In fact the article further proves my point when it says:
> You can, of course, just assign the result of every filter and map call to a helper variable, and I will (begrudgingly) acknowledge that that works, and is significantly better than trying to do absurd levels of nesting.
What it means by this is something like the following:
list = iter(data)
list = map(list, |w| w.toWingding())
list = filter(list, |w| w.alive)
list = map(list, |w| w.id)
result = collect(list)
While I do have some experience in this field (having written a pipeline-orientated programming language from scratch), I'll cite some other sources too, so it's not just my word against yours:
+ Wikipedia: https://en.wikipedia.org/wiki/Pipeline_(computing) (no mention of iteration, just chaining functions and processes)
+ JavaScript proposal: https://www.geeksforgeeks.org/javascript-pipeline-operator/ (it's very clear how pipelining works in this guide)
+ Pipeline macros in LISP: https://blog.fugue88.ws/archives/2022-03/Pipelines-in-Lisp (again, literally just talking about cleaner syntax for nested functions)
The reason the article focuses on map/reduce type functions is because it's a common idiom for nesting commands. In fact you'll be familiar with this in Bash:
cat largefile.txt | sort | uniq --count
(before you argue about "useless use of `cat`" and other optimisations that could be made, this is just an example to demonstrate my point). Here, each command is a process but analogous to a function in general-purpose programming languages like Rust, LISP, Javascript, etc. Those UNIX processes might internally loop through the contents of STDIN as a LF-delimited list but that happens transparently to the pipeline. Bash, when piping each command to the next, doesn't know how each process will internally operate. And likewise, in general-purpose programming language world, pipelines in LISP, Rust, JavaScript (et al) don't know nor care how each function behaves internally with its passed parameters, just so long as the output data type is compatible with the data type of the next function -- and if it isn't, then that's an error in the code (i.e. compile-time error in Rust or runtime error in Javascript).
So to summarise, pipelining has nothing to do with iteration. It's just syntactic sugar to make nested functions easier to read. And if the examples seem to focus on map/reduce, it's just because that's a common set of functions you'd want to chain and which are particularly ugly to read in nested form. ie they're an example of functions called in a pipeline, not the reason pipelines exist nor proof that pipelines themselves have any internal logic around iteration.
Pipelines are about iteration, of course. And they do have internal logic around iteration.
cat largefile.txt | sort | uniq --count
is an excellent example. While cat and count iterate on each character sequentially, sort and uniq require buffering and allocating additional structures.
Iteration is about looping, and if the pipeline in the above example was some kind of secret sauce for iteration then the commands above would fork multiple times, but they don't.
Rust chains everything because of this. It's often unpleasant (see: all the Rust GUI toolkits).
Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.
They do make fun of Python, however. But they don't say much about why they don't like it, other than showing a low-res photo of a rock with a pipe routed around it.
Ambiguity about what constitutes "pipelining" is the real issue here. The definition keeps shifting throughout the article. Is it method chaining? Operator overloading? First-class functions? The author uses examples that function very differently.
Yeah, I agree that this can be problem when you lean heavily into monadic handling (i.e. you have fallible operations and then pipe the error or null all the way through, losing the information of where it came from).
But that doesn't have much to do with the article: You have the same problem with non-pipelined functional code. (And in either case, I think that it's not that big of a problem in practice.)
> The author uses examples that function very differently.
Yeah, this is addressed in one of the later sections. Imo, having a unified word for such a convenience feature (no matter how it's implemented) is better than thinking of these features as completely separate.
It's not like you lose that much readability from
foo(bar(baz(c)))
c |> baz |> bar |> foo
c.baz().bar().foo()
t = c.baz()
t = t.bar()
t = t.foo()
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
}
It sounds to me like you're asking for linebreaks. Chaining doesn't seem to be the issue here.
But you could in many cases easily infer from the execution plan what a query would look like and fetch an intermediate set separately.
(-> c baz bar foo)
But people usually put it on separate lines:
(-> c
baz
bar
foo)
Just before that statement, he says that it is an article/hot take about syntax. He acknowledges your point.
So I think when he says "semantics beat syntax every day of the week", that's him acknowledging that while he prefers certain syntax, it may not be the best for a given situation.
>Let me make it very clear: This is [not an] article it's a hot take about syntax. In practice, semantics beat syntax every day of the week. In other words, don’t take it too seriously.
Building pipelines:
https://effect.website/docs/getting-started/building-pipelin...
Using generators:
https://effect.website/docs/getting-started/using-generators...
Having both options is great (at the beginning Effect had only pipe-based pipelines); after years of writing Effect I'm convinced that most of the time you'd rather write and read imperative code than pipelines, which definitely have their place in code bases.
In fact most of the community, at large, converged on imperative-style generators over pipelines, and having onboarded many devs and seen many long-time pipeliners converge to classical imperative control flow seems to confirm that both debugging and maintenance are easier.
No longer do we have to explain that expressions are evaluated in the order of FROM -> JOIN -> ON -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT (and yes, I know I'm missing several other steps). We can simply just express how our data flows from one statement to the next.
(I'm also stating this as someone who has yet to play around with the pipelining syntax, but honestly anything is better than the status quo.)
It is an exemplar of expressions [0] more than anything else, which have little to do with the idea of passing results from one method to another.
[0]: https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
    data.iter()              // get iterator over elements of the list
        .filter(|w| w.alive) // use lambda to ignore tombstoned widgets
        .map(|w| w.id)       // extract ids from widgets
        .collect()           // assemble iterator into data structure (Vec)
}
Same thing in 15 year old C# code.
List<Guid> GetIds(List<Widget> data)
{
return data
.Where(w => w.IsAlive())
.Select(w => w.Id)
.ToList();
}
In this case I would say extension methods are what he's really referring to, which LINQ to Objects is built on top of.
1) The method chaining extension methods on IEnumerable<T> like Select, Where, GroupBy, etc. This is identical to the rust example in the article.
2) The weird / bad (in my opinion) language keywords analogous to the above such as "from", "where", "select" etc.
[1]: https://learn.microsoft.com/en-us/dotnet/csharp/linq/get-sta...
a.b(c) == AType.b(a, c) (or AType::b(a, c), C++ style)
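Rust happens to expose the same equivalence for methods through its fully qualified call syntax; a small sketch (arbitrary values, just to show the two spellings):
```rust
fn main() {
    let s = String::from("hello");

    // a.b(c) style:
    let upper_method = s.to_uppercase();
    // AType::b(a, c) style, the same inherent method spelled as a function:
    let upper_fn = str::to_uppercase(&s);
    assert_eq!(upper_method, upper_fn);

    // Works for trait methods too:
    let v = vec![1, 2, 3];
    let doubled_method: Vec<i32> = v.iter().map(|x| x * 2).collect();
    let doubled_fn: Vec<i32> = Iterator::collect(Iterator::map(v.iter(), |x| x * 2));
    assert_eq!(doubled_method, doubled_fn);
}
```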
auto get_ids(std::span<const Widget> data)
{
return data
| filter(&Widget::alive)
| transform(&Widget::id)
| to<std::vector>();
}
auto get_ids(std::span<const Widget> data)
{
auto pipeline = filter(&Widget::alive) | transform(&Widget::id);
auto sink = to<std::vector>();
return data | pipeline | sink;
}
I really want to start playing with some C++23 in the future.
Point-free style and pipelining were meant for each other. https://en.m.wikipedia.org/wiki/Tacit_programming
In fact I tried to make some similar points in my CMU "SQL or Death" Seminar Series talk on PRQL (https://db.cs.cmu.edu/events/sql-death-prql-pipelined-relati...) in that I would love to see PRQL (or something like it) become a universal DSL for data pipelines. Ideally this wouldn't even have to go through some query engine and could just do some (byte)codegen for your target language.
P.S. Since you mentioned the Google Pipe Syntax HYTRADBOI 2025 talk, I just want to throw out that I also have a 10 min version for the impatient: https://www.hytradboi.com/2025/deafce13-67ac-40fd-ac4b-175d5... That's just a PRQL overview though. The Universal Data Pipeline DSL ideas and comparison to LINQ, F#, ... are only in the CMU talk. I also go a bit into imperative vs declarative and point out that since "pipelining" is just function composition it should really be "functional" rather than imperative or declarative (which also came up in this thread).
(f.g)(x) = f(g(x))
Based on this, I think a reverse Polish type of notation would be a lot better. Though perhaps it is a lot nicer to think of "the sine of an angle" than "angle sine-ed". Not that it matters much, the switching costs are immense. Getting people able to teach it would be impossible, and collaboration with people taught in the other system would be horrible. I am doubtful I could make the switch, even if I wanted.
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
collect(map(filter(iter(data), |w| w.alive), |w| w.id))
The second approach is open for extension - it allows you to write new functions on old datatypes.
> Quick challenge for the curious Rustacean, can you explain why we cannot rewrite the above code like this, even if we import all of the symbols?
Probably for lack of
> weird operators like <$>, <*>, $, or >>=
Examples:
https://kotlinlang.org/docs/extensions.html
https://docs.scala-lang.org/scala3/reference/contextual/exte...
See also: https://en.wikipedia.org/wiki/Uniform_function_call_syntax
I wrote a little pipeline macro in https://nim-lang.org/ for Advent of Code years ago and as far as I know it worked okay.
```
import macros

macro `|>`* (left, right: expr): expr =
result = newNimNode(nnkCall)
case right.kind
of nnkCall:
result.add(right[0])
result.add(left)
for i in 1..<right.len:
result.add(right[i])
else:
error("Unsupported node type")
```
Makes me want to go write more Nim.
fun main() {
val s: String? = null
println(s.isS()) // false
}
fun String?.isS() = "s" == this
And you'd lose all those cases of extension methods where the convenience of accepting null left of the dot is their sole reason to be. Null is a valid state, not something incredibly scary best dealt with with a full reboot or better yet throwing away the container. Kotlin is about making peace with null, instead of pretending that null does not exist. (yes, I'm looking at you, Scala)
What I do agree with is that extension methods should be a last ditch solution. I'd actually like to see a way to do the nullable receiver thing defined more like regular functions. Perhaps something like
fun? name() = if (this==null) "(Absent)" else this.name
that is defined inside the regular class block, imported like a regular method (as part of the class) and even present in the class object e.g. for reflection on the non-null case (and for Java compat where that still matters)
> Null is a valid state, not something incredibly scary best dealt with with a full reboot or better yet throwing away the container. Kotlin is about making peace with null, instead of pretending that null does not exist. (yes, I'm looking at you, Scala)
I honestly find this to be such a weird thing to say or imply. No one is "scared" of null.
Agreed. It should be a first-class construct in a language with its own proper type,
Null null;
rather than needing to hitch a ride with the Integers and Strings like a second-class construct.
I prefer to just generalize the function (make it generic, leverage traits/typeclasses) tbh.
> Probably for lack of
> weird operators like <$>, <*>, $, or >>=
Nope btw. I mean, maybe? I don't know Haskell well enough to say. The answer that I was looking for here is a specific Rust idiosyncrasy. It doesn't allow you to import `std::iter::Iterator::collect` on its own. It's an associated function, and needs to be qualified. (So you need to write `Iterator::collect` at the very least.)
You probably noticed, but it should become a thing in RFC 3591: https://github.com/rust-lang/rust/issues/134691
So it does kind of work on current nightly:
#![feature(import_trait_associated_functions)]
use std::iter::Iterator::{filter, map, collect};
fn get_ids2(data: Vec<Widget>) -> Vec<Id> {
collect(map(filter(Vec::into_iter(data), |w| w.alive), |w| w.id))
}
fn get_ids3(data: impl Iterator<Item = Widget>) -> Vec<Id> {
collect(map(filter(data, |w| w.alive), |w| w.id))
}
https://dspace.mit.edu/handle/1721.1/6035
https://dspace.mit.edu/handle/1721.1/6031
https://dapperdrake.neocities.org/faster-loops-javascript.ht...
In fact, I always thought it would be a good idea for all statement blocks (in any given programming language) to allow an implicit reference to the value of the previous statement. The pipeline operation would essentially be the existing semicolons (in a C-like language) and there would be a new symbol or keyword used to represent the previous value.
For example, the MATLAB REPL allows for referring to the previous value as `ans` and the Julia REPL has inherited the same functionality. You can copy-paste this into the Julia REPL today:
[1, 2, 3];
map(x -> x * 2, ans);
@show ans;
filter(x -> x > 2, ans);
@show ans;
sum(ans)
You can't use this in Julia outside the REPL, and I don't think `ans` is a particularly good keyword for this, but I honestly think the concept is good enough. The same thing in JavaScript using `$` as an example:
{
[1 ,2, 3];
$.map(x => x * 2);
(console.log($), $);
$.filter(x => x > 2);
(console.log($), $);
$.reduce((acc, next) => acc + next, 0)
}
I feel it would work best with expression-based languages having blocks that return their final value (like Rust), since you can do all sorts of nesting and so on.
I think most interactive programming shells have an equivalent.
https://datapad.readthedocs.io/en/latest/quickstart.html#ove...
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(map(filter(map(iter(data), |w| w.toWingding()), |w| w.alive), |w| w.id))
}
to
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter()
.map(|w| w.toWingding())
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
}
The first one would read more easily (and, since it was called out, diff better)
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(
map(
filter(
map(iter(data), |w| w.toWingding()), |w| w.alive), |w| w.id))
}
Admittedly, the chaining is still better. But a fair number of the article's complaints are about the lack of newlines being used; not about chaining itself.
Of course this really only matters when you're 25 minutes into critical downtime and a bug is hiding somewhere in these method chains. Anything that is surprising needs to go.
IMHO it would be better to set intermediate variables with dead simple names instead of newlines.
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
let iter = iter(data);
let wingdings = map(iter, |w| w.toWingding());
let alive_wingdings = filter(wingdings, |w| w.alive);
let ids = map(alive_wingdings, |w| w.id);
let collected = collect(ids);
collected
}
Yeah, I agree. The problem is that you have to keep track of nesting in the middle of the expression and then unnest it at the end, which is taxing.
So, I also think it could also read better written like this, with the arguments reversed, so you don't have to read it both ways:
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(
map(|w| w.id,
filter(|w| w.alive,
map(|w| w.toWingding(), iter(data)))))
}
That's also what they do in Haskell. The first argument to map is the mapping function, the first argument to filter is the predicate function, and so on. People will often just write the equivalent of:
getIDs = map getID . filter alive . map toWingDing
as their function definitions, with the argument omitted because using the function composition operator looks neater than using a bunch of dollar signs or parentheses. Making it the second argument only makes sense when functions are written after their first argument, not before, to facilitate writing "foo.map(f).filter(y)".
Along with the `|>` operator (which is itself just a function that's conventionally infixed), this turns out to be really nice for flexibility/reusability. All of these programs do the same thing:
1 - 2 - 3 + 4
1
|> -(2)
|> -(3)
|> +(4)
+(4)(
-(3)(
-(2)(1)
)
)
It was extremely satisfying to discover that with this encoding, `|>` is simply an identity function!
[0]: https://github.com/mkantor/please-lang-prototype
[1]: In reality variable dereferencing uses a sigil, but I'm omitting it from this comment to keep the examples focused.
See how adding line breaks still keeps the `|w| w.alive` very far from the `filter` call? And the `|w| w.id` very far from the `map` call?
If you don't have the pipeline operator, please at least format it something like this:
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(
map(
filter(
map(
iter(data),
|w| w.toWingding()
),
|w| w.alive
),
|w| w.id
)
)
}
...which is still absolutely atrocious both to write and to read! Also see how this still reads fine despite being one line:
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter().map(|w| w.toWingding()).filter(|w| w.alive).map(|w| w.id).collect()
}
It's not about line breaks, it's about the order of applying the operations, and about the parameters to the operations you're performing.
For me, it's both. Honestly, I find it much less readable the way you've split it up. The way I had it makes it very easy for me to read it in reverse; map, filter, map, collect
> Also see how this still reads fine despite being one line
It doesn't read fine, to me. I have to spend mental effort figuring out what the various "steps" are. Effort that I don't need to spend when they're split across lines.
For me, it's a "forest for the trees" kind of thing. I like being able to look at the code casually and see what it's doing at a high level. Then, if I want to see the details, I can look more closely at the code.
Yes, sure, who cares. But the way you wrote it, it's impossible to match those "map, filter, map, collect" to their parameters: `, |w| w.toWingding()), |w| w.alive), |w| w.id))`. Impossible!
You just include all the parameters to the various function calls on one very long line! To top it off, it's the most indented line that you decide to make the longest! If I could I'd put you in jail for this!
> You might think that this issue is just about trying to cram everything onto a single line, but frankly, trying to move away from that doesn’t help much. It will still mess up your git diffs and the blame layer.
Diff will still be terrible because adding a step will change the indentation of everything 'before it' (which, somewhat confusingly, is below it syntactically) in the chain.
iter [ alive? ] filter [ id>> ] map collect
The beauty of this is that everything can be evaluated strictly left-to-right. Every single symbol. "Pipelines" in other languages are never fully left-to-right evaluated. For example, ".filter(|w| w.alive)" in the author's example requires one to switch from postfix to infix evaluation to evaluate the filter application.
The major advantage is that handling multiple streams is natural. Suppose you want to compute the dot product of two files where each line contains a float:
fileA fileB [ lines [ str>float ] map ] bi@ [ mul ] 2map 0 [ + ] reduce
Because I love to practice and demonstrate Factor, this is working code for that example:
"f1.txt" "f2.txt" [ utf8 file-lines [ string>number ] map ] bi@ vdot
Being able to inspect the results of each step right at the point you’ve written it is pretty convenient. It’s readable. And the compiler will optimize it out.
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
let mut result = Vec::new();
for widget in &data {
if widget.alive {
result.push(widget.id);
}
}
result
}
more readable than this:
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
}
and I also dislike Rust requiring you to write "mut" for mutable function-local values. It's mostly just busywork and dogma.
I think the imperative style isn't as readable (of course I would), but that's absolutely a discussion for another day, and I get why people prefer it.
The APL family is similarly consistent, except RTL.
"coolvalue" thisisthevar set
or if you use the `variables` vocab, alternately:
"coolvalue" set: thisisthevar
and lexical variables are set with
"coolvalue" :> thisisthevar
x = iter(data);
y = filter(x, w=>w.isAlive);
z = map(y, w=>w.id);
return collect(z);
It doesn't need new syntax, but to implement this with the existing syntax you do have to figure out what the intermediate objects are, but you also have that problem with "pipelining" unless it compiles the whole chain into a single thing a la Linq.
The tone of this (and the entire Haskell section of the article, tbh) is rather strange. Operators aren't special syntax and they aren't "added" to the language. Operators are just functions that by default use infix position. (In fact, any function can be called in infix position. And operators can be called in prefix position.)
The commit in question added & to the prelude. But if you wanted & (or any other character) to represent pipelining you have always been able to define that yourself.
Some people find this horrifying, which is a perfectly valid opinion (though in practice, when working in Haskell it isn't much of a big deal if you aren't foolish with it). But at least get the facts correct.
This is my biggest complaint about Python.
$x = vec[2,1,3]
|> Vec\map($$, $a ==> $a * $a) // $$ with value vec[2,1,3]
|> Vec\sort($$); // $$ with value vec[4,1,9]
It is a nice feature. I do worry about error reporting with any feature that combines multiple statements into a single statement, which is essentially what this does. In Java, there was always an issue with NullPointerExceptions being thrown, and if you chain several things together you're never sure which one was null.
[1]: https://docs.hhvm.com/hack/expressions-and-operators/pipe
I remember being able to deal with object streams with it quite comfortably.
Java has a culture of having a layer above fundamentals.
We're past all that already. I am discussing the ergonomics of their null checking APIs, particularly in the context of pipelining (or streaming, in the Java world).
I find them quite comfortable.
Um, you can:
#![feature(import_trait_associated_functions)]
use Iterator::{collect, map, filter};
fn get_ids2(data: Vec<usize>) -> Vec<usize> {
collect(map(filter(<[_]>::iter(&data), |v| ...), |v| ...))
}
and you can because it's lazy, which is also the same reason you can write it the other way... in Rust. I think the author was getting at an ownership trap, but that trap is avoided the same way for both arrangements; the call order is the same in both arrangements. If the calls were actually a pipeline (if collect didn't exist and didn't need to be called) then other considerations show up.
For comparison, UNIX pipes support only trivial byte streams from output to input.
PowerShell allows typed object streams where the properties of the object are automatically wired up to named parameters of the commands on the pipeline.
Outputs at any stage can not only be wired directly to the next stage but also captured into named variables for use later in the pipeline.
Every command in the pipeline also gets begin/end/cancel handlers automatically invoked so you can set up accumulators, authentication, or whatever.
UNIX scripting advocates don’t know what they’re missing out on…
I use powershell daily but am hopeful that I can replace it with Nushell at some point.
Also, does the name matter if it works the same and has the same properties?
Maybe the author called it "pipelines" to avoid functional purists from nitpicking it.
In the context of a specific programming language feature it seems like terminology would be important, I wasn't trying to nitpick unintentionally.
These "pipelines" and "object streaming" APIs are often built upon OOP. I feel that calling it "transducers" would offend the sensibilities of those who think it must be functional all the way down.
Don't you think it's better to keep it with a different name? I mean, even among the functional community itself there seems to be a lot of stress around purity, why would anyone want to make it worse?
I've used languages and libraries that call it piping, ramda has a .pipe() method for example. Don't think I've ever seen it called pipelining but I see how you could get there.
Then the zookeeper angrily beats down the kid while screaming: "You stupid moron, that's a Panthera tigris"
His father, instead, buys him a book that says "tiger" and has some cool illustrations.
a().let{ b(it) }.let{ c(it) }
And it's already idiomatic unlike bolting a pipeline operator onto a language that didn't start with it.
// extension function on Foo.Companion (similar to static class function in Java)
fun Foo.Companion.create(block: FooBuilder.() -> Unit): Foo =
FooBuilder().apply(block).build()
// example usage
val myFoo = Foo.create {
setSomeproperty("foo")
setAnotherProperty("bar")
}
Works for any Java/Kotlin API that forces you into method chaining and calling build() manually. Also works without extension functions. You can just call it fun createAFoo(..) or whatever. Looking around in the Kotlin stdlib code base is instructive. Lots of little 1/2 liners like this.
Pipelining can guide one to write a bit cleaner code, viewing steps of computation as such, and not as modifications of global state. It forces one to make each step return a result, write proper functions. I like proper pipelining a lot.
i mean this sounds fun
but tbh it also sounds like it'd result in my colleague Carl defining an utterly bespoke DSL in the language, and using it to write the worst spaghetti code the world has ever seen, leaving the code base an unreadable mess full of sharp edges and implicit behavior
This is far different than the pattern described in the article, though. Small shame they have come to have the same name. I can see how both work with the metaphor; such that I can't really complain. The "pass a single parameter" along is far less attractive to me, though.
BTW. For people complaining about debug-ability of it: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho... etc.
You have a create_user function that doesn't error? Has no branches based on type of error?
We're having arguments over the best way break these over multiple lines?
Like.. why not just store intermediate results in variables? Where our branch logic can just be written inline? And then the flow of data can be very simply determined by reading top to bottom?
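For what it's worth, a minimal Rust sketch of that style, with made-up names loosely mirroring the Elixir example upthread: intermediate results in variables, with `?` so each error branch is explicit and the data flow reads top to bottom.
```rust
#[derive(Debug)]
enum SignupError {
    MissingUser,
    EmptyName,
}

struct User {
    name: String,
}

fn create_user(name: &str) -> Result<User, SignupError> {
    if name.is_empty() {
        return Err(SignupError::EmptyName);
    }
    Ok(User { name: name.to_string() })
}

fn notify_admin(event: &str, user: &User) {
    println!("{event}: {}", user.name);
}

// Intermediate variables; each error branch is handled inline with `?`.
fn handle_signup(name: Option<&str>) -> Result<(), SignupError> {
    let name = name.ok_or(SignupError::MissingUser)?;
    let user = create_user(name)?;
    notify_admin("signup", &user);
    Ok(())
}

fn main() {
    println!("{:?}", handle_signup(Some("ada")));
    println!("{:?}", handle_signup(None));
}
```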
Instead of writing: a().b().c().d(), it's much nicer to write: d(c(b(a()))), or perhaps (d ∘ c ∘ b ∘ a)().
draw(line.start, line.end);
print(i, round(a / b));
you'd write
line.start, line.end |> draw;
i, a, b |> div |> round |> print;
line.start line.end draw
i a b div round print
1. do the first step in the process
2. then do the next thing
3. followed by a third action
I struggle to think of any context outside of programming, retrosynthesis in chemistry, and some aspects of reverse-Polish notation calculators, where you conceive of the operations/arguments last-to-first. All of which are things typically encountered pretty late in one's educational journey.
a(b())
then you're already breaking your left-to-right/first-to-last rule.
a(axe).b(baz, bog).c(cid).d(dot)
vs d(c(b(a(axe), baz, bog), cid), dot)
gitRef = with lib;
pipe .git/HEAD [
readFile
trim
(splitString ":")
last
trim
(ref: ./git/${ref})
readFile
trim
];
Super clean and cool!
from customer
left join orders on c_custkey = o_custkey and o_comment not like '%unusual%'
group by c_custkey
alias count(o_orderkey) as count_of_orders
group by count_of_orders
alias count(*) as count_of_customers
order by count_of_customers desc
select count_of_customers, count_of_orders;
I'm using 'alias' here as a strawman keyword for what the slide deck calls a free-standing 'as' operator, because you can't reuse that keyword; it makes the grammar a mess. The aliases aren't really necessary: you could just write the last line as 'select count(count(*)) ncust, count(*) nord' if you aren't afraid of nested aggregations, and if you are you'll never understand window functions, soo...
The |> syntax adds visual noise without expressive power, and the novelty 'aggregate'/'call' operators are weird special-case syntax for something that isn't that complex in the first place.
The implicit projection is unnecessary too, for the same reason any decent SQL linter will flag an ambiguous 'select *'
It does seem like the former could be solved by just loosening up the grammar to allow you to specify things in any order. Eg this seems perfectly unambiguous:
from customer
group by c_custkey
select c_custkey, count(*) as count_of_customers
Anyway, JS wins again, give it a try if you haven't, it's one of the best languages out there.
One difference is that currying returns an incomplete result (another function) which must be called again at a later time. On the other hand, pipelining usually returns raw values. Currying returns functions until the last step. The main philosophical failure of currying is that it treats logic/functions as if they were state which should be passed around. This is bad. Components should be responsible for their own state and should just talk to each other to pass plain information. State moves, logic doesn't move. A module shouldn't have awareness of what tools/logic other modules need to do their jobs. This completely breaks the separation of concerns principle.
When you call a plumber to fix your drain, do you need to provide them with a toolbox? Do you even need to know what's inside their toolbox? The plumber knows what tools they need. You just show them what the problem is. Passing functions to another module is like giving a plumber a toolbox which you put together by guessing what tools they might need. You're not a plumber, why should you decide what tools the plumber needs?
Currying encourages spaghetti code which is difficult to follow when functions are passed between different modules to complete the currying. In practice, if one can design code which gathers all the info it needs before calling the function once; this leads to much cleaner and much more readable code.
b) Async works on Scala Native: https://github.com/lampepfl/gears and is coming to Scala.js.
A
.B
.C
|| D
|| E
I have no idea what this is trying to say, or what it has to do with the rest of the article.
... looking at you R and tidyverse hell.
imap(f) = x -> Iterators.map(f, x)
ifilter(f) = x -> Iterators.filter(f, x)
v = things |>
ifilter(isodd) |>
imap(do_process) |>
collect