https://github.com/jstrieb/just.sh/blob/2da1e2a3bfb51d583be0...
Pipefail also doesn't prevent more complex error states. For example, this step from your config:
curl -L "https://github.com/casey/just/releases/download/${{ matrix.just-version }}/just-${{ matrix.just-version }}-x86_64-apple-darwin.tar.gz" \
| sudo tar -C /usr/local/bin -xzv just
Here are the different error conditions you can run into:
1. curl succeeds, sudo succeeds, tar succeeds, but just fails to extract from the tarball. Tar reports error, step fails.
2. curl succeeds, sudo succeeds, tar fails. Sudo reports error, step fails.
3. curl succeeds, sudo fails. Shell reports error, step fails.
4. curl begins running. sudo begins running in a subshell/pipe. tar begins running under the sudo pipe, extracting half of the just binary. curl fails due to a network error. Because pipefail is enabled, the shell exits immediately. There is no error message. A corrupt executable is left on disk (which a later step will then try to run if this step has failure-skipping enabled).
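One way to avoid scenario 4 is to not pipe the download straight into tar at all: download to a temporary file, let curl's exit status gate the extraction, and only then install. A sketch, with the version variable and URL as illustrative stand-ins for the matrix values above (and with no pipe left, there is nothing for pipefail to paper over):

set -eu
tmp="$(mktemp)"
curl -fsSL -o "$tmp" \
  "https://github.com/casey/just/releases/download/${JUST_VERSION}/just-${JUST_VERSION}-x86_64-apple-darwin.tar.gz"
# If curl fails, set -e stops the script here; nothing half-written ever lands in /usr/local/bin.
sudo tar -C /usr/local/bin -xzvf "$tmp" just
rm -f "$tmp"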
That's probably why the -x is there. (Well, that and if something like curl or sudo fails it tends to output something to stderr...)
> Pipefail also doesn't prevent more complex error states ... A corrupt executable is left on-disk (which will be attempted to run if your step had failure-skipping enabled)
If I'm reading right, it seems like what you're suggesting is that the case pipefail doesn't handle is when you explicitly ignore the exit code. That doesn't exactly seem like the most concerning catch-22, to be honest.
It's not that pipefail doesn't handle a case, it's that it doesn't tell you what happened. It does not report what failed or why. Your shell just exits with a mystery return code.
This is of course no different than if you had set -e and then a command with no pipes failed silently without outputting anything to stderr.
I don't personally see why this is notable in relation to pipefail.
And if you want to know which command in a pipe failed there's `PIPESTATUS`.
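For example (a sketch, reusing the curl | tar pipe from above with a placeholder URL), PIPESTATUS reports each command's exit status whether or not pipefail is set:

curl -L "$url" | sudo tar -C /usr/local/bin -xzv just
statuses=("${PIPESTATUS[@]}")   # copy immediately; the next command overwrites PIPESTATUS
if [ "${statuses[0]}" -ne 0 ]; then
  echo "curl failed with status ${statuses[0]}" >&2
  exit 1
elif [ "${statuses[1]}" -ne 0 ]; then
  echo "tar failed with status ${statuses[1]}" >&2
  exit 1
fi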
If you use pipefail without -e, nothing happens, except the return status of the line is different (and thus using the option is pointless unless you're checking return statuses/PIPESTATUS, which of course nobody who uses this option actually does, because they don't know how it works)
From the Bash manual:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully. If the reserved word ! precedes a pipeline, the exit status of that pipeline is the logical negation of the exit status as described above. The shell waits for all commands in the pipeline to terminate before returning a value.
pipefail
If set, the return value of a pipeline is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands in the pipeline exit successfully. This option is disabled by default.
They don't mention how the rightmost command might be the one failing because the left-hand one failed (if it's trying to read data and fails, or dies from a signal), or how it might be the first command in the pipe that failed, or several of them at once because (again) maybe they're feeding input to each other and two of them failed, etc.

The use of this option is an anti-pattern. You don't even need it, because you can check PIPESTATUS without it.
Everybody uses pipefail because bloggers and people on HN say to use it, but without actually understanding what it's doing.
on:
  repository_dispatch:
    types:
      - security_scan
      - security_scan::*
Why would you want to do this?

We centralize our release pipelines as it's the only way to force repositories through a defined reusable workflow (we don't want our product teams to have to maintain them).
This allows us to dispatch an event like so:
{
  "event_type": "security_scan::$product_name::$version",
  "client_payload": {
    "field": "value"
  }
}
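(For reference, this payload is what gets POSTed to the repository_dispatch API endpoint; a sketch with placeholder OWNER/REPO and product/version values:)

curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/dispatches" \
  -d '{"event_type": "security_scan::my-product::1.2.3", "client_payload": {"field": "value"}}'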
Then it is far easier to identify which product and version a workflow is running for when looking in the Actions tab of our central release repository.

I've just joined an organization that is trying to do something similar, but in practice it seems nearly worthless. Templates are frequently broken, or written in ways that expect code to be built in a particular manner, without any supporting docs describing the requirements.
We don't use templates for any of this. Our interface is the payload sent with repository_dispatch, a few metadata files in the repository (which we fetch) and a GitHub application that allows us to update the PRs with the release status checks.
GitHub doesn't have a great story here; ideally we would want to listen to CI events emitted from a repo and run workflows as a reaction.
The reusable story on a modular basis is better, but here we're missing features that we would need to move some of our workflows into Actions. Notably, Action repos need to be public.
https://docs.github.com/en/actions/sharing-automations/shari...
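For context, the reusable-workflow setup being discussed boils down to a job-level uses: pointing at a central workflow that declares workflow_call; a rough sketch, with org, repo, and input names as placeholders:

# caller repository: .github/workflows/release.yml
on: push
jobs:
  release:
    uses: my-org/central-pipelines/.github/workflows/release.yml@main
    with:
      product: my-product
    secrets: inherit

# the callee in my-org/central-pipelines declares:
# on:
#   workflow_call:
#     inputs:
#       product:
#         type: string
#         required: true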
I tend to prefer either:
- Using a build-system (e.g. Make) to encode logic and just invoke that from GitHub Actions; or
- Writing a small CLI program and invoking that from GitHub Actions
It's so much easier to debug this stuff locally than in CI.
So an interesting trick, but I don't see where it would be useful.
- make build
- make test
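In other words, the whole workflow file can stay about as thin as this sketch (the trigger and runner choice are illustrative):

name: ci
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build
      - run: make test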
We got bought out, and the workflow files of the company that bought us are hundreds and hundreds of lines long, often with repeated sections.

Call me old school, but I want to leave YAML town as soon as possible.
Otherwise you need complicated setups to test any of the stuff you put up there since none of it can be run locally / normally.
GitHub Actions, like any CI/CD product, is for automating in ways you cannot with scripting - like parallelizing and joining pipelines across multiple machines, modelling the workflow. That’s it.
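A minimal sketch of that fan-out and join across machines (job names and the ./ci/*.sh scripts are hypothetical):

on: push
jobs:
  test-linux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh
  test-macos:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh
  release:
    needs: [test-linux, test-macos]   # join: waits for both jobs on both machines
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/release.sh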
I would really appreciate an agnostic templating language for this, so these workflows could be modelled generically and have different executors; you could then port them to run locally or across different products. Maybe there is an answer to this that I've just not bothered to look for yet.
Terraform? You can use it for more than just "cloud"
https://docs.github.com/en/actions/writing-workflows/choosin...
This generation will shudder when they are asked to bring discipline to deployments built from github actions.
1. Do not use yaml.
All github action logic should be written in a language that compiles to yaml, for example dhall (https://dhall-lang.org/). Yaml is an awful language for programmers, and it's a worse language for non-programmers. It's good for no one.
2. To the greatest extent possible, do not use any actions which install things.
For example, don't use 'actions/setup-node'. Use bazel, nix, direnv, some other tool to setup your environment. That tool can now also be used on your developer's machines to get the same versions of software as CI is using.
3. Actions should be as short and simple as possible.
In many cases, they will be as simple as effectively "actions/checkout@v4", "run: ./ci/build.sh", and that's it.
Escape from yaml as quickly as possible, put basic logic in bash, and then escape from bash as quickly as possible too into a real language.
4. Do not assume that things are sane or secure by default.
Ideally you don't accept PRs from untrusted users, but if you do, read all the docs very carefully about what actions can run where, etc. Github actions on untrusted repos are a nightmare footgun.
However I'd balk at the suggestion to use Dhall (or any equally niche equivalent) based on a number of factors:
1) If you need this advice, you probably don't know Dhall nor does anyone else who has worked or will work on these files, so everyone has to learn a new language and they'll all be novices at using that language.
2) You're adding an additional dependency that needs to be installed, maintained and supported. You also need to teach everyone who might touch the YAML files about this dependency and how to use it and not to touch the output directly.
3) None of the advice on GitHub Workflows out there will apply directly to the code you have, because that advice is written in terms of YAML. So even if Dhall will generate YAML for you, you will need to understand enough YAML to convert it to Dhall correctly. This also introduces a chance for errors because of the friction of translating from the language of the code you read to the language of the code you write.
4) You are relying on the Dhall code to correctly map to the YAML code you want to produce. Especially if you're inexperienced with the language (see above) this means you'll have to double check the output.
5) It's a niche language so it's neither clear that it's the right choice for the project/team nor that it will continue to be useful. This is an extremely high bar considering the effort involved in training everyone to use it and it's not clear at all that the trade-off is worth it outside niche scenarios (e.g. government software that will have to be maintained for decades). It's also likely not to be a transferable skill for most people involved.
The point about YAML being bad also becomes less of an issue if you don't have much code in your YAML because you've moved it into scripts.
1. Event dispatching/triggers, the thing that spawns webhooks/events to do things
2. The orchestration implementation (steps/jobs, a DAG-like workflow execution engine)
3. The reusable Actions marketplace
4. The actual code that you are running as part of the build
5. The environment setup/secrets of GHA, in other words, the makeup of how variables and other configurations are injected into the environment.
The most maintainable setups only leverage 1 directly from GHA. 2-5 can be ignored or managed through containerized workflows in some actual build system like Bazel, Nix, etc.
Instead, if you want to stay away from YAML, I'd say just move as much of the build as possible into external scripts so that the YAML stays very simple.
However, I quickly ran into the whole "no recursive data structures in dhall" (at least, not how you would normally think about it), and of course, a standard representation of expressions is a recursively defined data type.
I do get why dhall did this, but it did mean that I quickly ran into super advanced stuff, and realized that I couldn't in good conscience use this as my team of mixed engineers would need to read/maintain it in the future, without any knowledge of how to do recursive definitions in dhall, and without the inclination to care either.
an intro to this: https://docs.dhall-lang.org/howtos/How-to-translate-recursiv...
An example in the standard lib is how it works with JSON itself: https://store.dhall-lang.org/Prelude-v23.1.0/JSON/Type.dhall...
basically, to do recursive definitions, you have to lambda encode your data types, work with them like that, and then finally "reify" them with, like, a concrete list type at the end, which means that all those lambdas evaluate away and you're just left with list data. This is neat and interesting and worthy of learning, but would be wildly overly-complicated for most eng teams I think.
After hitting this point in the search, I decided to go another route: https://github.com/rhysd/actionlint
and this project solved my needs such that I couldn't justify spending more time on it any longer.
This should be your #1 rule. Don’t compile logic to YAML, just write it in a real language and call it as quickly as possible.
This way a developer can run it from his workstation.
Why not? I assume the concern is making sure development environments and production use the same configuration as CI. But that feels like somewhat of an orthogonal issue. For example, in Node.js, I can specify both the runtime and package manager versions using standard configuration. I think it's a bonus that how those specific versions get installed can be somewhat flexible.
Yeah, it may be that you'll get the exact same versions of things installed, but that doesn't help when some other weird thing is going on.
If you haven't experienced this, well, keep doing what you're doing if you want, but just file this reflection away for if/when you do hit this issue.
An imperative language that compiles to a declarative language that emulates imperative control flow and calls other programs written in imperative languages that can have side effects that change control flow? Please no.
1. Avoid YAML if you can. Either plain configuration files (generated if need be - don't be afraid to do this) or full blown programming languages with all the rigor required (linting/static analysis, tests, etc).
2. Move ALL logic outside of the pipeline tool. Your actions should be ./my-script.sh or ./my-tool.
Source: lots of years of experience in build engineering/release engineering/DevOps/...
Also put as much as possible in bash or justfile instead of inside the yaml. It avoids vendor lock-in and makes local debugging easier.
GitHub Actions workflow commands[1] are similar to what I'm thinking of, but not standardized.
[0] https://testanything.org/ [1] https://docs.github.com/en/actions/writing-workflows/choosin...
It's frustrating that we're beholden to Github to add support for something like this to their platform, especially when the incentives are in the wrong direction— anything that's more generic and more portable reduces lock-in to Actions.
You do not want to ever need to make dummy commits to debug something in CI; it's awful. As a bonus, following this rule also means better access to debugging tools, local logs, "works on CI but not here" issues, etc. Finally, if you ever want to move away from GitHub to somewhere else, it'll be easy.
Do not rely on gh caching, installs, multiple steps, etc.
Otherwise there will be a moment when tests pass locally, but not on gh, and debugging will be super hard. In this case you just debug in the same image.
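A sketch of that approach, running the job inside the same image you would docker run locally (the image name and script are hypothetical):

on: push
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/my-org/ci-image:1.2.3   # same image developers can pull and debug in locally
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh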
1. Distinct prod and non-prod environments. I think you should have distinct Lab and Production environments. It should be practical to commit something to your codebase, test it in Lab, and then deploy that to Production. The GitHub Actions model confuses the concepts of source control and deployment environment, so you easily end up with no lab environment and people doing development work against production. (See the sketch after this list.)
2. Distinguish programming language expression and DSLs. Github yaml reminds me of an older time where people built programming languages in XML. It is an interesting idea, but it does not work out. The value of a programming language: the more features it has, the better. The value of a DSL: the fewer features it has, the better.
3. Security. There is a growing set of github-action libraries. The Github ecosystem makes it easy to install runners on workstations to accept dispatch from github actions. This combination opens opportunities for remote attacks.
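On point 1: GitHub Actions does at least have a job-level environment key that can point at distinct deployment targets with their own secrets and protection rules; a sketch (environment names and the deploy script are illustrative), though it doesn't by itself fix the source-control/deployment conflation described above:

on: push
jobs:
  deploy-lab:
    runs-on: ubuntu-latest
    environment: lab
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh lab
  deploy-production:
    needs: deploy-lab
    runs-on: ubuntu-latest
    environment: production   # can require manual approval via environment protection rules
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production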
Writing any meaningful amount of logic or configuration in yaml will inevitably lead to the future super-sentient yaml-based AI torturing you for all eternity for having taken any part in cursing it to a yaml-based existence. The thought-experiment of "Roko's typed configuration language" is hopefully enough for you to realize how this blog post needs to be deleted from the internet for our own safety.
Yes, good declarative languages are. I'm a happy nix user. I like dhall. cel is a cool experiment. jsonnet has its place. I stan XML.
A language with byzantine typing rules, where 'on: yes' parses the same as "true: true", but only in some parsers (like Ruby's built-in yaml parser, for example), not in others, and only with some settings, is not it, chief.
It isn't even one language, since most yaml parsers only have like 90% coverage of the spec, and it's a different 90% for each, so the same yaml document often won't be parsed the same way even by two libraries in the same programming language. It's really like 20 subtly incompatible languages that are all called "yaml".
It is indefensible in any context. Github actions should have been in starlark, xml, or even lisp or lua.
I'm definitely not going to use this to implement my company's build actions in elisp.
In ScriptHandler.cs there's all the code for preparing process environment, arguments, etc. but specifically here's actual code to start the process:
https://github.com/actions/runner/blob/main/src/Runner.Worke...
Overall I was positively surprised at the simplicity of this code. It's very procedural, and it handles a ton of edge cases, but it seems to be easy to understand and debug.
However goeval doesn't yet have direct support for file input (only stdin), so shell tricks are needed.
So far the way is:
run: |
  go run github.com/dolmen-go/goeval@v1 - <<'EOF'
  fmt.Println("Hello")
  EOF
but this requires a bit of boilerplate.

Disclaimer: I'm the author of goeval.
go run github.com/dolmen-go/goeval@v1 - < file.go
GitHub requires {0} to be present on the shell line.
If you're going to plug your toy in this context, showing something more relevant than "print hello" would have saved me the click
Probably can write assembly too.
- if your software is cross-platform you can run jobs across a variety of OSes and CPU architectures concurrently, e.g. building and testing natively on all platforms (see the matrix sketch below)
- you have access to a lot of contextual information about what triggered the job and the current state of the repo, which is handy for automating per-PR chores or release automation
- You can integrate some things into the GitHub Web UI, such as having your linter annotate the PR line-by-line with flagged problems, or rendering test failures in the web page so you don't have to scan through a long log for them
- You have a small cache you can use to avoid redownloading/rebuilding files that have not changed between builds
Ideally you do as much as possible in a regular tool that runs locally (make/scripts/whatever) and you use the GitHub CI config for the little bit of glue that you need for the triggers, caching and GitHub integrations
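For the first bullet, that cross-platform fan-out is just a matrix; a sketch (the make target is illustrative):

on: push
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: make test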
- You can easily utilize Github actions that others have contributed in your pipeline.
- You can modularize workflows and specify dependencies between them and control parallel executions.
I'm sure there are more. But the main advantage is you don't need to implement all these things yourself.
How would I even begin migrating this to another forge? And that’s just a small part of the pipeline.
[0]: https://github.com/marketplace/actions/pypi-publish
[1]: https://github.com/marketplace/actions/gh-action-sigstore-py...
(Source: I maintain the former and contributed the attestations change to the latter.)
When migrating, the steps don't have to use the same syntax and tools; for each step you can identify the desired outcome and recreate it on a different CI/CD without actions from the GH marketplace.
More importantly, you consciously decided to make your pipeline not portable by using gh actions from the marketplace. This is not a requirement nor inevitable.
What I’m trying to say is that if you keep your build logic in `pipeline.sh` (and use GitHub CI only for calling into it), then you’re going to have an easier time migrating to another forge’s CI than in the alternative scenario, i.e. if your build logic is coded in GitHub CI YAML.
Written properly, actually building the software is the least of what the CI is doing.
If your build is simple enough that you don’t need any of that - great. But pretending that the big CI systems never do anything except lock you in is a trifle simplistic.
> If the command doesn't already take a single file as input, you need to pass it the special `{0}` argument, which GitHub replaces with the temporary file that it generates the template-expanded `run` block into.
It seems to be writing your script to a file, then using an executable to run your file. That ignores any shebang.
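Concretely, the custom shell: template just substitutes the path of that generated file for {0}, so a step like this sketch ends up running roughly perl /path/to/generated-file, and any shebang in the run block is irrelevant:

- shell: perl {0}
  run: |
    # even a "#!/bin/bash" shebang here would be ignored; the file is handed to perl as an argument
    print "hello from perl\n";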
I mean, it seems like it would take either not noticing the malicious code, which is always a threat vector, or seeing it and mistakenly thinking "aha, but you aren't actually running it!" and then letting it through based on that (which is of course ridiculous).
Or there is some other way to exploit this that I'm unaware of.
Edit: OK, maybe this is a little better. Write some malicious bash look-alike somewhere outside the repo, install it from GitHub Actions (make it look like you are updating bash or something), and then it's doing the bad thing.
I don’t think there’s a really direct security risk here, per se: it’s just another way (there were plenty already) in which write implies execute in GHA.
- shell: nix develop --command {0}
  run: ...
Even with a binary cache (we used R2), installing Lix, Devbox and some common tools costs us 2 1/2 minutes. Just evaluating the derivation takes ~20-30 seconds.
It's documented for GH at https://docs.github.com/en/actions/writing-workflows/workflo...
>>The shell command that is run internally executes a temporary file that contains the commands specified in the run keyword.
... and also mentioned in the article submitted here:
>>If the command doesn't already take a single file as input, you need to pass it the special {0} argument, which GitHub replaces with the temporary file that it generates the template-expanded run block into.
---
>[...] but also there is some special handling for exit code of each line.
As you can see in the defaults in the first link, the Linux default is to run bash with `-e`
Highly likely that exec would work.