https://github.com/jstrieb/just.sh/blob/2da1e2a3bfb51d583be0...
Pipefail also doesn't prevent more complex error states. For example, this step from your config:
curl -L "https://github.com/casey/just/releases/download/${{ matrix.just-version }}/just-${{ matrix.just-version }}-x86_64-apple-darwin.tar.gz" \
| sudo tar -C /usr/local/bin -xzv just
Here are the different error conditions you can run into:
1. curl succeeds, sudo succeeds, tar succeeds, but just fails to extract from the tarball. Tar reports error, step fails.
2. curl succeeds, sudo succeeds, tar fails. Sudo reports error, step fails.
3. curl succeeds, sudo fails. Shell reports error, step fails.
4. curl begins running. sudo begins running in a subshell/pipe. tar begins running under the sudo pipe, extracting half of the just binary. curl fails due to a network error. Because pipefail is enabled, the shell exits immediately. There is no error message. A corrupt executable is left on disk (which a later step will then try to run if this step has failure-skipping enabled).
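One way to avoid scenario 4 is to not pipe the download straight into tar at all: download to a temporary file, let curl's exit status gate the extraction, and only then install. A sketch, with the version variable and URL as illustrative stand-ins for the matrix values above (and with no pipe left, there is nothing for pipefail to paper over):

set -eu
tmp="$(mktemp)"
curl -fsSL -o "$tmp" \
  "https://github.com/casey/just/releases/download/${JUST_VERSION}/just-${JUST_VERSION}-x86_64-apple-darwin.tar.gz"
# If curl fails, set -e stops the script here; nothing half-written ever lands in /usr/local/bin.
sudo tar -C /usr/local/bin -xzvf "$tmp" just
rm -f "$tmp"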
That's probably why the -x is there. (Well, that and if something like curl or sudo fails it tends to output something to stderr...)
> Pipefail also doesn't prevent more complex error states ... A corrupt executable is left on-disk (which will be attempted to run if your step had failure-skipping enabled)
If I'm reading right, it seems like what you're suggesting is that the case pipefail doesn't handle is when you explicitly ignore the exit code. That doesn't exactly seem like the most concerning catch-22, to be honest.
It's not that pipefail doesn't handle a case, it's that it doesn't tell you what happened. It does not report what failed or why. Your shell just exits with a mystery return code.
This is of course no different than if you had set -e and then a command with no pipes failed silently without outputting anything to stderr.
I don't personally see why this is notable in relation to pipefail.
And if you want to know which command in a pipe failed there's `PIPESTATUS`.
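For example (a sketch, reusing the curl | tar pipe from above with a placeholder URL), PIPESTATUS reports each command's exit status whether or not pipefail is set:

curl -L "$url" | sudo tar -C /usr/local/bin -xzv just
statuses=("${PIPESTATUS[@]}")   # copy immediately; the next command overwrites PIPESTATUS
if [ "${statuses[0]}" -ne 0 ]; then
  echo "curl failed with status ${statuses[0]}" >&2
  exit 1
elif [ "${statuses[1]}" -ne 0 ]; then
  echo "tar failed with status ${statuses[1]}" >&2
  exit 1
fi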
If you use pipefail without -e, nothing happens, except the return status of the line is different (and thus using the option is pointless unless you're checking return statuses/PIPESTATUS, which of course nobody who uses this option actually does, because they don't know how it works)
From the Bash manual:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully. If the reserved word ! precedes a pipeline, the exit status of that pipeline is the logical negation of the exit status as described above. The shell waits for all commands in the pipeline to terminate before returning a value.
pipefail
If set, the return value of a pipeline is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands in the pipeline exit successfully. This option is disabled by default.
They don't mention how the rightmost command might be the one failing because the left-hand one failed (if it's trying to read data and fails, or dies from a signal), or how it might be the first command in the pipe that failed, or several of them at once because (again) maybe they're feeding input to each other and two of them failed, etc.

The use of this option is an anti-pattern. You don't even need it, because you can check PIPESTATUS without it.
Everybody uses pipefail because bloggers and people on HN say to use it, but without actually understanding what it's doing.
on:
  repository_dispatch:
    types:
      - security_scan
      - security_scan::*
Why would you want to do this?

We centralize our release pipelines as it's the only way to force repositories through a defined reusable workflow (we don't want our product teams to have to maintain them).
This allows us to dispatch an event like so:
{
  "event_type": "security_scan::$product_name::$version",
  "client_payload": {
    "field": "value"
  }
}
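(For reference, this payload is what gets POSTed to the repository_dispatch API endpoint; a sketch with placeholder OWNER/REPO and product/version values:)

curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/repos/OWNER/REPO/dispatches" \
  -d '{"event_type": "security_scan::my-product::1.2.3", "client_payload": {"field": "value"}}'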
Then it is far easier to identify which product and version a workflow is running for when looking in the Actions tab of our central release repository.

I've just joined an organization that is trying to do something similar, but in practice it seems nearly worthless. Templates are frequently broken, or written in ways that expect code to be built in a particular manner, without any supporting docs describing the requirements.
We don't use templates for any of this. Our interface is the payload sent with repository_dispatch, a few metadata files in the repository (which we fetch) and a GitHub application that allows us to update the PRs with the release status checks.
GitHub doesn't have a great story here; ideally we would want to listen to CI events emitted from a repo and run workflows as a reaction.
The reusable story on a modular basis is better, but here we're missing features that we would need to move some of our workflows into Actions. Notably, Action repos need to be public.
https://docs.github.com/en/actions/sharing-automations/shari...
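For context, the reusable-workflow setup being discussed boils down to a job-level uses: pointing at a central workflow that declares workflow_call; a rough sketch, with org, repo, and input names as placeholders:

# caller repository: .github/workflows/release.yml
on: push
jobs:
  release:
    uses: my-org/central-pipelines/.github/workflows/release.yml@main
    with:
      product: my-product
    secrets: inherit

# the callee in my-org/central-pipelines declares:
# on:
#   workflow_call:
#     inputs:
#       product:
#         type: string
#         required: true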
I tend to prefer either:
- Using a build-system (e.g. Make) to encode logic and just invoke that from GitHub Actions; or
- Writing a small CLI program and invoking that from GitHub Actions
It's so much easier to debug this stuff locally than in CI.
So an interesting trick, but I don't see where it would be useful.
- make build
- make test
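In other words, the whole workflow file can stay about as thin as this sketch (the trigger and runner choice are illustrative):

name: ci
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make build
      - run: make test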
We got bought out, and the workflow files of the company that bought us are hundreds and hundreds of lines long, often with repeated sections.

Call me old school, but I want to leave YAML town as soon as possible.
Otherwise you need complicated setups to test any of the stuff you put up there since none of it can be run locally / normally.
GitHub Actions, like any CI/CD product, is for automating in ways you cannot with scripting - like parallelizing and joining pipelines across multiple machines, modelling the workflow. That’s it.
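A minimal sketch of that fan-out and join across machines (job names and the ./ci/*.sh scripts are hypothetical):

on: push
jobs:
  test-linux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh
  test-macos:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh
  release:
    needs: [test-linux, test-macos]   # join: waits for both jobs on both machines
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/release.sh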
I would really appreciate an agnostic templating language for this, so these workflows could be modelled generically and have different executors; you could then port them to run locally or across different products. Maybe there is an answer to this that I've just not bothered to look for yet.
Terraform? You can use it for more than just "cloud"
https://docs.github.com/en/actions/writing-workflows/choosin...
This generation will shudder when they are asked to bring discipline to deployments built from github actions.
1. Do not use yaml.
All github action logic should be written in a language that compiles to yaml, for example dhall (https://dhall-lang.org/). Yaml is an awful language for programmers, and it's a worse language for non-programmers. It's good for no one.
2. To the greatest extent possible, do not use any actions which install things.
For example, don't use 'actions/setup-node'. Use bazel, nix, direnv, some other tool to setup your environment. That tool can now also be used on your developer's machines to get the same versions of software as CI is using.
3. Actions should be as short and simple as possible.
In many cases, they will be as simple as effectively "actions/checkout@v4", "run: ./ci/build.sh", and that's it.
Escape from yaml as quickly as possible, put basic logic in bash, and then escape from bash as quickly as possible too into a real language.
4. Do not assume that things are sane or secure by default.
Ideally you don't accept PRs from untrusted users, but if you do, read all the docs very carefully about what actions can run where, etc. Github actions on untrusted repos are a nightmare footgun.
However I'd balk at the suggestion to use Dhall (or any equally niche equivalent) based on a number of factors:
1) If you need this advice, you probably don't know Dhall nor does anyone else who has worked or will work on these files, so everyone has to learn a new language and they'll all be novices at using that language.
2) You're adding an additional dependency that needs to be installed, maintained and supported. You also need to teach everyone who might touch the YAML files about this dependency and how to use it and not to touch the output directly.
3) None of the advice on GitHub Workflows out there will apply directly to the code you have, because that advice is written in terms of YAML. So even if Dhall will generate YAML for you, you will need to understand enough YAML to convert it to Dhall correctly. This also introduces a chance for errors because of the friction of translating from the language of the code you read to the language of the code you write.
4) You are relying on the Dhall code to correctly map to the YAML code you want to produce. Especially if you're inexperienced with the language (see above) this means you'll have to double check the output.
5) It's a niche language so it's neither clear that it's the right choice for the project/team nor that it will continue to be useful. This is an extremely high bar considering the effort involved in training everyone to use it and it's not clear at all that the trade-off is worth it outside niche scenarios (e.g. government software that will have to be maintained for decades). It's also likely not to be a transferable skill for most people involved.
The point about YAML being bad also becomes less of an issue if you don't have much code in your YAML because you've moved it into scripts.
1. Event dispatching/triggers, the thing that spawns webhooks/events to do things
2. The orchestration implementation (steps/jobs, a DAG-like workflow execution engine)
3. The reusable Actions marketplace
4. The actual code that you are running as part of the build
5. The environment setup/secrets of GHA, in other words, the makeup of how variables and other configurations are injected into the environment.
The most maintainable setups only leverage 1 directly from GHA. 2-5 can be ignored or managed through containerized workflows in some actual build system like Bazel, Nix, etc.
Instead, if you want to stay away from YAML, I'd say just move as much of the build as possible into external scripts so that the YAML stays very simple.
However, I quickly ran into the whole "no recursive data structures in dhall" (at least, not how you would normally think about it), and of course, a standard representation of expressions is a recursively defined data type.
I do get why dhall did this, but it did mean that I quickly ran into super advanced stuff, and realized that I couldn't in good conscience use this as my team of mixed engineers would need to read/maintain it in the future, without any knowledge of how to do recursive definitions in dhall, and without the inclination to care either.
an intro to this: https://docs.dhall-lang.org/howtos/How-to-translate-recursiv...
An example in the standard lib is how it works with JSON itself: https://store.dhall-lang.org/Prelude-v23.1.0/JSON/Type.dhall...
basically, to do recursive definitions, you have to lambda encode your data types, work with them like that, and then finally "reify" them with, like, a concrete list type at the end, which means that all those lambdas evaluate away and you're just left with list data. This is neat and interesting and worthy of learning, but would be wildly overly-complicated for most eng teams I think.
After hitting this point in the search, I decided to go another route: https://github.com/rhysd/actionlint
and this project solved my needs such that I couldn't justify spending more time on it any longer.
This should be your #1 rule. Don’t compile logic to YAML, just write it in a real language and call it as quickly as possible.
This way a developer can run it from his workstation.
Why not? I assume the concern is making sure development environments and production use the same configuration as CI. But that feels like somewhat of an orthogonal issue. For example, in Node.js, I can specify both the runtime and package manager versions using standard configuration. I think it's a bonus that how those specific versions get installed can be somewhat flexible.
Yeah, it may be that you'll get the exact same versions of things installed, but that doesn't help when some other weird thing is going on.
If you haven't experienced this, well, keep doing what you're doing if you want, but just file this reflection away for if/when you do hit this issue.
An imperative language that compiles to a declarative language that emulates imperative control flow and calls other programs written in imperative languages that can have side effects that change control flow? Please no.
1. Avoid YAML if you can. Either plain configuration files (generated if need be - don't be afraid to do this) or full blown programming languages with all the rigor required (linting/static analysis, tests, etc).
2. Move ALL logic outside of the pipeline tool. Your actions should be ./my-script.sh or ./my-tool.
Source: lots of years of experience in build engineering/release engineering/DevOps/...
Also put as much as possible in bash or justfile instead of inside the yaml. It avoids vendor lock-in and makes local debugging easier.
GitHub Actions workflow commands[1] are similar to what I'm thinking of, but not standardized.
[0] https://testanything.org/ [1] https://docs.github.com/en/actions/writing-workflows/choosin...
It's frustrating that we're beholden to Github to add support for something like this to their platform, especially when the incentives are in the wrong direction— anything that's more generic and more portable reduces lock-in to Actions.
You do not want to ever need to make dummy commits to debug something in CI; it's awful. As a bonus, following this rule also means better access to debugging tools, local logs, "works on CI but not here" issues, etc. Finally, if you ever want to move away from GitHub to somewhere else, it'll be easy.
Do not rely on gh caching, installs, multiple steps, etc.
Otherwise there will be a moment when tests pass locally, but not on gh, and debugging will be super hard. In this case you just debug in the same image.
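A sketch of that approach, running the job inside the same image you would docker run locally (the image name and script are hypothetical):

on: push
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/my-org/ci-image:1.2.3   # same image developers can pull and debug in locally
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/test.sh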
1. Distinct prod and non-prod environments. I think you should have distinct Lab and Production environments. It should be practical to commit something to your codebase, test it in Lab, and then deploy that to Production. The GitHub Actions model confuses the concepts of source control and deployment environment, so you easily end up with no lab environment and people doing development work against production. (See the sketch after this list.)
2. Distinguish programming language expression and DSLs. Github yaml reminds me of an older time where people built programming languages in XML. It is an interesting idea, but it does not work out. The value of a programming language: the more features it has, the better. The value of a DSL: the fewer features it has, the better.
3. Security. There is a growing set of github-action libraries. The Github ecosystem makes it easy to install runners on workstations to accept dispatch from github actions. This combination opens opportunities for remote attacks.
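On point 1: GitHub Actions does at least have a job-level environment key that can point at distinct deployment targets with their own secrets and protection rules; a sketch (environment names and the deploy script are illustrative), though it doesn't by itself fix the source-control/deployment conflation described above:

on: push
jobs:
  deploy-lab:
    runs-on: ubuntu-latest
    environment: lab
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh lab
  deploy-production:
    needs: deploy-lab
    runs-on: ubuntu-latest
    environment: production   # can require manual approval via environment protection rules
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh production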
Writing any meaningful amount of logic or configuration in yaml will inevitably lead to the future super-sentient yaml-based AI torturing you for all eternity for having taken any part in cursing it to a yaml-based existence. The thought-experiment of "Roko's typed configuration language" is hopefully enough for you to realize how this blog post needs to be deleted from the internet for our own safety.
Yes, good declarative languages are. I'm a happy nix user. I like dhall. cel is a cool experiment. jsonnet has its place. I stan XML.
A language with byzantine typing rules, where 'on: yes' parses the same as "true: true", but only in some parsers (like Ruby's built-in yaml parser, for example), not in others, and only with some settings, is not it, chief.
It isn't even one language, since most yaml parsers only have like 90% coverage of the spec, and it's a different 90% for each, so the same yaml document often won't be parsed the same way even by two libraries in the same programming language. It's really like 20 subtly incompatible languages that are all called "yaml".
It is indefensible in any context. Github actions should have been in starlark, xml, or even lisp or lua.
I'm definitely not going to use this to implement my company's build actions in elisp.
In ScriptHandler.cs there's all the code for preparing process environment, arguments, etc. but specifically here's actual code to start the process:
https://github.com/actions/runner/blob/main/src/Runner.Worke...
Overall I was positively surprised at the simplicity of this code. It's very procedural, and it handles a ton of edge cases, but it seems to be easy to understand and debug.
However goeval doesn't yet have direct support for file input (only stdin), so shell tricks are needed.
So far the way is:
run: |
  go run github.com/dolmen-go/goeval@v1 - <<'EOF'
  fmt.Println("Hello")
  EOF
but this requires a bit of boilerplate.

Disclaimer: I'm the author of goeval.
go run github.com/dolmen-go/goeval@v1 - < file.go
GitHub requires {0} to be present on the shell line.
If you're going to plug your toy in this context, showing something more relevant than "print hello" would have saved me the click
Probably can write assembly too.
- if your software is cross-platform you can run jobs across a variety of OSes and CPU architectures concurrently, e.g. building and testing natively on all platforms (see the matrix sketch below)
- you have access to a lot of contextual information about what triggered the job and the current state of the repo, which is handy for automating per-PR chores or release automation
- You can integrate some things into the GitHub Web UI, such as having your linter annotate the PR line-by-line with flagged problems, or rendering test failures in the web page so you don't have to scan through a long log for them
- You have a small cache you can use to avoid redownloading/rebuilding files that have not changed between builds
Ideally you do as much as possible in a regular tool that runs locally (make/scripts/whatever) and you use the GitHub CI config for the little bit of glue that you need for the triggers, caching and GitHub integrations
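For the first bullet, that cross-platform fan-out is just a matrix; a sketch (the make target is illustrative):

on: push
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: make test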
- You can easily utilize Github actions that others have contributed in your pipeline.
- You can modularize workflows and specify dependencies between them and control parallel executions.
I'm sure there are more. But the main advantage is you don't need to implement all these things yourself.
How would I even begin migrating this to another forge? And that’s just a small part of the pipeline.
[0]: https://github.com/marketplace/actions/pypi-publish
[1]: https://github.com/marketplace/actions/gh-action-sigstore-py...
(Source: I maintain the former and contributed the attestations change to the latter.)
When migrating, the steps don't have to use the same syntax and tools; for each step you can identify the desired outcome and recreate it on a different CI/CD without actions from the GH marketplace.
More importantly, you consciously decided to make your pipeline not portable by using gh actions from the marketplace. This is not a requirement nor inevitable.
What I’m trying to say is that if you keep your build logic in `pipeline.sh` (and use GitHub CI only for calling into it), then you’re going to have an easier time migrating to another forge’s CI than in the alternative scenario, i.e. if your build logic is coded in GitHub CI YAML.
Written properly, actually building the software is the least of what the CI is doing.
If your build is simple enough that you don’t need any of that - great. But pretending that the big CI systems never do anything except lock you in is a trifle simplistic.
> If the command doesn't already take a single file as input, you need to pass it the special `{0}` argument, which GitHub replaces with the temporary file that it generates the template-expanded `run` block into.
It seems to be writing your script to a file, then using an executable to run your file. That ignores any shebang.
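Concretely, the custom shell: template just substitutes the path of that generated file for {0}, so a step like this sketch ends up running roughly perl /path/to/generated-file, and any shebang in the run block is irrelevant:

- shell: perl {0}
  run: |
    # even a "#!/bin/bash" shebang here would be ignored; the file is handed to perl as an argument
    print "hello from perl\n";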
I mean, it seems like it would take either not noticing the malicious code, which is always a threat vector, or seeing it and mistakenly thinking "aha, but you aren't actually running it!" and then letting it through based on that (which is of course ridiculous).
Or there is some other way to exploit this that I'm unaware of.
Edit: OK, maybe this is a little better. Write some malicious bash look-alike somewhere outside the repo, install it from GitHub Actions (make it look like you are updating bash or something), and then it's doing the bad thing.
I don’t think there’s a really direct security risk here, per se: it’s just another way (there were plenty already) in which write implies execute in GHA.
- shell: nix develop --command {0}
  run: ...
Even with a binary cache (we used R2), installing Lix, Devbox and some common tools costs us 2 1/2 minutes. Just evaluating the derivation takes ~20-30 seconds.
It's documented for GH at https://docs.github.com/en/actions/writing-workflows/workflo...
>>The shell command that is run internally executes a temporary file that contains the commands specified in the run keyword.
... and also mentioned in the article submitted here:
>>If the command doesn't already take a single file as input, you need to pass it the special {0} argument, which GitHub replaces with the temporary file that it generates the template-expanded run block into.
---
>[...] but also there is some special handling for exit code of each line.
As you can see in the defaults in the first link, the Linux default is to run bash with `-e`
Highly likely that exec would work.