- Based on [[Plan9]]'s `mk(1)` (a simpler, more focused version of GNU/BSD Make)
	- `mk(1)` is different enough from `make(1)` to be distinct, while also simple enough for users to pick up easily
- Basic OS image with well-known, writable locations for system-wide configuration
	- Executed every time CI runs, so it must be fast
	- Has the following packages installed:
		- `podman`
		- `mk`
		- `git`
		- `rsync`
		- `janet`
			- If I stick with [[Janet]] as the implementation language
		- Whatever is included in the base, probably an [[Alpine Linux]] image
- Utilize well-known files and a handful of virtual targets to perform common functions (see the example `mkfile` after this list)
	- `test:V:`
	- `prepare-runtime:V: base-image`
		- Called before every CI job within the host OS
	- `base-image:V:`
	- `/var/run/ci/env/CONTAINERIMAGE: base-image`
		- `echo $CONTAINERIMAGE > $target`
	- `/var/run/ci/crontab`
	- `/var/run/ci/artifacts/%`
	- `/var/run/ci/cache/%`
	- Add dependencies to these targets to orchestrate CI
	- Some targets, such as `base-image:V:`, are executed in the base OS and used to set up the hermetic execution environment
- Stubs for common CI platforms like [[GitLab]], [[GitHub]], etc.
	- Custom targets can be executed from stubs if desired
- Simple command that wraps `qemu-system` and creates a daemon process
	- Injects credentials/tokens for authentication to the orchestrator
	- Maintains outputs for recent CI runs that can be fetched for playback
	- Uses the git hash/ref and environment variables to determine whether CI can skip real execution
- Run tests locally in a hermetically sealed environment and have them count towards your CI
- No dedicated CI executors are needed if you do not want them
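To make the well-known entry points concrete, here is a minimal `mkfile` sketch. Only the target names and `/var/run/ci/...` paths come from the design above; the image tag, recipe bodies, and `test/runner.janet` path are invented for illustration.

```mk
# Hypothetical mkfile; only the target names are part of the design.
MKSHELL=/bin/sh                        # plan9port mk runs recipes with rc by default
CONTAINERIMAGE=localhost/my-project:ci # invented image tag

# Runs *inside* the container image; the real test suite goes here.
test:V:
	janet test/runner.janet

# Runs in the host OS before every CI job (illustrative setup step).
prepare-runtime:V: base-image
	rsync -a ci/overlay/ /var/run/ci/

# Builds the hermetic execution environment.
base-image:V:
	podman build -t $CONTAINERIMAGE .

# Records which image the executor should run.
/var/run/ci/env/CONTAINERIMAGE: base-image
	echo $CONTAINERIMAGE > $target

# Meta-rule the user supplies to export artifacts from a successful run.
/var/run/ci/artifacts/%: out/%
	cp $prereq $target
```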
Eliminating redundancy via centralized entry points, and minimizing compute and wall-clock time, are core elements of a pleasant development experience. Engineers are smart, but every line of code, every script, every package is a decision and a source of complexity. Pleasant software makes good decisions while allowing the flexibility to make your own when appropriate. This is the fundamental concept of abstraction via software.

The shell is the lowest common denominator between your application and the *environment* (OS). Its pervasiveness makes it a good candidate as a default language for configuration.

# Design

## Orchestrator

A service that does very simple authentication management for users. Users log in using OAuth2 to [[GitLab]], [[GitHub]], or a *[[Passkey]]* (maybe? focus on the first two). The primary purpose of this service is to provide authentication and authorization when connecting an executor to another executor whose results it may want to play back instead of executing tests itself.

> [!question]
> How do we map executors to a single account? I'm not interested in providing full RBAC, but I think a basic "I have a token associated with this repo" check *should* be sufficient.

During execution of a test, an executor **A** may ask the Orchestrator to query all other executors in the *connected* state for whether they have executed tests for the revision/hash that executor A is about to test. If a *connected* executor **B** has a successful, compatible test run, the Orchestrator proxies or facilitates setting up a peer-to-peer connection between executor A and executor B. Executor B sends the artifacts and stdout/stderr from its test execution to executor A, which then replays the result and exits successfully.

If there is no executor that can fulfill the role of executor **B**, such as when there are no other executors in the *connected* state or none with a successful, compatible test run, executor **A** executes the test normally.

> [!question]
> A better, simpler design might be to have a single service per repo, group, or user. The service performs no authentication itself, instead relying on externally managed mTLS. This alleviates some concerns with proxying and e2e encryption, because users who care enough will have the ability to host their own infrastructure. A SaaS offering that provides namespacing could be a monetization scheme for those who want to move fast and don't have a threat model that dictates e2e encryption.

## Executor

The executor is a wrapper around `qemu-system` and a particular OS image, set up to provide a consistent environment for executing tests. Not only does it provide setup hooks via the `mk(1)` command and its targets, it is also responsible for running the image generated by the `base-image:V:` virtual target and exported via the `CONTAINERIMAGE` environment variable.

> [!question]
> Should I make the `CONTAINERIMAGE` environment variable name user-configurable? Probably not; I can't think of a reason why you would want to unless you were unlucky enough to have the name collide. Perhaps a more unique name is in order.

The executor does the following, in order (a rough sketch appears at the end of this section):

1. Boots the core OS image using `qemu-system`
2. Clones the repository at the revision requested by the CI, or simply copies the current version of the repo (ignoring a dirty working tree!) into the core OS
3. Executes the `mk prepare-runtime` target from the root of the repository as the `root` user of the OS
4. Runs the container image bound to `CONTAINERIMAGE`, *assumed* to have been generated by the `base-image` target, using the `podman` command, bind-mounting the root of the repository at the *working directory* of the image and executing the `mk test` command
	1. How do I get `mk` reliably into the base image?
	2. I think: create a target, depending on `base-image`, that `echo`s `CONTAINERIMAGE` to a well-known file
	3. Possibly bind-mount `/usr/lib/plan9` into the container and add its `bin/` to `PATH`
5. If successful, the `/var/run/ci/artifacts/%` target is run from the core OS image, copying any artifacts from the successful test execution into a directory where they may be retrieved later (note: the user must specify this rule)
6. The `post-test:V:` target is then run; by default it does nothing

The executor retains generated artifacts and `stdout`/`stderr` for a configurable retention period, though by default only the most recent run is kept. After executing a test, the executor *hangs* in the *connected* mode until a `SIGINT` or `SIGTERM` is received. This allows another executor to request test results from the runner. Alternatively, the executor *may* be configured to run in *daemon* mode, where it is a small background process that orchestrates local test execution and may provide data from more than a single test run.

> [!question]
> How should I configure which orchestrator an executor connects to?

> [!answer]
> Probably via the shims, or environment variables stored in `.env`
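A rough shell sketch of steps 1–6, under heavy assumptions: the qemu boundary is elided (in the real design, steps 2–6 run inside the guest), and the mount point `/work` and artifact name `report.tar` are invented examples, not part of the design.

```sh
#!/bin/sh
# Hypothetical sketch of the executor's run sequence; the VM boundary
# is elided for brevity.
set -eu

rev=${1:?usage: executor <rev>}
repo=$(mktemp -d)/repo

# 1. Boot the core OS image (elided): qemu-system-x86_64 -nographic ...

# 2. Clone the repository at the requested revision.
git clone . "$repo"
git -C "$repo" checkout "$rev"
cd "$repo"

# 3. Host-side setup hook, run as root inside the core OS.
mk prepare-runtime

# 4. Run the tests hermetically in the image recorded by the mkfile.
#    (How mk itself gets into the image is an open question above;
#    bind-mounting /usr/lib/plan9 is one option.)
CONTAINERIMAGE=$(cat /var/run/ci/env/CONTAINERIMAGE)
podman run --rm -v "$PWD:/work" -w /work "$CONTAINERIMAGE" mk test

# 5. Export artifacts via the user-specified meta-rule
#    ("report.tar" is an invented example name).
mk /var/run/ci/artifacts/report.tar

# 6. Post-test hook; a no-op by default.
mk post-test
```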
# Questions

- How do I connect a particular instance of an executor to a particular repo?
	- I think using snapshots that more or less correspond to the following states:
		- Completely fresh, with only the baseline dependencies
		- A per-repo snapshot that gets updated with new cached objects
			- The cached version gets dropped periodically, or manually via a flag
- How do I namespace on repo?
	- Most git repos will have a URI for their `origin`, but this isn't guaranteed
- How do I build and distribute the executor OS base image?
- Some environment variables/arguments to the entrypoint should count against whether the CI run is allowed to count as a "successful run of this version of the code" (see the fingerprint sketch below)
- Some users may want some subset of jobs to run in a real CI runner rather than on a developer's workstation. We may choose to support this case; however, I think it should be out of scope initially
	- Publishing artifacts and security scans are about the only things I can think of that would benefit greatly from this
- Some environment variables should be injected, but their values shouldn't be counted/hashed. These variables should generally be restricted to tokens and other secrets. Perhaps we should have a separate utility specifically for secrets?
- Should we support synchronization of caches?
	- No, not initially
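One hypothetical way to decide whether a run "counts": fingerprint the revision together with an explicit allow-list of environment variables, leaving secrets out of the hash entirely. Everything here, including the `CI_COUNTED_VARS` name, is invented for illustration.

```sh
#!/bin/sh
# Hypothetical run fingerprint: the git revision plus the values of an
# explicit allow-list of environment variables. Secrets are injected at
# runtime but deliberately excluded from the hash.
set -eu

{
	git rev-parse HEAD
	# CI_COUNTED_VARS is an invented, space-separated allow-list of
	# variable names whose values count toward the fingerprint.
	for name in ${CI_COUNTED_VARS:-}; do
		printf '%s=%s\n' "$name" "$(printenv "$name" || true)"
	done | sort
} | sha256sum | cut -d' ' -f1
```

An executor in the *connected* state could then advertise the fingerprints of its successful runs, and a peer with a matching fingerprint could request playback instead of executing.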