tests: Fix parallel runs#4360

Merged
daandemeyer merged 5 commits into
systemd:mainfrom
martinpitt:mkosi-fixes
Jun 22, 2026

@martinpitt martinpitt commented Jun 14, 2026

Contributor

Pulled out of #4357 to review/land them separately. In principle these also apply to pytest, if you run two tests each in its own terminal, or use pytest-xdist.

@martinpitt martinpitt mentioned this pull request Jun 14, 2026
7 tasks
@martinpitt martinpitt marked this pull request as draft June 14, 2026 09:37
This comment was marked as resolved.

@martinpitt martinpitt force-pushed the mkosi-fixes branch 2 times, most recently from 6b06013 to 2fde203 Compare June 14, 2026 10:03
@martinpitt martinpitt marked this pull request as ready for review June 14, 2026 10:52
martinpitt commented Jun 14, 2026

Contributor Author

Works now. The remaining three failures are the known "opensuse mirror" SNAFU, plus the arch/debian timeout fixed in unstable (should go into testing next Tuesday).

@behrmann @daandemeyer can you please take a look? Thanks! 🙏

Contributor Author

Added yet another concurrency fix -- now the tests in #4357 are green 🎉

Comment thread mkosi/__init__.py
Comment thread tests/__init__.py Outdated
Comment thread tests/__init__.py Outdated
Comment thread tests/__init__.py Outdated
Comment thread mkosi/config.py

 configdir = finalize_configdir(args.directory)
-historydir = finalize_historydir(args)
+historydir = finalize_historydir(args, context.cli.get("output_dir"))

Contributor

I've not checked this yet, but I'm not sure this works in general. The context here is a ParseContext, and there is more parsing happening later (context.parse_new_includes). I think the output directory could be changed later, so this might be brittle and the history file could end up in unexpected places. I think the reason the history dir is at the top level is that it is the one thing safely known at this point. @daandemeyer will remember the design better than me.

Contributor Author

I still believe this is consistent. Note that this only applies to the CLI option. Includes write to self.config, not self.cli.config. If -O is on the CLI, the value here is already final; later parsing cannot change it.

The one exception is the `for s in SETTINGS:` loop, which moves the finalized values from config to cli, and at that point historydir is already computed.

However: If the output dir comes from mkosi.conf or mkosi.local.conf or an include, context.cli.get("output_dir") is None, so history stays in configdir for both write and subsequent read, so it still works. I updated the comment and commit message to explain this better, I hope it is clearer now?

The current design prevents any parallel builds, i.e. --output-dir is only half-implemented and I'd argue that writing a build specific history into a global dir is also broken. This commit is vital for parallel tests, so I'd rather keep it in this PR and finish the discussion about it.

Thanks!

Contributor

> However: If the output dir comes from mkosi.conf or mkosi.local.conf or an include, context.cli.get("output_dir") is None, so history stays in configdir for both write and subsequent read, so it still works. I updated the comment and commit message to explain this better, I hope it is clearer now?

Mhm, but our mkosi.conf does set OutputDirectory=mkosi.output, so doesn't that kind of throw a wrench into things? Even if we didn't have that, we document that we default to mkosi.output if it exists, which would just move the problem with a single history file to a different directory, or am I misunderstanding something?

What about a different approach: we add an option to just not write history? We could use that in CI and sidestep the history problem, since I think multiple images being built from the same config in the same directory is maybe only a problem in CI. At least at $DAYJOB we solve it by templating our config into different trees, and for a repo this would equally work with different git worktrees.

Contributor Author

Hmm, I understood the OutputDirectory=mkosi.output in mkosi.conf as the default if you don't specify anything, but a setting in mkosi.local.conf trumps that, and --output-directory on the CLI trumps both.

> which would just move the problem with a single history file to a different directory

I'm afraid I don't understand this. The purpose is precisely that builds with different output directories would stop writing different histories into the same global dir.

I actually had History=no in an intermediate version. I discarded it as it felt too much like a hack: i.e. not testing what mkosi does by default. We could document that if you use --output-directory you need to disable history. If you are fine with that approach, I can give this another go (I don't remember the details how well this worked, as I discarded that path).

Using worktrees would also work for CI with some extra hacks -- we do want to re-use the main tools and image build across tests, it's just that some tests build their own additional image. So with worktrees, the test setup would have to copy these directories too, which is quite some extra work (these are large).

Perhaps this warrants a high-bandwidth gmeet discussion? I'm on PTO for the rest of the week, just occasionally look into my laptop. But could do that next week. And perhaps in the meantime put up a workaround with History=no or --history=no?

Contributor Author

I revisited this, and --history=no in tests' Image.build() really can't work: It only disables writing the history, but that doesn't stop .vm() and .boot() from reading an existing history file. That will have entirely the wrong information then -- e.g. the "main" build has format "directory" (via mkosi.conf default, latest.json is empty), but test_initrd_luks uses --format=disk; similar for any other specific option. This would only work if history was disabled for all builds and all options get copied between build() and vm().

The only other option that I see is to add an explicit --history-dir to the CLI to make this more explicit. But honestly I don't like that -- it's conceptually redundant with --output-dir, like when would you ever set this to a different dir?

So I put back the original commit again for now.

Contributor Author

... but I reworded the commit message to be clearer, and include more reasoning.

Contributor

Okay, I follow your reasoning now and agree that it's necessary, but @daandemeyer should give his thumbs up on this one, too.

Comment thread mkosi/installer/__init__.py Outdated
Comment thread mkosi/__init__.py Outdated
@martinpitt martinpitt force-pushed the mkosi-fixes branch 2 times, most recently from 377ec6a to 67069b7 Compare June 16, 2026 07:16
@martinpitt martinpitt requested a review from behrmann June 16, 2026 07:22
@martinpitt martinpitt force-pushed the mkosi-fixes branch 4 times, most recently from bde1fa1 to f59732e Compare June 18, 2026 16:31
martinpitt and others added 5 commits June 19, 2026 05:33
mkosi signs SHA256SUMS by running gpg, which autostarts a gpg-agent if none is
running. As mkosi's sandbox has no PID namespace, that agent daemonizes and is
leaked when the sandbox goes away.

This is even worse when running unprivileged, as the leaked agents hold
systemd-nsresourced dynamic UID ranges and eventually exhaust the pool
(`io.systemd.NamespaceResource.NoDynamicRange`). But the process is
leaked either way.

Shut the agent down after signing. Note that this will also kill a
"real" user agent if one was running already; but that is hard/racy to
avoid, and gpg auto-starts a new one anyway.
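A minimal sketch of the "sign, then shut the agent down" pattern described above, assuming signing is a plain `gpg --detach-sign` call. `sign_checksums` and its injectable `run` parameter are hypothetical names for illustration, not mkosi's actual code; `gpgconf --kill gpg-agent` is the standard GnuPG way to stop a running agent.

```python
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def sign_checksums(sums: Path, run: Runner = subprocess.run) -> Path:
    """Detach-sign `sums` with gpg, then stop the gpg-agent gpg may have autostarted.

    Hypothetical helper: `run` is injectable so the command sequence can be
    inspected without a real gpg installation.
    """
    signature = sums.with_name(sums.name + ".gpg")
    try:
        # gpg autostarts a gpg-agent here if none is running yet.
        run(
            ["gpg", "--batch", "--yes", "--detach-sign",
             "--output", str(signature), str(sums)],
            check=True,
        )
    finally:
        # Reap the agent even if signing failed, so it cannot outlive the sandbox.
        # check=False: an agent that is already gone is not worth failing over.
        run(["gpgconf", "--kill", "gpg-agent"], check=False)
    return signature
```

The `try`/`finally` means the agent is also stopped when signing fails. As the commit notes, this also stops a pre-existing user agent, which gpg simply autostarts again on next use.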

Run VMs with 1.5 GiB of RAM by default, which is enough for most tests.
This doubles the test density for parallel runs, as GitHub's default
runners have a little less than 4 GiB in total, which could not even fit
two parallel runs before.

The only exception is `test_initrd_luks`: repart's default LUKS2 KDF
(Argon2id) is memory-hard and needs ~1 GiB of RAM just to derive the
key. Run that with 2 GiB as before.
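The memory budget above can be sketched as a default plus a per-test override, with the density arithmetic spelled out. The table shape and helper names are hypothetical, and handing the value to mkosi through its `RAM=`/`--ram=` setting is an assumption about how the tests wire it up. The runner size of ~3.8 GiB is also an assumption (the commit only says "a little less than 4 GiB"), and host overhead is ignored.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Values from the commit message; the override table itself is a hypothetical shape.
DEFAULT_VM_RAM = int(1.5 * GIB)
VM_RAM_OVERRIDES = {
    # repart's default LUKS2 KDF (Argon2id) needs ~1 GiB just to derive the key.
    "test_initrd_luks": 2 * GIB,
}


def vm_ram_for(test_name: str) -> int:
    """Return the VM memory in bytes that a given test should request."""
    return VM_RAM_OVERRIDES.get(test_name, DEFAULT_VM_RAM)


def ram_arg(test_name: str) -> str:
    """Format the request as a mkosi option (assumes mkosi's --ram= setting)."""
    return f"--ram={vm_ram_for(test_name) // MIB}M"


def parallel_vms(host_ram: int, per_vm: int) -> int:
    """How many VMs of `per_vm` bytes fit into `host_ram`, ignoring host overhead."""
    return host_ram // per_vm
```

With ~3.8 GiB, `parallel_vms` gives 1 at 2 GiB per VM and 2 at 1.5 GiB, which is the doubling the commit message describes.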

Without `--machine`, mkosi defaults to the machine name "mkosi". This breaks
parallel test runs. The vsock CID name is derived from the machine
name, so this automatically becomes unique as well.
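One way a test could pick a unique `--machine` value is to derive it from the test name plus a random suffix. This is a hypothetical helper, not the PR's code. Mapping underscores to dashes and capping the length at 63 characters assumes machine names must be hostname-like (letters, digits, dashes).

```python
import re
import uuid


def unique_machine_name(test_name: str) -> str:
    """Build a per-run name for `mkosi --machine=...` (hypothetical helper).

    Assumes machine names must be hostname-like, so anything outside
    [a-z0-9-] becomes a dash and the result stays within 63 characters.
    A short random suffix keeps two concurrent runs of the same test apart.
    """
    base = re.sub(r"[^a-z0-9-]+", "-", test_name.lower()).strip("-")
    base = base[:54].rstrip("-")  # 54 + 1 dash + 8 hex chars = 63
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

Since the vsock CID follows from the machine name, per the commit message, a unique name also avoids CID clashes without any extra handling.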

Running e.g. `test_initrd` and `test_initrd_luks` in parallel fails one of
them with "Image 'main' has not been built yet". The integration tests build
into a per-test `--output-directory`, but `vm()`/`boot()` did not pass it, so
those verbs recovered the build configuration from the *shared* global history
in `<configdir>/.mkosi-private/history/latest.json`. With concurrent builds
that file holds whatever the last build wrote, so a verb reads back another
build's config (e.g. the wrong `Format=`).

Tie the build history to the output directory: when an output directory is
given on the CLI, store and read the history under it instead of in the config
directory. Each build's history is then isolated, and a verb pointed at a
given `--output-directory` reads back exactly that build's configuration. In
the tests, pass `--output-directory` to `vm()` and `boot()` as well.
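On the test side, the change boils down to every verb carrying the same `--output-directory`. The sketch below only builds argument lists. `ImageCommands` is a hypothetical stand-in for the suite's `Image` wrapper, and passing build options such as `--format=disk` only to `build` (relying on the history for `vm`/`boot`) is an illustrative assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ImageCommands:
    """Hypothetical stand-in for the test suite's Image wrapper.

    Every verb gets the same --output-directory, so vm/boot read back the
    history written by this image's own build, not whatever another
    concurrently running test wrote last.
    """

    output_dir: Path
    machine: str
    build_options: list[str] = field(default_factory=list)

    def _base(self) -> list[str]:
        return ["mkosi", f"--output-directory={self.output_dir}"]

    def build(self) -> list[str]:
        return [*self._base(), *self.build_options, "build"]

    def vm(self) -> list[str]:
        # Format= and friends come from this build's history, not from the CLI.
        return [*self._base(), f"--machine={self.machine}", "vm"]

    def boot(self) -> list[str]:
        return [*self._base(), f"--machine={self.machine}", "boot"]
```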

As a consequence, `mkosi vm` (and the other verbs that consume a previous
build) now require `-O`/`--output-directory` when the build used one. This is
a behaviour change, but unbreaks having more than one output dir.

Note: If a config file sets `OutputDirectory=`, the history continues to
be in the config dir, as before. The computation of the history
directory (necessarily) happens before parsing the config
files/includes. This *only* applies to the CLI option.
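To make the isolation concrete, here is a simplified sketch of that decision. It is not mkosi's code: the real `finalize_historydir()` takes the parsed `args` plus `context.cli.get("output_dir")`, whereas this hypothetical version takes two paths. Reusing the `.mkosi-private/history` layout under the output directory is an assumption. `write_history`/`read_history` and their JSON payload are toy stand-ins for `latest.json`.

```python
import json
from pathlib import Path
from typing import Optional


def finalize_historydir(configdir: Path, cli_output_dir: Optional[Path]) -> Path:
    """Simplified, hypothetical version of the history-dir choice.

    Only an output directory given on the command line counts: config files
    and includes are parsed after this decision, so an OutputDirectory= set
    there cannot influence it and the history stays in the config directory.
    """
    base = cli_output_dir if cli_output_dir is not None else configdir
    return base / ".mkosi-private" / "history"


def write_history(historydir: Path, config: dict) -> None:
    """Toy stand-in for recording a build's configuration."""
    historydir.mkdir(parents=True, exist_ok=True)
    (historydir / "latest.json").write_text(json.dumps(config))


def read_history(historydir: Path) -> dict:
    """Toy stand-in for a verb recovering the last build's configuration."""
    return json.loads((historydir / "latest.json").read_text())
```

Without `-O`, two builds share one `latest.json` and the last writer wins. With distinct `-O` values, each verb reads back exactly its own build's configuration.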

Rejected alternatives:
 * This cannot be worked around with `--history=no` in the tests: that
   only disables *writing* history, not *reading* it, so vm/boot still
   pick up a stale (in our setup, empty) `latest.json` and fall back to
   the wrong config.
 * A dedicated `--history-dir` option would just be redundant with
   `--output-dir`.

The package cache directory is shared between all mkosi builds of the same
distribution (see `Config.package_cache_dir_or_default()`) and is bind mounted
read-write into every package manager sandbox by `mounts()`. When multiple builds
run in parallel, they download packages into it concurrently, which corrupts
in-flight cache files: dnf's rpm gets truncated mid-unpack ("Errors occurred
during transaction"), zypper can't hardlink its preloaded rpm into place ("Can't
hardlink/copy ... .preload/..."), etc.

Observed when running `test_addon` and `test_confext` in parallel: both
build extension images with `--incremental=no --package=lsof`, so they
download lsof into the shared cache at the same time and clobber each
other.

Lock the package cache directory for the duration of every package
manager invocation to serialize writes. Every package manager operation
goes through `sandbox()`, so locking there covers install, sync, and
remove across all package managers.

Builds of different distributions use different cache directories and so
don't contend, and cached/incremental builds that don't invoke the
package manager never take the lock, so parallelism is preserved where
it matters.
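A minimal sketch of the locking idea, assuming an exclusive `flock(2)` on the cache directory itself. `locked_dir` and `run_package_manager` are hypothetical names; the commit takes the lock inside mkosi's `sandbox()`, which is not reproduced here. Because `flock` locks belong to the open file description, two separate `os.open()` calls contend even within one process, so threads exercise the same serialization as parallel builds.

```python
import contextlib
import fcntl
import os
from pathlib import Path
from typing import Callable, Iterator, Sequence


@contextlib.contextmanager
def locked_dir(path: Path) -> Iterator[None]:
    """Hold an exclusive flock(2) on a directory for the duration of the block."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the current holder releases it
        yield
    finally:
        os.close(fd)  # closing the only fd of this open file description drops the lock


def run_package_manager(
    cache_dir: Path,
    argv: Sequence[str],
    run: Callable[[Sequence[str]], None],
) -> None:
    """Hypothetical wrapper: serialize package manager runs sharing `cache_dir`."""
    with locked_dir(cache_dir):
        run(argv)
```

Keying the lock on the cache directory keeps the property the commit relies on: different distributions use different directories and never contend, and code paths that skip the package manager never touch the lock.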

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@daandemeyer daandemeyer merged commit bdd341f into systemd:main Jun 22, 2026
47 of 49 checks passed
@martinpitt martinpitt deleted the mkosi-fixes branch June 22, 2026 08:22
bluca commented Jun 24, 2026

Member

This breaks the systemd CI, so I'll file a revert: https://github.com/systemd/systemd/actions/runs/28126107457/job/83290221148?pr=42739
