Message-Id: <20220426065917.3123488-1-bartoschek@google.com>
Date: Tue, 26 Apr 2022 08:59:17 +0200
From: Christoph Bartoschek <bartoschek@...gle.com>
To: Chris Mason <clm@...com>, "Paul E. McKenney" <paulmck@...nel.org>,
Giuseppe Scrivano <gscrivan@...hat.com>
Cc: linux-kernel@...r.kernel.org,
"riel@...riel.com" <riel@...riel.com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
Christoph Bartoschek <bartoschek@...gle.com>
Subject: Re: [PATCH RFC fs/namespace] Make kern_unmount() use synchronize_rcu_expedited()
The regression introduced by commit
e1eb26fa62d04ec0955432be1aa8722a97cb52e7 has hit us when building with Bazel
using the linux-sandbox
(https://github.com/bazelbuild/bazel/blob/master/src/main/tools/linux-sandbox.cc).
The sandbox isolates build steps from each other and keeps builds hermetic, so
it sets up new namespaces for each step. With large software packages on larger
machines that allow for enough parallelism, we run out of namespaces, even with
the time spent actually building. I have reduced the sandbox to a simple test
case:
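#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* The child does nothing: only namespace creation and teardown matter. */
int pid1main(void *arg) {
    (void)arg;
    return 0;
}

int main(void) {
    /* A new user and IPC namespace for every child, as the sandbox does. */
    int clone_flags = CLONE_NEWUSER | CLONE_NEWIPC | SIGCHLD;
    const size_t stack_size = 1024 * 1024;
    char *stack = (char *)malloc(stack_size);
    if (stack == NULL) {
        perror("malloc");
        return 1;
    }
    /* Stacks grow downward on (almost) all Linux architectures. */
    const pid_t child_pid = clone(pid1main, stack + stack_size, clone_flags, NULL);
    if (child_pid < 0) {
        perror("clone");
    }
    /* After a failed clone(), child_pid is -1 and waitpid() reports ECHILD. */
    int ret = waitpid(child_pid, NULL, 0);
    if (ret < 0) {
        perror("waitpid");
        return ret;
    }
    return 0;
}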
Run it with:
    $ gcc clone-test.cc
    $ seq 1 10000000 | parallel --halt now,fail=1 -j32 $PWD/a.out
    clone: No space left on device
    waitpid: No child processes
    parallel: This job failed:
    /usr/local/google/home/bartoschek/linux-sandbox-test/a.out 53070
I ran the test on kernel v5.18-rc4.
Depending on your configured limits, you will soon get ENOSPC, even though no
more than 32 additional namespaces should ever be in use, since parallel runs
at most 32 jobs at a time.
During execution the whole system can become quite unresponsive.
This does not happen without e1eb26fa62d04ec0955432be1aa8722a97cb52e7.
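A minimal sketch, not part of the original report, of where the "configured limits" can be inspected. It assumes the per-type limits exposed under /proc/sys/user/ (max_user_namespaces and max_ipc_namespaces cover the two namespace types the reproducer creates); the helper names read_ns_limit and print_ns_limits are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one integer from a sysctl file such as
 * /proc/sys/user/max_ipc_namespaces. Returns -1 if the file cannot be
 * opened or does not start with a number. */
long read_ns_limit(const char *path)
{
    FILE *f = fopen(path, "r");
    long value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: prints the limits for the two namespace types
 * (user and IPC) that the reproducer creates on every clone(). */
void print_ns_limits(void)
{
    static const char *const paths[] = {
        "/proc/sys/user/max_user_namespaces",
        "/proc/sys/user/max_ipc_namespaces",
    };

    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
        printf("%s = %ld\n", paths[i], read_ns_limit(paths[i]));
}
```

These files hold only the configured maximum, not how many namespaces currently count against it, so namespaces that have exited but are still waiting to be freed are not visible here.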
I see that the issue was already reported in 2020:
http://merlin.infradead.org/pipermail/linux-nvme/2020-September/019565.html
Would it be possible to revert e1eb26fa62d04ec0955432be1aa8722a97cb52e7? It
seems to make the kernel less deterministic and makes it hard to reason about
the number of active namespaces.
Christoph