Date:   Tue, 26 Apr 2022 08:59:17 +0200
From:   Christoph Bartoschek <bartoschek@...gle.com>
To:     Chris Mason <clm@...com>, "Paul E. McKenney" <paulmck@...nel.org>,
        Giuseppe Scrivano <gscrivan@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        "riel@...riel.com" <riel@...riel.com>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        Christoph Bartoschek <bartoschek@...gle.com>
Subject: Re: [PATCH RFC fs/namespace] Make kern_unmount() use synchronize_rcu_expedited()

The regression introduced by commit
e1eb26fa62d04ec0955432be1aa8722a97cb52e7 has hit us when building with Bazel
using the linux-sandbox
(https://github.com/bazelbuild/bazel/blob/master/src/main/tools/linux-sandbox.cc).
The sandbox tries to isolate build steps from each other and to ensure that
builds are hermetic, and therefore sets up new namespaces for each step. For
large software packages, and even with the time spent on the actual build
steps, we run out of namespaces on larger machines that allow for enough
parallelism. I have reduced the sandbox to a simple test case:

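#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* The child exits immediately; only namespace setup and teardown matter. */
int pid1main(void *arg) {
  (void)arg;
  return 0;
}

int main(void) {
  /* Each run creates one new user namespace and one new IPC namespace. */
  int clone_flags = CLONE_NEWUSER | CLONE_NEWIPC | SIGCHLD;
  const size_t stack_size = 1024 * 1024;
  char *stack = (char *)malloc(stack_size);
  /* The stack grows downwards, so pass the top of the allocation. */
  const pid_t child_pid = clone(pid1main, stack + stack_size, clone_flags, NULL);

  if (child_pid < 0) {
    /* Report the failure and fall through; waitpid() then fails as well. */
    perror("clone");
  }
  int ret = waitpid(child_pid, NULL, 0);
  if (ret < 0) {
    perror("waitpid");
    return ret;
  }
  return 0;
}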

Run it with
$ gcc clone-test.cc
$ seq 1 10000000 | parallel --halt now,fail=1 -j32 $PWD/a.out
clone: No space left on device
waitpid: No child processes
parallel: This job failed:
/usr/local/google/home/bartoschek/linux-sandbox-test/a.out 53070

I ran the test on kernel v5.18-rc4.
Depending on your configured limits, you will soon get an ENOSPC, even though
parallel should never have more than 32 additional namespaces in use at once.
During execution the whole system can become quite unresponsive.
This does not happen without e1eb26fa62d04ec0955432be1aa8722a97cb52e7.
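
For reference, the "configured limits" here are presumably the per-user
namespace limits that kernels since 4.9 expose under /proc/sys/user/. A minimal
sketch for printing the two that matter for CLONE_NEWUSER | CLONE_NEWIPC
follows; the helper names read_ns_limit() and print_ns_limits() are made up for
illustration and are not part of the test case above.

```c
#include <stdio.h>

/* Read one numeric limit from a /proc/sys/user file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_ns_limit(const char *path) {
  FILE *f = fopen(path, "r");
  long value = -1;
  if (f == NULL) {
    return -1;
  }
  if (fscanf(f, "%ld", &value) != 1) {
    value = -1;
  }
  fclose(f);
  return value;
}

/* Print the limits for the namespace types the test case creates. */
void print_ns_limits(void) {
  static const char *const paths[] = {
    "/proc/sys/user/max_user_namespaces",
    "/proc/sys/user/max_ipc_namespaces",
  };
  for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
    printf("%s = %ld\n", paths[i], read_ns_limit(paths[i]));
  }
}
```

Calling print_ns_limits() from a small main() before starting the parallel run
shows how much headroom there is; a value of -1 means the file could not be
read on that system.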

I see that the issue was already reported in 2020:
http://merlin.infradead.org/pipermail/linux-nvme/2020-September/019565.html

Would it be possible to revert e1eb26fa62d04ec0955432be1aa8722a97cb52e7? It
seems to make the kernel less deterministic and makes it hard to reason about
how many namespaces are active.

Christoph
