Message-ID: <ZdoEavHorDs3IlF5@tycho.pizza>
Date: Sat, 24 Feb 2024 07:59:54 -0700
From: Tycho Andersen <tycho@...ho.pizza>
To: Christian Brauner <brauner@...nel.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>,
stgraber@...raber.org, cyphar@...har.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] tests/pid_namespace: add pid_max tests
On Fri, Feb 23, 2024 at 05:24:03PM +0100, Christian Brauner wrote:
> On Thu, Feb 22, 2024 at 09:54:08AM -0700, Tycho Andersen wrote:
> > On Thu, Feb 22, 2024 at 05:09:15PM +0100, Alexander Mikhalitsyn wrote:
> > > +static int pid_max_nested_limit_inner(void *data)
> > > +{
> > > + int fret = -1, nr_procs = 400;
> > > + int fd, ret;
> > > + pid_t pid;
> > > + pid_t pids[1000];
> > > +
> > > + ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
> > > + if (ret) {
> > > + fprintf(stderr, "%m - Failed to make rootfs private mount\n");
> > > + return fret;
> > > + }
> > > +
> > > + umount2("/proc", MNT_DETACH);
> > > +
> > > + ret = mount("proc", "/proc", "proc", 0, NULL);
> > > + if (ret) {
> > > + fprintf(stderr, "%m - Failed to mount proc\n");
> > > + return fret;
> > > + }
> > > +
> > > + fd = open("/proc/sys/kernel/pid_max", O_RDWR | O_CLOEXEC | O_NOCTTY);
> > > + if (fd < 0) {
> > > + fprintf(stderr, "%m - Failed to open pid_max\n");
> > > + return fret;
> > > + }
> > > +
> > > + ret = write(fd, "500", sizeof("500") - 1);
> > > + close(fd);
> > > + if (ret < 0) {
> > > + fprintf(stderr, "%m - Failed to write pid_max\n");
> > > + return fret;
> > > + }
> > > +
> > > + for (nr_procs = 0; nr_procs < 500; nr_procs++) {
> > > + pid = fork();
> > > + if (pid < 0)
> > > + break;
> > > +
> > > + if (pid == 0)
> > > + exit(EXIT_SUCCESS);
> > > +
> > > + pids[nr_procs] = pid;
> > > + }
> > > +
> > > + if (nr_procs >= 400) {
> > > + fprintf(stderr, "Managed to create processes beyond the configured outer limit\n");
> > > + goto reap;
> > > + }
> >
> > A small quibble, but I wonder about the semantics here. "You can write
> > whatever you want to this file, but we'll ignore it sometimes" seems
> > weird to me. What if someone (CRIU) wants to spawn a pid numbered 450
> > in this case? I suppose if they read pid_max first, they'll be able to
> > tell it's impossible and can exit(1), but returning E2BIG from write()
> > might be more useful.
>
> That's a good idea. But it's a bit tricky. The straightforward thing is
> to walk upwards through all ancestor pid namespaces and use the lowest
> pid_max value as the upper bound for the current pid namespace. This
> will guarantee that you get an error when you try to write a value that
> you wouldn't be able to create. The same logic should probably apply to
> ns_last_pid as well.
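
A minimal sketch of the "walk upwards and use the lowest pid_max" idea,
written as a userspace model rather than kernel code. The struct
model_pidns type and the effective_pid_max() and model_write_pid_max()
helpers are hypothetical names for illustration only; they are not the
kernel's struct pid_namespace or its sysctl handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a pid namespace, NOT the kernel's
 * struct pid_namespace. Only the fields this sketch needs. */
struct model_pidns {
	struct model_pidns *parent; /* NULL for the initial namespace */
	int pid_max;                /* value last written to this ns's pid_max */
};

/* Lowest pid_max found while walking from ns up to the root. */
static int effective_pid_max(const struct model_pidns *ns)
{
	int lowest;

	assert(ns != NULL);
	lowest = ns->pid_max;
	for (ns = ns->parent; ns; ns = ns->parent)
		if (ns->pid_max < lowest)
			lowest = ns->pid_max;
	return lowest;
}

/* Proposed write semantics (assumption): refuse a value that the
 * ancestors could never honor instead of silently capping it. */
static int model_write_pid_max(struct model_pidns *ns, int val)
{
	int bound = ns->parent ? effective_pid_max(ns->parent) : val;

	if (val > bound)
		return -E2BIG;
	ns->pid_max = val;
	return 0;
}
```

If the outer namespace in the quoted test sits at 400 (which the
nr_procs >= 400 check suggests), the inner write of "500" would fail
with E2BIG under this model rather than being accepted and capped.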
>
> However, that still leaves cases where the current pid namespace writes
> a pid_max limit that is allowed (IOW, all ancestor pid namespaces are
> above that limit). But then immediately afterwards an ancestor pid
> namespace lowers the pid_max limit. So you can always end up in a
> scenario like this.
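
Because of that race, a write-time check alone can't be the whole
story: since a new task gets a pid number in each ancestor namespace,
allocation still has to consult every level's current limit. A minimal
userspace model of that, assuming sequential pid numbering; struct
model_pidns and model_alloc_pid() are hypothetical, and returning
EAGAIN mirrors what fork(2) reports when pid_max is reached.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. Each namespace hands out pids
 * sequentially; a new task takes one number at every level. */
struct model_pidns {
	struct model_pidns *parent; /* NULL for the initial namespace */
	int pid_max;  /* current limit; an ancestor may lower it at any time */
	int next_pid; /* next number this namespace would hand out */
};

/* Allocation-time check: fail if ANY level has reached its own
 * pid_max, regardless of what was validated when pid_max was written. */
static int model_alloc_pid(struct model_pidns *ns)
{
	struct model_pidns *p;

	for (p = ns; p; p = p->parent)
		if (p->next_pid >= p->pid_max)
			return -EAGAIN;

	/* Every level has room: consume one number at each level. */
	for (p = ns; p; p = p->parent)
		p->next_pid++;
	return 0;
}
```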

I wonder if we could push edits down to child namespaces too? Or render
an .effective file, like cgroups do, though I'd prefer just putting the
right value in pid_max.
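
A rough sketch of what "pushing edits down" could mean, again as a
userspace model under stated assumptions: the first_child/next_sibling
tree and the model_set_pid_max() and model_push_down() helpers are
hypothetical, not existing kernel structures. Lowering an ancestor's
pid_max clamps every descendant so that none advertises more than its
ancestors allow.

```c
#include <stddef.h>

/* Hypothetical model, not kernel code: namespaces form a tree linked
 * through first_child/next_sibling. */
struct model_pidns {
	struct model_pidns *first_child;
	struct model_pidns *next_sibling;
	int pid_max;
};

/* Clamp every descendant of ns to at most ns->pid_max. Descendants
 * that are already lower keep their own value. */
static void model_push_down(struct model_pidns *ns)
{
	struct model_pidns *c;

	for (c = ns->first_child; c; c = c->next_sibling) {
		if (c->pid_max > ns->pid_max)
			c->pid_max = ns->pid_max;
		model_push_down(c);
	}
}

static void model_set_pid_max(struct model_pidns *ns, int val)
{
	ns->pid_max = val;
	model_push_down(ns);
}
```

In this model, raising the ancestor again later does not restore a
descendant's earlier value, which is one argument for a separate
read-only effective view instead.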
Tycho