Message-ID: <CAEivzxcVbEZtr+wPL1p+dM4r8+vFNnPoF+E-QvG_nLNHGDYJQg@mail.gmail.com>
Date: Thu, 29 Feb 2024 16:14:06 +0100
From: Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
To: Tycho Andersen <tycho@...ho.pizza>
Cc: Christian Brauner <brauner@...nel.org>, stgraber@...raber.org, cyphar@...har.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] tests/pid_namespace: add pid_max tests
On Mon, Feb 26, 2024 at 4:30 PM Tycho Andersen <tycho@...ho.pizza> wrote:
>
> On Mon, Feb 26, 2024 at 09:57:47AM +0100, Christian Brauner wrote:
> > > > > A small quibble, but I wonder about the semantics here. "You can write
> > > > > whatever you want to this file, but we'll ignore it sometimes" seems
> > > > > weird to me. What if someone (CRIU) wants to spawn a pid numbered 450
> > > > > in this case? I suppose they read pid_max first, they'll be able to
> > > > > tell it's impossible and can exit(1), but returning E2BIG from write()
> > > > > might be more useful.
> > > >
> > > > That's a good idea. But it's a bit tricky. The straightforward thing is
> > > > to walk upwards through all ancestor pid namespaces and use the lowest
> > > > pid_max value as the upper bound for the current pid namespace. This
> > > > will guarantee that you get an error when you try to write a value that
> > > > you wouldn't be able to create. The same logic should probably apply to
> > > > ns_last_pid as well.
> > > >
> > > > However, that still leaves cases where the current pid namespace writes
> > > > a pid_max limit that is allowed (IOW, all ancestor pid namespaces are
> > > > above that limit). But then immediately afterwards an ancestor pid
> > > > namespace lowers the pid_max limit. So you can always end up in a
> > > > scenario like this.
> > >
> > > I wonder if we can push edits down too? Or render an .effective file, like
> >
> > I don't think that works in the current design? The pid_max value is per
> > struct pid_namespace. And while there is a 1:1 relationship between a
> > child pid namespace to all of its ancestor pid namespaces there's a 1 to
> > many relationship between a pid namespace and its child pid namespaces.
> > IOW, if you change pid_max in pidns_level_1 then you'd have to go
> > through each of the child pid namespaces on pidns_level_2 which could be
> > thousands. So you could only do this lazily. IOW, compare and possibly
> > update the pid_max value of the child pid namespace every time it's read
> > or written. Maybe that .effective is the way to go; not sure right now.
Hi Tycho!
>
> I wonder then, does it make sense to implement this as a cgroup thing
> instead, which is used to doing this kind of traversal?
>
> Or I suppose not, since the idea is to get legacy software that's
> writing to pid_max to work?
Yes, this is mostly for legacy software that expects host-like
behavior in the container.
I know that folks who work on running Android inside a container are
very interested in this.
Kind regards,
Alex
>
> Tycho
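
For reference, the ancestor walk discussed in the quoted text (use the lowest
pid_max among all ancestor pid namespaces as the upper bound, and return E2BIG
from write() when a value exceeds it) could be sketched roughly as below. This
is a userspace model only, not kernel code: struct pidns_model,
effective_pid_max() and write_pid_max() are hypothetical names made up for
illustration, and the real struct pid_namespace layout and locking are not
represented.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct pid_namespace. */
struct pidns_model {
	struct pidns_model *parent;	/* NULL for the initial pid namespace */
	int pid_max;			/* value last written in this namespace */
};

/*
 * Lowest pid_max among the namespace itself and all of its ancestors.
 * Computed lazily on every call, so a later change in an ancestor is
 * picked up without visiting any child namespaces.
 */
static int effective_pid_max(const struct pidns_model *ns)
{
	int limit = ns->pid_max;

	for (ns = ns->parent; ns != NULL; ns = ns->parent) {
		if (ns->pid_max < limit)
			limit = ns->pid_max;
	}
	return limit;
}

/*
 * Model of a write to pid_max: reject values that no ancestor would
 * currently allow, instead of silently ignoring them.
 */
static int write_pid_max(struct pidns_model *ns, int value)
{
	if (value <= 0)
		return -EINVAL;
	if (ns->parent != NULL && value > effective_pid_max(ns->parent))
		return -E2BIG;

	ns->pid_max = value;
	return 0;
}
```

Note that the check at write time does not cover the race mentioned in the
thread: if an ancestor lowers its pid_max afterwards, the stored value in the
child stays above the new limit, and only a lazily computed value like
effective_pid_max() reflects the change.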