linux-kernel - Re: [PATCH v1 2/2] tests/pid_namespace: add pid

Message-ID: <Zdyumw6OfWBqQMTj@tycho.pizza>
Date: Mon, 26 Feb 2024 08:30:35 -0700
From: Tycho Andersen <tycho@...ho.pizza>
To: Christian Brauner <brauner@...nel.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>,
	stgraber@...raber.org, cyphar@...har.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 2/2] tests/pid_namespace: add pid_max tests

On Mon, Feb 26, 2024 at 09:57:47AM +0100, Christian Brauner wrote:
> > > > A small quibble, but I wonder about the semantics here. "You can write
> > > > whatever you want to this file, but we'll ignore it sometimes" seems
> > > > weird to me. What if someone (CRIU) wants to spawn a pid numbered 450
> > > > in this case? I suppose they read pid_max first, they'll be able to
> > > > tell it's impossible and can exit(1), but returning E2BIG from write()
> > > > might be more useful.
> > > 
> > > That's a good idea. But it's a bit tricky. The straightforward thing is
> > > to walk upwards through all ancestor pid namespaces and use the lowest
> > > pid_max value as the upper bound for the current pid namespace. This
> > > will guarantee that you get an error when you try to write a value that
> > > you wouldn't be able to create. The same logic should probably apply to
> > > ns_last_pid as well.
> > > 
> > > However, that still leaves cases where the current pid namespace writes
> > > a pid_max limit that is allowed (IOW, all ancestor pid namespaces are
> > > above that limit). But then immediately afterwards an ancestor pid
> > > namespace lowers the pid_max limit. So you can always end up in a
> > > scenario like this.
> > 
> > I wonder if we can push edits down too? Or render an .effective file, like
> 
> I don't think that works in the current design? The pid_max value is per
> struct pid_namespace. And while there is a 1:1 relationship between a
> child pid namespace and each of its ancestor pid namespaces, there's a
> 1-to-many relationship between a pid namespace and its child pid namespaces.
> IOW, if you change pid_max in pidns_level_1 then you'd have to go
> through each of the child pid namespaces on pidns_level_2 which could be
> thousands. So you could only do this lazily. IOW, compare and possibly
> update the pid_max value of the child pid namespace every time it's read
> or written. Maybe that .effective is the way to go; not sure right now.

I wonder then, does it make sense to implement this as a cgroup thing
instead, which is used to doing this kind of traversal?

Or I suppose not, since the idea is to get legacy software that's
writing to pid_max to work?

Tycho
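
For illustration, the two ideas quoted above (rejecting a write with E2BIG
when an ancestor's limit is lower, and computing an "effective" value lazily
on read instead of pushing changes down to every child) can be sketched in
plain user-space C. This is only a toy model under stated assumptions:
struct pidns_model, effective_pid_max() and write_pid_max() are hypothetical
names made up for this sketch, not the kernel's struct pid_namespace or any
code from the patch series, and locking is ignored entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's struct pid_namespace. */
struct pidns_model {
	struct pidns_model *parent;	/* NULL for the initial namespace */
	int pid_max;			/* value last written in this namespace */
};

/*
 * Lazy "effective" limit: the lowest pid_max on the path from ns up to the
 * root, recomputed on every call so nothing is pushed down to children.
 */
int effective_pid_max(const struct pidns_model *ns)
{
	assert(ns != NULL);

	int limit = ns->pid_max;

	for (const struct pidns_model *p = ns->parent; p != NULL; p = p->parent) {
		if (p->pid_max < limit)
			limit = p->pid_max;
	}
	return limit;
}

/*
 * Model of a write to pid_max: refuse values that the ancestors would not
 * allow at the time of the write. An ancestor can still lower its own limit
 * afterwards, which is why the read side uses effective_pid_max().
 */
int write_pid_max(struct pidns_model *ns, int value)
{
	assert(ns != NULL);

	if (value <= 0)
		return -EINVAL;
	if (ns->parent != NULL && value > effective_pid_max(ns->parent))
		return -E2BIG;

	ns->pid_max = value;
	return 0;
}
```

With a chain root -> mid -> leaf where mid has pid_max 400, a write of 500 to
leaf fails with -E2BIG in this model, while a write of 350 succeeds. If mid is
later lowered to 200, leaf's stored value stays 350 but its effective limit
reads as 200, which is the race described in the quoted text. The cost is a
walk over the ancestors on each read rather than a walk over potentially
thousands of children on each write.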