Message-ID: <ad4mel7m2tfybp54vqfl5c6sownjr5kq3xa5ytucfkqecfakga@aw65fx3rziyj>
Date: Fri, 28 Feb 2025 14:46:08 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Cc: brauner@...nel.org, stgraber@...raber.org, tycho@...ho.pizza,
cyphar@...har.com, yun.zhou@...driver.com, joel.granados@...nel.org,
rostedt@...dmis.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] pid_namespace: namespacify sysctl kernel.pid_max
Hello Aleksandr.
On Tue, Feb 25, 2025 at 07:01:21PM +0100, Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com> wrote:
> We see some kernel global limit or setting and consider if it's safe
> to be namespaced in some way
> and if it is safe and if it makes sense then we do it.
I know there are ucounts for various per-userns limits (NB RLIMIT_NPROC
among them).
Do you have any other precedents in mind?
In my mental model (biased towards raw resources, not ucounts), limits are
composed as one global limit plus cgroup limits for non-root groups, hence
my surprise at the per-namespace granularity of pid_max.
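For illustration only, a toy Python model of that composition: one global limit plus hierarchical per-cgroup limits, where every limited ancestor must have room. All names are hypothetical; `current` is assumed to already count the whole subtree, as pids.current does with the pids controller's hierarchical charging, and the root cgroup carries no limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cgroup:
    """Toy cgroup node with a pids.max-style limit (None means "max")."""
    name: str
    parent: Optional["Cgroup"] = None
    limit: Optional[int] = None  # root cgroup has no pids.max
    current: int = 0             # tasks charged to this subtree


def can_fork(cg: Cgroup, global_count: int, global_limit: int) -> bool:
    # The single global limit applies to every task in the system.
    if global_count >= global_limit:
        return False
    # Each limited ancestor (including cg itself) must still have room.
    node = cg
    while node is not None:
        if node.limit is not None and node.current >= node.limit:
            return False
        node = node.parent
    return True
```

With `a = Cgroup("a", root, limit=2, current=2)` and a child `b` of `a` with a generous limit, `can_fork(b, ...)` is refused because the ancestor is full. The limit is visible (and settable) from outside via the cgroup tree.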
> Second reason for having this is that we have a real use case scenario
> with 32-bit Android Bionic libc
> where we need to set a limit for PID *value*. And here, unfortunately,
> pids controller does not help either.
(I think if there were no pids controller, a namespaced pid_max would be
a very good approach to implementing this. But it sounds a little
redundant given that the pids controller already exists.)
Pid namespaces are definitely a good place to tackle this since they do
pid number virtualization after all. The challenge is how to limit both
the number (amount) and the number (pid value) of tasks.
Note that besides the pids controller, pid_max and RLIMIT_NPROC, there's
also the threads-max limit. Namespacing pid_max makes the configuration
space even more complex :-/ In contrast with pids.max, there's no external
visibility of a namespace's pid_max (you must nsenter it), and pid_max
failures are more difficult to troubleshoot (just a failed fork(2)).
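To make the troubleshooting point concrete, a rough Python sketch that gathers, from the calling process's point of view, the knobs that can make fork(2)/clone(2) fail with EAGAIN. It assumes cgroup v2 mounted at /sys/fs/cgroup; the helper names are made up for this example. It can only report the pid_max seen by the caller, so with a namespaced pid_max it would presumably show the caller's own pid namespace value and nothing about other namespaces.

```python
import resource
from pathlib import Path
from typing import Optional


def read_value(path: str) -> Optional[str]:
    """Return the stripped contents of a procfs/cgroupfs file, or None."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cgroup_v2_path(proc_self_cgroup: str) -> Optional[str]:
    """Extract the unified-hierarchy path ("0::/some/path") from /proc/self/cgroup."""
    for line in proc_self_cgroup.splitlines():
        if line.startswith("0::"):
            return line[3:]
    return None


def fork_limits() -> dict:
    """Collect the limits that can make a fork fail, as seen by this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    limits = {
        # Only the value visible to the caller; other namespaces are opaque.
        "pid_max": read_value("/proc/sys/kernel/pid_max"),
        "threads-max": read_value("/proc/sys/kernel/threads-max"),
        "RLIMIT_NPROC": "unlimited" if soft == resource.RLIM_INFINITY else str(soft),
        "pids.max": None,
    }
    cg = read_value("/proc/self/cgroup")
    path = cgroup_v2_path(cg) if cg is not None else None
    if path is not None:
        # Assumes cgroup v2 at /sys/fs/cgroup; the root cgroup has no pids.max.
        limits["pids.max"] = read_value(f"/sys/fs/cgroup{path}/pids.max")
    return limits


if __name__ == "__main__":
    for name, value in fork_limits().items():
        print(f"{name:>13}: {value if value is not None else 'n/a'}")
```

Only pids.max of another workload can be inspected this way without entering its namespaces, which is the asymmetry described above.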
Admittedly, I'm slightly hesitant to pursue the pids-controller-based
approach due to ns_last_pid. (Also, how does that work when starting those
32b apps? Do they adjust the limits inside the pidns themselves, or is this
done by some launcher (which may need privileges to set pids.max)?)
One more idea I have would be to rebase my original pid_max default
value elimination [1] on top of the namespaced pid_max, and not copy
from the parent but start unlimited in the ns too. (Or keep the global
default value and unlimit only descendants, so that the semantics are
similar to ucounts.)
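A toy Python model of the two semantics (copying pid_max from the parent vs. starting unlimited in a new pidns), ignoring reserved pids and cyclic allocation. Everything here is hypothetical; it assumes a new task needs a pid number at every level from its namespace up to the root, each bounded by that level's pid_max, and PID_MAX_LIMIT mirrors the kernel's 64-bit ceiling of 4194304. It shows that an "unlimited" child is still bounded by its ancestors, much like ucounts.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

PID_MAX_LIMIT = 4 * 1024 * 1024  # kernel ceiling on 64-bit builds (4194304)


@dataclass
class PidNs:
    """Toy pid namespace: pid values at this level must stay below pid_max."""
    pid_max: int
    parent: Optional["PidNs"] = None
    used: int = 0  # pid numbers currently allocated at this level

    def new_child(self, copy_parent: bool) -> "PidNs":
        # copy_parent=True: child starts with the parent's pid_max.
        # copy_parent=False: child starts "unlimited" (at the ceiling) and
        # is effectively bounded only by its ancestors.
        return PidNs(self.pid_max if copy_parent else PID_MAX_LIMIT, parent=self)

    def levels(self) -> Iterator["PidNs"]:
        ns: Optional[PidNs] = self
        while ns is not None:
            yield ns
            ns = ns.parent

    def fork(self) -> bool:
        """Allocate a pid number at this level and at every ancestor level."""
        # Usable values are 1 .. pid_max-1, so capacity is pid_max - 1.
        if any(ns.used >= ns.pid_max - 1 for ns in self.levels()):
            return False  # would surface as a plain failed fork(2)
        for ns in self.levels():
            ns.used += 1
        return True
```

For example, a child created with `copy_parent=False` under a root with `pid_max=10` reports `PID_MAX_LIMIT` but still cannot hold more than nine tasks.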
> I hope I explained above why I believe that this does not duplicate an
> existing mechanism.
The 32b scenario is certainly a sensible thing to resolve. But I'm still
worried that people would start adjusting both of those and (presumably
different) people would run into unexpected fork failures.
Thanks,
Michal
[1] https://lore.kernel.org/all/20240408145819.8787-1-mkoutny@suse.com/