Message-ID: <bnxhqrq7tip6jl2hu6jsvxxogdfii7ugmafbhgsogovrchxfyp@kagotkztqurt>
Date: Thu, 30 Jan 2025 18:45:36 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Cc: brauner@...nel.org, stgraber@...raber.org, tycho@...ho.pizza, 
	cyphar@...har.com, yun.zhou@...driver.com, joel.granados@...nel.org, 
	rostedt@...dmis.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] pid_namespace: namespacify sysctl kernel.pid_max

Hello.

On Fri, Nov 22, 2024 at 02:24:57PM +0100, Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com> wrote:
>
(Sorry for responding only now as I missed this until I read v6.14 news.)

> The pid_max sysctl is a global value. For a long time the default value
> has been 65535 and during the pidfd discussions Linus proposed to bump
> pid_max by default (cf. [1]). Based on this discussion systemd started
> bumping pid_max to 2^22. So all new systems now run with a very high
> pid_max limit with some distros having also backported that change.

Yes, multiple [1] people [2] have even proposed lifting the legacy limit
in the kernel directly.

> Of course, giving containers the ability to restrict the number of
> processes in their respective pid namespace independent of the global limit
> through pid_max is something desirable in itself and comes in handy in
> general.

Yes, this is what pids.max of a cgroup (already) does.
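
For reference, a minimal sketch of how that existing limit can be
inspected from userspace. It assumes a cgroup v2 hierarchy with the pids
controller enabled; the helper names are hypothetical, and the directory
passed in (typically somewhere under /sys/fs/cgroup) is an assumption
about the local setup.

```python
from pathlib import Path
from typing import Optional


def read_pids_limit(cgroup_dir: str) -> Optional[int]:
    """Return the cgroup's pids.max, or None when it is unlimited ("max")."""
    raw = Path(cgroup_dir, "pids.max").read_text().strip()
    return None if raw == "max" else int(raw)


def read_pids_current(cgroup_dir: str) -> int:
    """Return the number of tasks currently charged to the cgroup."""
    return int(Path(cgroup_dir, "pids.current").read_text().strip())


def pids_headroom(cgroup_dir: str) -> Optional[int]:
    """Return how many more tasks fit under this cgroup's own limit.

    Only the cgroup's own pids.max is consulted here; limits set on
    ancestor cgroups are not taken into account by this sketch.
    """
    limit = read_pids_limit(cgroup_dir)
    if limit is None:
        return None
    return limit - read_pids_current(cgroup_dir)
```

Calling pids_headroom() on a container's cgroup directory gives a quick
answer to "how close is this workload to its pids.max".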

(It is already difficult for users to troubleshoot which of multiple pid
limits restricts their workload. I'm afraid making pid_max
per-(hierarchical-)NS will contribute to the confusion.)
Also, the implementation copies the limit from the parent upon creation.
This pattern proved cumbersome with some attributes in legacy cgroup
controllers, e.g. it is subject to a race condition between modification
of the parent's limit and creation of children.
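
To illustrate the copy-on-creation concern, here is a toy Python model
(not kernel code; all class and function names are hypothetical). With a
copied limit, the value a child ends up with depends on whether it was
created before or after the parent's limit changed. The second class
shows one possible alternative, assumed here only for contrast, where
the effective limit is computed by walking the ancestors.

```python
class CopyOnCreate:
    """Toy node: the limit is snapshotted from the parent at creation."""

    def __init__(self, parent=None, limit=None):
        self.parent = parent
        # A child without an explicit limit copies the parent's current value.
        self.limit = limit if limit is not None else parent.limit

    def effective_limit(self):
        return self.limit


class Hierarchical:
    """Toy node: the effective limit is the minimum over all ancestors."""

    def __init__(self, parent=None, limit=None):
        self.parent = parent
        self.limit = limit  # None means "no limit of its own"

    def effective_limit(self):
        limits = []
        node = self
        while node is not None:
            if node.limit is not None:
                limits.append(node.limit)
            node = node.parent
        return min(limits) if limits else None


def outcome(node_cls, create_child_first):
    """Return the child's effective limit for one ordering of the two events."""
    parent = node_cls(limit=4194304)
    if create_child_first:
        child = node_cls(parent=parent)
        parent.limit = 32768  # parent is tightened after the child exists
    else:
        parent.limit = 32768  # parent is tightened before the child exists
        child = node_cls(parent=parent)
    return child.effective_limit()
```

In this model, outcome(CopyOnCreate, ...) differs between the two
orderings, while outcome(Hierarchical, ...) does not.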

> Independent of motivating use-cases the existence of pid namespaces
> makes this also a good semantical extension and there have been prior
> proposals pushing in a similar direction.
> The trick here is to minimize the risk of regressions which I think is
> doable. The fact that pid namespaces are hierarchical will help us here.

I understand it is tempting to make pid_max part of a pid namespace, but
was the overlap with the pids controller considered?

I'd consider the alternative of relying on virtualized PID numbers in
pid namespaces, with an appropriate pids.max limit and a number
allocation strategy that would keep PID values below the limit (i.e.
taking the first free pid number in the given NS; I actually thought
this was already the case, but it doesn't work like that when I try it
now [3]).
WDYT?
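
A toy Python sketch of the two allocation strategies being contrasted
(not kernel code; the class names, the peak_after_churn() helper and the
limit of 1000 are made up for illustration). "Lowest free" always hands
out the smallest unused number, so values stay bounded by the number of
live tasks. The cyclic variant, assumed here as the contrasting
behaviour, continues from the last allocated number and only wraps at
the limit.

```python
from collections import deque
from itertools import chain


class LowestFreeAllocator:
    """Always returns the smallest number that is not currently in use."""

    def __init__(self, first=1):
        self.first = first
        self.used = set()

    def alloc(self):
        n = self.first
        while n in self.used:
            n += 1
        self.used.add(n)
        return n

    def free(self, n):
        self.used.discard(n)


class CyclicAllocator:
    """Continues after the last allocated number, wrapping at `limit`."""

    def __init__(self, first=1, limit=1000):
        self.first = first
        self.limit = limit  # exclusive upper bound
        self.next = first
        self.used = set()

    def alloc(self):
        # Search from the hint up to the limit, then wrap to the start.
        search = chain(range(self.next, self.limit),
                       range(self.first, self.next))
        for n in search:
            if n not in self.used:
                self.used.add(n)
                self.next = n + 1
                return n
        raise RuntimeError("no free numbers left")

    def free(self, n):
        self.used.discard(n)


def peak_after_churn(allocator, live, rounds):
    """Keep `live` tasks alive, replace the oldest `rounds` times,
    and return the largest number ever handed out."""
    tasks = deque(allocator.alloc() for _ in range(live))
    peak = max(tasks)
    for _ in range(rounds):
        allocator.free(tasks.popleft())
        n = allocator.alloc()
        tasks.append(n)
        peak = max(peak, n)
    return peak
```

With three live tasks and ten replacements, the lowest-free allocator
never hands out a number above 3, while the cyclic one keeps climbing
even though only three tasks exist at any time.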

TL;DR: instead of getting rid of the legacy limit, it was extended
further to pid namespaces because of legacy workloads, and it (almost)
duplicates an existing mechanism. Can this be rethought, please?

Thanks,
Michal

[1] https://lore.kernel.org/all/20240408145819.8787-1-mkoutny@suse.com/
[2] https://lore.kernel.org/linux-api/CAHk-=wiZ40LVjnXSi9iHLE_-ZBsWFGCgdmNiYZUXn1-V5YBg2g@mail.gmail.com/
