linux-kernel - PID namespace init releases its file locks before its children die

Message-ID: <58ac5d49-14a9-4fe6-a5a4-746d6b73f82b@gmail.com>
Date: Thu, 2 Oct 2025 14:22:30 -0400
From: Demi Marie Obenour <demiobenour@...il.com>
To: Linux kernel mailing list <linux-kernel@...r.kernel.org>,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: PID namespace init releases its file locks before its children die

I noticed that PID 1 in a PID namespace can release its file locks
(because it is exiting) while its children are still running.  If the
locks held by PID 1 were relied upon to serialize the execution of its
child processes, this could result in data corruption.

Specifically, the child processes are killed via exit_notify() ->
forget_original_parent() -> find_child_reaper() ->
zap_pid_ns_processes().  That comes *after* exit_files(), which
releases the file locks.

While it is possible to implement this kind of supervision with
cgroups, cgroups are considerably more complicated to use than
a single call to unshare() before fork().
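
For illustration, the pattern in question might look roughly like the
sketch below.  This is only a minimal example under stated assumptions,
not code from any real supervisor: take_lock() and run_supervised() are
hypothetical names, the lock is assumed to be a flock() lock, and the
worker is assumed not to exec.  unshare(CLONE_NEWPID) normally requires
CAP_SYS_ADMIN, so run_supervised() returns -1 without it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking flock() on path.
 * Returns the locked fd, or -1 if the lock is held elsewhere or on error. */
int take_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Run worker() under a PID-namespace init that holds the lock.
 * Returns the wait status of that init, or -1 on setup failure. */
int run_supervised(const char *lock_path, void (*worker)(void))
{
    /* The next child of this process becomes PID 1 of a new namespace. */
    if (unshare(CLONE_NEWPID) != 0)
        return -1;

    pid_t init = fork();
    if (init < 0)
        return -1;

    if (init == 0) {
        /* PID 1 in the new namespace: hold the lock for our lifetime. */
        int fd = take_lock(lock_path);
        if (fd < 0)
            _exit(1);

        pid_t w = fork();
        if (w < 0)
            _exit(1);
        if (w == 0) {
            /* Drop the inherited fd so that only PID 1 keeps the
             * open file description, and thus the lock, alive. */
            close(fd);
            worker();
            _exit(0);
        }

        /* Reap children until none are left. */
        while (wait(NULL) > 0 || errno == EINTR)
            ;
        /* If PID 1 dies early instead (e.g. from SIGKILL), the lock is
         * dropped in exit_files() before the remaining children are
         * killed, which is the window described above. */
        _exit(0);
    }

    int status;
    while (waitpid(init, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

A second supervisor that calls take_lock() on the same path would
presumably treat success as proof that no worker from the previous
instance is still running, which is exactly the assumption that the
ordering above breaks.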

Is this ordering intentional?  Changing it so that the children are
killed before the locks are released would make supervision trees
significantly easier to implement correctly.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
