linux-kernel - Re: [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility

Message-ID: <20251224-glasbruch-mahnmal-ef7e9e10bceb@brauner>
Date: Wed, 24 Dec 2025 13:55:20 +0100
From: Christian Brauner <brauner@...nel.org>
To: Alexey Gladkov <legion@...nel.org>
Cc: Dan Klishch <danilklishch@...il.com>, 
	containers@...ts.linux-foundation.org, ebiederm@...ssion.com, keescook@...omium.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, viro@...iv.linux.org.uk
Subject: Re: [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility

On Mon, Dec 15, 2025 at 03:58:42PM +0100, Alexey Gladkov wrote:
> On Mon, Dec 15, 2025 at 09:46:00AM -0500, Dan Klishch wrote:
> > On 12/15/25 5:10 AM, Alexey Gladkov wrote:
> > > On Sun, Dec 14, 2025 at 01:02:54PM -0500, Dan Klishch wrote:
> > >> On 12/14/25 11:40 AM, Alexey Gladkov wrote:
> > >>> But then, if I understand you correctly, this patch will not be enough
> > >>> for you. procfs with subset=pid will not allow you to have /proc/meminfo,
> > >>> /proc/cpuinfo, etc.
> > >>
> > >> Hmm, I didn't think of this. sunwalker-box only exposes cpuinfo and PID
> > >> tree to the sandboxed programs (empirically, this is enough for most
> > >> programs you want to sandbox). With that in mind, this patch and a
> > >> FUSE providing an overlay with cpuinfo / seccomp intercepting opens of
> > >> /proc/cpuinfo / a small kernel patch with a new mount option for procfs
> > >> to expose more static files still look like a clean solution to me.
> > > 
> > > I don't think you'll be able to do that. procfs doesn't allow itself to
> > > be overlaid [1], which should block mounting overlayfs and fuse on top
> > > of procfs.
> > > 
> > > [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/root.c#n274
> > 
> > This is why I have been careful not to say overlayfs. With [2] (warning:
> > zero-shot ChatGPT output), I can do:
> > 
> > $ ./fuse-overlay target --source=/proc
> > $ ls target
> > 1   88   194   1374    889840  908552
> > 2   90   195   1375    889987  908619
> > 3   91   196   1379    890031  908658
> > 4   92   203   1412    890063  908756
> > 5   93   205   1590    890085  908804
> > 6   94   233   1644    890139  908951
> > 7   96   237   1802    890246  909848
> > 8   97   239   1850    890271  909914
> > 10  98   240   1852    894665  909924
> > 13  99   243   1865    895854  909926
> > 15  100  244   1888    895864  910005
> > 16  102  246   1889    896030  acpi
> > 17  103  262   1891    896205  asound
> > 18  104  263   1895    896508  bus
> > 19  105  264   1896    896544  driver
> > 20  106  265   1899    896706  dynamic_debug
> > <...>
> > 
> > [2] https://gist.github.com/DanShaders/547eeb74a90315356b98472feae47474
> > 
> > This requires much more careful thought wrt magic symlinks
> > and permission checks. The fact that I am highly unlikely to 100%
> > correctly reimplement the checks and special behavior of procfs makes me
> > not want to proceed with the FUSE route.
> > 
> > On 12/15/25 6:30 AM, Christian Brauner wrote:
> > > The standard way of making it possible to mount procfs inside of a
> > > container with a separate mount namespace that has a procfs inside it
> > > with overmounted entries is to ensure that a fully-visible procfs
> > > instance is present.
> > 
> > Yes, this is a solution. However, this is only marginally better than
> > passing --privileged to the outer container (in the sense that we require
> > the outer sandbox to remove some protections for the inner sandbox to work).
> > 
> > > The container needs to inherit a fully-visible instance somehow if you
> > > want nesting. Using an unprivileged LSM such as landlock to prevent any
> > > access to the fully visible procfs instance is usually the better way.
> > > 
> > > My hope is that once signed bpf is more widely adopted, distros will
> > > just start enabling blessed bpf programs that will take over the
> > > access protection instead of the clumsy bind-mount protection mechanism.
> > 
> > These are big changes to container runtimes that are unlikely to happen
> > soon. In contrast, the patch we are discussing will be available for me
> > to use on ArchLinux 2 months after the merge, and a couple of months
> > later on Ubuntu.
> > 
> > So, is there any way forward with the patch or should I continue trying
> > to find a userspace solution?
> 
> I still consider these patches useful. I made them precisely to remove
> some of the restrictions we have for procfs because of global files in
> the root of this filesystem.
> 
> I can update and prepare a new version of the patchset if Christian thinks
> it's useful too.

Let's see it at least! No need to preemptively dismiss it. :)
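
The thread turns on whether a container already has a fully-visible procfs instance or only one mounted with subset=pid. As a rough illustration, the sketch below parses text in the /proc/self/mountinfo format and reports which procfs mounts carry the subset=pid option. The helper names (proc_mounts, is_pid_only), the sample data, and the /sandbox/proc path are hypothetical and not taken from the patchset. It also assumes that subset=pid appears in one of the two option fields of mountinfo, so it checks both.

```python
# Hypothetical sample in /proc/self/mountinfo format (not real output).
SAMPLE_MOUNTINFO = """\
22 28 0:21 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
95 60 0:45 / /sandbox/proc rw,nosuid,nodev,noexec,relatime - proc proc rw,subset=pid
30 28 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
"""


def proc_mounts(mountinfo_text):
    """Return a list of (mount_point, options) for each procfs mount.

    options is the union of the per-mount options (field 6) and the
    per-superblock options (last field after the "-" separator).
    """
    result = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a well-formed mountinfo line
        sep = fields.index("-")
        # Six fixed fields precede the optional fields and the separator;
        # fstype, source and super options follow it.
        if sep < 6 or len(fields) < sep + 4:
            continue
        if fields[sep + 1] != "proc":
            continue
        options = set(fields[5].split(",")) | set(fields[sep + 3].split(","))
        result.append((fields[4], options))
    return result


def is_pid_only(options):
    """True if the option set marks a procfs instance mounted with subset=pid."""
    return "subset=pid" in options
```

To inspect a live system, pass the contents of /proc/self/mountinfo to proc_mounts() instead of the sample string. A mount for which is_pid_only() returns False is only a candidate for a fully-visible instance: this check does not detect overmounted entries, which is the other half of the visibility rule discussed above.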