Message-ID: <20251215144600.911100-1-danilklishch@gmail.com>
Date: Mon, 15 Dec 2025 09:46:00 -0500
From: Dan Klishch <danilklishch@...il.com>
To: legion@...nel.org,
	brauner@...nel.org
Cc: containers@...ts.linux-foundation.org,
	ebiederm@...ssion.com,
	keescook@...omium.org,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	viro@...iv.linux.org.uk
Subject: Re: [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility

On 12/15/25 5:10 AM, Alexey Gladkov wrote:
> On Sun, Dec 14, 2025 at 01:02:54PM -0500, Dan Klishch wrote:
>> On 12/14/25 11:40 AM, Alexey Gladkov wrote:
>>> But then, if I understand you correctly, this patch will not be enough
>>> for you. procfs with subset=pid will not allow you to have /proc/meminfo,
>>> /proc/cpuinfo, etc.
>>
>> Hmm, I didn't think of this. sunwalker-box only exposes cpuinfo and PID
>> tree to the sandboxed programs (empirically, this is enough for most of
>> programs you want sandboxing for). With that in mind, this patch and a
>> FUSE providing an overlay with cpuinfo / seccomp intercepting opens of
>> /proc/cpuinfo / a small kernel patch with a new mount option for procfs
>> to expose more static files still look like a clean solution to me.
> 
> I don't think you'll be able to do that. procfs doesn't allow itself to
> be overlayed [1]. What should block mounting overlayfs and fuse on top
> of procfs.
> 
> [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/root.c#n274

This is why I have been careful not to say overlayfs. With [2] (warning:
zero-shot ChatGPT output), I can do:

$ ./fuse-overlay target --source=/proc
$ ls target
1   88   194   1374    889840  908552
2   90   195   1375    889987  908619
3   91   196   1379    890031  908658
4   92   203   1412    890063  908756
5   93   205   1590    890085  908804
6   94   233   1644    890139  908951
7   96   237   1802    890246  909848
8   97   239   1850    890271  909914
10  98   240   1852    894665  909924
13  99   243   1865    895854  909926
15  100  244   1888    895864  910005
16  102  246   1889    896030  acpi
17  103  262   1891    896205  asound
18  104  263   1895    896508  bus
19  105  264   1896    896544  driver
20  106  265   1899    896706  dynamic_debug
<...>

[2] https://gist.github.com/DanShaders/547eeb74a90315356b98472feae47474

This requires much more careful thought wrt magic symlinks and
permission checks. Since I am highly unlikely to reimplement the checks
and special behavior of procfs 100% correctly, I do not want to proceed
with the FUSE route.
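
To illustrate the magic symlink concern, here is a minimal Python sketch
(not taken from the gist in [2]; the function name magic_link_demo is
made up for this example). It assumes a Linux system with procfs mounted
at /proc. It shows that the text returned by readlink() on a
/proc/<pid>/fd entry is not what the kernel follows when the link is
opened, so a userspace layer that only forwards the readlink() text
would hand out a dangling symlink.

```python
import os
import tempfile


def magic_link_demo():
    """Compare the readlink() text of a /proc fd link with what open() reaches."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        os.unlink(path)  # the file now has no name, but fd keeps it alive

        link = f"/proc/self/fd/{fd}"

        # The textual target is only informational; for an unlinked file the
        # kernel appends " (deleted)" to the old path.
        target_text = os.readlink(link)

        # Opening the link does not walk that text. procfs jumps straight to
        # the open file, so the content is still reachable.
        with open(link, "rb") as f:
            via_magic = f.read()

        # A forwarding layer that exposes the link as an ordinary symlink
        # would make clients resolve the text instead, which fails here.
        text_resolves = os.path.exists(target_text)
    finally:
        os.close(fd)
    return target_text, via_magic, text_resolves


if __name__ == "__main__":
    text, data, resolves = magic_link_demo()
    print("readlink text:     ", text)
    print("read through link: ", data)
    print("text path resolves:", resolves)
```

This covers only one of the special cases. The permission checks that
procfs applies when such links are followed are a separate problem that
the sketch does not touch.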

On 12/15/25 6:30 AM, Christian Brauner wrote:
> The standard way of making it possible to mount procfs inside of a
> container with a separate mount namespace that has a procfs inside it
> with overmounted entries is to ensure that a fully-visible procfs
> instance is present.

Yes, this is a solution. However, this is only marginally better than
passing --privileged to the outer container (in the sense that we
require the outer sandbox to remove some protections for the inner
sandbox to work).
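
For reference, a rough way to see from userspace which procfs instances
a mount namespace contains and which of them have entries overmounted is
to parse /proc/self/mountinfo. The Python sketch below is an
illustration only: the names ProcMount, find_proc_mounts and SAMPLE are
made up, the sample data is invented, and the logic merely approximates
the idea of a "fully-visible" instance. The kernel's own visibility
check considers more than this (for example mount flags).

```python
from dataclasses import dataclass, field

# Invented mountinfo excerpt: one procfs with /proc/kcore overmounted and
# one separate procfs instance mounted with subset=pid.
SAMPLE = """\
22 28 0:21 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
90 22 0:45 /null /proc/kcore rw,nosuid - devtmpfs devtmpfs rw,size=4096k
91 28 0:46 / /inner/proc rw,relatime - proc proc rw,subset=pid
"""


@dataclass
class ProcMount:
    mount_id: int
    root: str
    mount_point: str
    subset_pid: bool
    overmounts: list = field(default_factory=list)


def find_proc_mounts(mountinfo_text):
    """Return procfs mounts with the mount points of their child mounts."""
    procs = {}
    children = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Optional fields start at index 6 and end at a lone "-".
        if "-" not in fields[6:]:
            continue
        sep = fields.index("-", 6)
        if len(fields) <= sep + 3:
            continue
        mount_id, parent_id = int(fields[0]), int(fields[1])
        root, mount_point = fields[3], fields[4]
        fstype, super_opts = fields[sep + 1], fields[sep + 3]
        children.append((parent_id, mount_point))
        if fstype == "proc":
            # Check both option fields rather than assume where subset=pid is.
            opts = set(fields[5].split(",")) | set(super_opts.split(","))
            procs[mount_id] = ProcMount(
                mount_id, root, mount_point, "subset=pid" in opts
            )
    # A mount whose parent is a procfs mount hides part of that procfs.
    for parent_id, mount_point in children:
        if parent_id in procs:
            procs[parent_id].overmounts.append(mount_point)
    return list(procs.values())


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for m in find_proc_mounts(f.read()):
            print(m)
```

Run inside the outer container, this lists each procfs mount together
with anything mounted on top of its entries, which is the situation that
currently prevents mounting a new procfs in a nested user namespace.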

> The container needs to inherit a fully-visible instance somehow if you
> want nesting. Using an unprivileged LSM such as landlock to prevent any
> access to the fully visible procfs instance is usually the better way.
> 
> My hope is that once signed bpf is more widely adopted that distros will
> just start enabling blessed bpf programs that will just take on the
> access protecting instead of the clumsy bind-mount protection mechanism.

These are big changes to container runtimes that are unlikely to happen
soon. In contrast, the patch we are discussing will be available for me
to use on Arch Linux about 2 months after the merge, and a couple of
months later on Ubuntu.

So, is there any way forward with the patch or should I continue trying
to find a userspace solution?

Thanks,
Dan Klishch
