Message-ID: <acb859e1684122e1a73f30115f2389d2c9897251.camel@kernel.org>
Date: Mon, 19 Jan 2026 17:21:30 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Andy Lutomirski <luto@...capital.net>, Askar Safin <safinaskar@...il.com>
Cc: brauner@...nel.org, amir73il@...il.com, cyphar@...har.com, jack@...e.cz,
josef@...icpanda.com, linux-fsdevel@...r.kernel.org,
viro@...iv.linux.org.uk, Lennart Poettering <mzxreary@...inter.de>, David
Howells <dhowells@...hat.com>, Zhang Yunkai <zhang.yunkai@....com.cn>,
cgel.zte@...il.com, Menglong Dong <menglong8.dong@...il.com>,
linux-kernel@...r.kernel.org, initramfs@...r.kernel.org,
containers@...ts.linux.dev, linux-api@...r.kernel.org, news@...ronix.com,
lwn@....net, Jonathan Corbet <corbet@....net>, Rob Landley
<rob@...dley.net>, emily@...coat.dev, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 0/2] mount: add OPEN_TREE_NAMESPACE
On Mon, 2026-01-19 at 11:05 -0800, Andy Lutomirski wrote:
> On Mon, Jan 19, 2026 at 10:56 AM Askar Safin <safinaskar@...il.com> wrote:
> >
> > Christian Brauner <brauner@...nel.org>:
> > > Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to
> > > OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of
> > > returning a file descriptor referring to that mount tree
> > > OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor
> > > to a new mount namespace. In that new mount namespace the copied mount
> > > tree has been mounted on top of a copy of the real rootfs.
> >
> > I want to point at security benefits of this.
> >
> > [[ TL;DR: [1] and [2] are very big changes to how mount namespaces work.
> > I like them, and I think they should get wider exposure. ]]
> >
> > If this patchset ([1]) and [2] both land (they are both in "next" now and
> > likely will be submitted to mainline soon) and "nullfs_rootfs" is passed on
> > command line, then mount namespace created by open_tree(OPEN_TREE_NAMESPACE) will
> > usually contain exactly 2 mounts: nullfs and whatever was passed to
> > open_tree(OPEN_TREE_NAMESPACE).
> >
> > This means that even if an attacker is somehow able to unmount their root
> > and get access to the underlying mounts, the only underlying thing they
> > will get is nullfs.
> >
> > Also this means that other mounts are not only hidden in new namespace, they
> > are fully absent. This prevents attacks discussed here: [3], [4].
> >
> > Also this means that (assuming we have both [1] and [2] and "nullfs_rootfs"
> > is passed), there is no longer a hidden writable mount shared by all containers,
> > potentially available to attackers. This is the concern raised in [5]:
> >
> > > You want rootfs to be a NULLFS instead of ramfs. You don't seem to want it to
> > > actually _be_ a filesystem. Even with your "fix", containers could communicate
> > > with each _other_ through it if it becomes accessible. If a container can get
> > > access to an empty initramfs and write into it, it can ask/answer the question
> > > "Are there any other containers on this machine running stux24" and then coordinate.
>
> I think this new OPEN_TREE_NAMESPACE is nifty, but I don't think the
> path that gives it sensible behavior should be conditional like this.
> Either make it *always* mount on top of nullfs (regardless of boot
> options) or find some way to have it actually be the root. I assume
> the latter is challenging for some reason.
>
I think that's the plan. I suggested the same to Christian last week,
and he was amenable to removing the option and just always doing a
nullfs_rootfs mount.

We think that older runtimes should still "just work" with this scheme.
Out of an abundance of caution, we _might_ want a command-line option
to make it go back to the old way, in case we find some userland stuff
that doesn't like this for some reason, but hopefully we won't even need
that.
--
Jeff Layton <jlayton@...nel.org>