[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250715-knattern-hochklassig-ddc27ddd4557@brauner>
Date: Tue, 15 Jul 2025 12:18:00 +0200
From: Christian Brauner <brauner@...nel.org>
To: Song Liu <songliubraving@...a.com>
Cc: Paul Moore <paul@...l-moore.com>, Al Viro <viro@...iv.linux.org.uk>,
Song Liu <song@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-security-module@...r.kernel.org" <linux-security-module@...r.kernel.org>, "apparmor@...ts.ubuntu.com" <apparmor@...ts.ubuntu.com>,
"selinux@...r.kernel.org" <selinux@...r.kernel.org>,
"tomoyo-users_en@...ts.sourceforge.net" <tomoyo-users_en@...ts.sourceforge.net>,
"tomoyo-users_ja@...ts.sourceforge.net" <tomoyo-users_ja@...ts.sourceforge.net>, Kernel Team <kernel-team@...a.com>,
"andrii@...nel.org" <andrii@...nel.org>, "eddyz87@...il.com" <eddyz87@...il.com>,
"ast@...nel.org" <ast@...nel.org>, "daniel@...earbox.net" <daniel@...earbox.net>,
"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "jack@...e.cz" <jack@...e.cz>,
"kpsingh@...nel.org" <kpsingh@...nel.org>, "mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
"amir73il@...il.com" <amir73il@...il.com>, "repnop@...gle.com" <repnop@...gle.com>,
"jlayton@...nel.org" <jlayton@...nel.org>, "josef@...icpanda.com" <josef@...icpanda.com>,
"mic@...ikod.net" <mic@...ikod.net>, "gnoack@...gle.com" <gnoack@...gle.com>,
"m@...wtm.org" <m@...wtm.org>, "john.johansen@...onical.com" <john.johansen@...onical.com>,
"john@...armor.net" <john@...armor.net>,
"stephen.smalley.work@...il.com" <stephen.smalley.work@...il.com>, "omosnace@...hat.com" <omosnace@...hat.com>,
"takedakn@...data.co.jp" <takedakn@...data.co.jp>,
"penguin-kernel@...ove.sakura.ne.jp" <penguin-kernel@...ove.sakura.ne.jp>, "enlightened@...omium.org" <enlightened@...omium.org>
Subject: Re: [RFC] vfs: security: Parse dev_name before calling
security_sb_mount
On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:
>
>
> > On Jul 14, 2025, at 1:45 AM, Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Fri, Jul 11, 2025 at 04:22:52PM +0000, Song Liu wrote:
> >>
> >>
> >>> On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@...nel.org> wrote:
> >>
> >> [...]
> >>
> >>>>>
> >>>> To make sure I understand the comment. By “new mount api”, do you mean
> >>>> the code path under do_new_mount()?
> >>>
> >>> fsopen()
> >>> fsconfig()
> >>> fsmount()
> >>> open_tree()
> >>> open_tree_attr()
> >>> move_mount()
> >>> statmount()
> >>> listmount()
> >>>
> >>> I think that's all.
> >>
> >> Thanks for the clarification and pointer!
> >>
> >>>
> >>>>
> >>>>> My recommendation is make a list of all the currently supported
> >>>>> security_*() hooks in the mount code (I certainly don't have them in my
> >>>>> head). Figure out what each of them allow to mediate effectively and how
> >>>>> the callchains are related.
> >>>>>
> >>>>> Then make a proposal how to replace them with something that a) doesn't
> >>>>> cause regressions which is probably something that the LSMs care about
> >>>>> and b) that covers the new mount API sufficiently to be properly
> >>>>> mediated.
> >>>>>
> >>>>> I'll happily review proposals. Fwiw, I'm pretty sure that this is
> >>>>> something that Mickael is interested in as well.
> >>>>
> >>>> So we will consider a proper redesign of LSM hooks for mount syscalls,
> >>>> but we do not want incremental improvements like this one. Do I get
> >>>> the direction right?
> >>>
> >>> If incremental is workable then I think so yes. But it would be great to
> >>> get a consistent picture of what people want/need.
> >>
> >> In short term, we would like a way to get struct path of dev_name for
> >
> > You scared me for a second. By "dev_name" you mean the source path.
>
> Right, we need to get struct path for the source path specified by
> string “dev_name”.
>
> >
> >> bind mount. AFAICT, there are a few options:
> >>
> >> 1. Introduce bpf_kern_path kfunc.
> >> 2. Add new hook(s), such as [1].
> >> 3. Something like this patch.
> >>
> >> [1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/
> >>
> >> Do you think we can ship one of them?
> >
> > If you place a new security hook into __do_loopback() the only thing
> > that I'm not excited about is that we're holding the global namespace
> > semaphore at that point. And I want to have as little LSM hook calls
> > under the namespace semaphore as possible.
>
> do_loopback() changed a bit since [1]. But if we put the new hook
> in do_loopback() before lock_mount(), we don’t have the problem with
> the namespace semaphore, right? Also, this RFC doesn’t seem to have
> this issue either.
While the mount isn't locked another mount can still be mounted on top
of it. lock_mount() will detect this and lookup the topmost mount and
use that. IOW, the value of old_path->mnt may have changed after
lock_mount().
> > If you have 1000 containers each calling into
> > security_something_something_bind_mount() and then you do your "walk
> > upwards towards the root stuff" and that root is 100000 directories away
> > you've introduced a proper DOS or at least a severe new bottleneck into
> > the system. And because of mount namespace propagation that needs to be
> > serialized across all mount namespaces the namespace semaphore isn't
> > something we can just massage away.
>
> AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I
> don’t think we need to overthink about a LSM helper causing DoS in
> some special scenarios. The owner of the LSM, either built-in LSM or
> BPF LSM, need to be aware of such risks and design the LSM rules
> properly to avoid DoS risks. For example, if the path tree is really
> deep, the LSM may decide to block the mount after walking a preset
> number of steps.
The scope of the lock matters _a lot_. If a poorly designed LSM happens
to take exorbitant amount of time under the inode_lock() it's annoying:
to anyone else wanting to grab the inode_lock() _for that single inode_.
If a poorly designed LSM does broken stuff under the namespace semaphore
any mount event on the whole system will block, effectively deadlocking
the system in an instant. For example, if anything even glances at
/proc/<pid>/mountinfo it's game over. It's already iffy that we allow
security_sb_statfs() under there but that's at least guaranteed to be
fast.
If you can make it work so that we don't have to place security_*()
under the namespace semaphore and you can figure out how to deal with a
potential overmount racing you then this would be ideal for everyone.
Powered by blists - more mailing lists