[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250716-unsolidarisch-sagst-e70630ddf6b7@brauner>
Date: Wed, 16 Jul 2025 10:31:53 +0200
From: Christian Brauner <brauner@...nel.org>
To: Song Liu <songliubraving@...a.com>
Cc: Paul Moore <paul@...l-moore.com>, Al Viro <viro@...iv.linux.org.uk>,
Song Liu <song@...nel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-security-module@...r.kernel.org" <linux-security-module@...r.kernel.org>, "apparmor@...ts.ubuntu.com" <apparmor@...ts.ubuntu.com>,
"selinux@...r.kernel.org" <selinux@...r.kernel.org>,
"tomoyo-users_en@...ts.sourceforge.net" <tomoyo-users_en@...ts.sourceforge.net>,
"tomoyo-users_ja@...ts.sourceforge.net" <tomoyo-users_ja@...ts.sourceforge.net>, Kernel Team <kernel-team@...a.com>,
"andrii@...nel.org" <andrii@...nel.org>, "eddyz87@...il.com" <eddyz87@...il.com>,
"ast@...nel.org" <ast@...nel.org>, "daniel@...earbox.net" <daniel@...earbox.net>,
"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "jack@...e.cz" <jack@...e.cz>,
"kpsingh@...nel.org" <kpsingh@...nel.org>, "mattbobrowski@...gle.com" <mattbobrowski@...gle.com>,
"amir73il@...il.com" <amir73il@...il.com>, "repnop@...gle.com" <repnop@...gle.com>,
"jlayton@...nel.org" <jlayton@...nel.org>, "josef@...icpanda.com" <josef@...icpanda.com>,
"mic@...ikod.net" <mic@...ikod.net>, "gnoack@...gle.com" <gnoack@...gle.com>,
"m@...wtm.org" <m@...wtm.org>, "john.johansen@...onical.com" <john.johansen@...onical.com>,
"john@...armor.net" <john@...armor.net>,
"stephen.smalley.work@...il.com" <stephen.smalley.work@...il.com>, "omosnace@...hat.com" <omosnace@...hat.com>,
"takedakn@...data.co.jp" <takedakn@...data.co.jp>,
"penguin-kernel@...ove.sakura.ne.jp" <penguin-kernel@...ove.sakura.ne.jp>, "enlightened@...omium.org" <enlightened@...omium.org>
Subject: Re: [RFC] vfs: security: Parse dev_name before calling
security_sb_mount
On Tue, Jul 15, 2025 at 10:31:39PM +0000, Song Liu wrote:
>
> > On Jul 15, 2025, at 3:18 AM, Christian Brauner <brauner@...nel.org> wrote:
> > On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:
>
>
> [...]
>
> >>> If you place a new security hook into __do_loopback() the only thing
> >>> that I'm not excited about is that we're holding the global namespace
> >>> semaphore at that point. And I want to have as little LSM hook calls
> >>> under the namespace semaphore as possible.
> >>
> >> do_loopback() changed a bit since [1]. But if we put the new hook
> >> in do_loopback() before lock_mount(), we don’t have the problem with
> >> the namespace semaphore, right? Also, this RFC doesn’t seem to have
> >> this issue either.
> >
> > While the mount isn't locked another mount can still be mounted on top
> > of it. lock_mount() will detect this and lookup the topmost mount and
> > use that. IOW, the value of old_path->mnt may have changed after
> > lock_mount().
>
> I am probably confused. Do you mean path->mnt (instead of old_path->mnt)
> may have changed after lock_mount()?
I mean the target path. I forgot that the code uses @old_path to mean
the source path not the target path. And you're interested in the source
path, not the target path.
>
> > If you have 1000 containers each calling into
> >>> security_something_something_bind_mount() and then you do your "walk
> >>> upwards towards the root stuff" and that root is 100000 directories away
> >>> you've introduced a proper DOS or at least a severe new bottleneck into
> >>> the system. And because of mount namespace propagation that needs to be
> >>> serialized across all mount namespaces the namespace semaphore isn't
> >>> something we can just massage away.
> >>
> >> AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I
> >> don’t think we need to overthink about a LSM helper causing DoS in
> >> some special scenarios. The owner of the LSM, either built-in LSM or
> >> BPF LSM, need to be aware of such risks and design the LSM rules
> >> properly to avoid DoS risks. For example, if the path tree is really
> >> deep, the LSM may decide to block the mount after walking a preset
> >> number of steps.
> >
> > The scope of the lock matters _a lot_. If a poorly designed LSM happens
> > to take exorbitant amount of time under the inode_lock() it's annoying:
> > to anyone else wanting to grab the inode_lock() _for that single inode_.
> >
> > If a poorly designed LSM does broken stuff under the namespace semaphore
> > any mount event on the whole system will block, effectively deadlocking
> > the system in an instant. For example, if anything even glances at
> > /proc/<pid>/mountinfo it's game over. It's already iffy that we allow
> > security_sb_statfs() under there but that's at least guaranteed to be
> > fast.
> >
> > If you can make it work so that we don't have to place security_*()
> > under the namespace semaphore and you can figure out how to deal with a
> > potential overmount racing you then this would be ideal for everyone.
>
> I am trying to understand all the challenges here.
As long as you're only interested in the source path's mount, you're
fine.
>
> It appears to me that do_loopback() has the tricky issue:
>
> static int do_loopback(struct path *path, ...)
> {
> ...
> /*
> * path may still change, so not a good point to add
> * security hook
> */
> mp = lock_mount(path);
> if (IS_ERR(mp)) {
> /* ... */
> }
> /*
> * namespace_sem is locked, so not a good point to add
> * security hook
> */
> ...
> }
>
> Basically, without major work with locking, there is no good
> spot to insert a security hook into do_loopback(). Or, maybe
> we can add a hook somewhere in lock_mount()?
You can't really because the lookup_mnt() call in lock_mount() happens
under the namespace semaphore already and if it's the topmost mount it
won't be dropped again and you can't drop it again without risking
overmounts again.
But again, as long as you are interested in the source mount you should
be fine.
Powered by blists - more mailing lists