linux-kernel - Re: [RFC PATCH] kernfs: release kernfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YZV3m1O6x3yQk/DZ@google.com>
Date:   Wed, 17 Nov 2021 13:43:55 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        hridya@...gle.com
Subject: Re: [RFC PATCH] kernfs: release kernfs_mutex before the inode
 allocation

On Wed, Nov 17, 2021 at 08:39:44AM +0100, Greg Kroah-Hartman wrote:
> On Tue, Nov 16, 2021 at 11:27:56PM -0800, Minchan Kim wrote:
> > On Wed, Nov 17, 2021 at 07:44:44AM +0100, Greg Kroah-Hartman wrote:
> > > On Tue, Nov 16, 2021 at 01:36:01PM -0800, Minchan Kim wrote:
> > > > On Tue, Nov 16, 2021 at 08:49:46PM +0100, Greg Kroah-Hartman wrote:
> > > > > On Tue, Nov 16, 2021 at 11:43:17AM -0800, Minchan Kim wrote:
> > > > > > The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > > > > > every kernfs-based(e.g., sysfs, cgroup, dmabuf) fs are able to compete
> > > > > > the lock. Thus, if one of userspace goes the sleep under holding
> > > > > > the lock for a long time, rest of them should wait it. A example is
> > > > > > the holder goes direct reclaim with the lock since it needs memory
> > > > > > allocation. Let's fix it at common technique that release the lock
> > > > > > and then allocate the memory. Fortunately, kernfs looks like have
> > > > > > an refcount so I hope it's fine.
> > > > > > 
> > > > > > Signed-off-by: Minchan Kim <minchan@...nel.org>
> > > > > > ---
> > > > > >  fs/kernfs/dir.c             | 14 +++++++++++---
> > > > > >  fs/kernfs/inode.c           |  2 +-
> > > > > >  fs/kernfs/kernfs-internal.h |  1 +
> > > > > >  3 files changed, 13 insertions(+), 4 deletions(-)
> > > > > 
> > > > > What workload hits this lock to cause it to be noticable?
> > > > 
> > > > A app launching since it was dropping the frame since the
> > > > latency was too long.
> > > 
> > > How does running a program interact with kernfs filesystems?  Which
> > > one(s)?
> > 
> > A app launching involves dma_buf exports which creates kobject
> > and add it to the kernfs with down_write - kernfs_add_one.
> 
> I thought the "create a dma_buf kobject" kernel change was fixed up to
> not do that anymore as that was a known performance issue.
> 
> Creating kobjects should NOT be on a fast path, if they are, that needs
> to be fixed.

Ccing Hridya
I also already mentioned before but it's the as-is.
It would be great to be fixed but kernfs still has the problem
regardless of the dmabuf.

For example, process A hold the lock as read-side and try to
allocate memory and entered the reclaim. process B try to
hold the lock as writes-side(e.g., kernfs_iop_setattr) but
he should wait until process A completes. Next, processs C is
also stuck on the lock as read-side when it tries to access
sysfs since the process B spent a threahold timeout in rwsem.
Here, process C could be critical role to contribute the jank.
What it was doing is just access sysfs but was stuck.

> 
> > At the same time in other CPU, a random process was accessing
> > sysfs and the kernfs_iop_lookup was already hoding the kernfs_rwsem
> > and ran under direct reclaim patch due to alloc_inode in
> > kerfs_get_inode.
> 
> What program is constantly hitting sysfs?  sysfs is not for
> performance-critical things, right?

Constantly hitting sysfs itself is no problem since they need
to read telemetry data from sysfs and it's not latency sensitive.
Here, problem is the reading kernfs could make trouble others who
want to access once the rwsem is involved with exclusive lock mode.
Please look at above scenario.

> 
> > Therefore, the app is stuck on the lock and lose frames so enduser
> > sees the jank.
> 
> But how does this patch work around it?  It seems like you are
> special-casing the kobject creation path only.

It's true. If other path also has the memory allocation with holding
kernfs_rwsem, it could make trouble.

> 
> And is this the case really on 5.15?  I thought the kernfs locks were
> broken up again to not cause this problem in 5.14 or so.

It happened on 5.10 but the path I mentioned were still vulnerable path
with rwsem so it could happen on 5.15, too.