Message-ID: <CAOQ4uxiAvWbEGavQuukzTf9JFkMKRL7T_1t8-YHUpYPuifmyHg@mail.gmail.com>
Date: Fri, 5 Apr 2024 13:34:11 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: syzbot <syzbot+9a5b0ced8b1bfb238b56@...kaller.appspotmail.com>, 
	gregkh@...uxfoundation.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com, tj@...nel.org, 
	valesini@...dex-team.ru, Christoph Hellwig <hch@....de>, 
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Miklos Szeredi <miklos@...redi.hu>
Subject: Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek

On Fri, Apr 5, 2024 at 1:01 AM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
>
> > This specifically cannot happen because sysfs is not allowed as an
> > upper layer, only as a lower layer, so overlayfs itself will not be
> > writing to /sys/power/resume.
>
> Then how could you possibly get a deadlock there?  What would your minimal
> deadlocked set look like?
>
> 1.  Something is blocked in lookup_bdev() called from resume_store(), called
> from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired
> ->mutex of struct kernfs_open_file that had been allocated by
> kernfs_fop_open() back when the file had been opened.  Note that each
> struct file instance gets a separate struct kernfs_open_file.  Since we are
> calling ->write_iter(), the file *MUST* have been opened for write.
>
> 2.  Something is blocked in kernfs_fop_llseek() on the same of->mutex,
> i.e. using the same struct file as (1).  That something is holding an
> overlayfs inode lock, which is what the next thread is blocked on.
>
> + at least one more thread, to complete the cycle.
>
> Right?  How could that possibly happen without overlayfs opening /sys/power/resume
> for write?  Again, each struct file instance gets a separate of->mutex;
> for a deadlock you need a cycle of threads and a cycle of locks, such
> that each thread is holding the corresponding lock and is blocked on
> attempt to get the lock that comes next in the cyclic order.

Absolutely right.
I had it in my mind that this was a node lock; I did not look closely
enough.
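
To spell out the cycle condition described above, here is a minimal
userspace sketch (not kernel code). The lock names, struct blocked_thread
and has_wait_cycle() are hypothetical, and the model assumes that each
blocked thread holds exactly one lock and waits on exactly one. OTHER_LOCK
and the third thread stand in for whatever would close the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers; OF_MUTEX_A and OF_MUTEX_B model the
 * of->mutex of two different struct file instances. */
enum lock_id { OF_MUTEX_A, OF_MUTEX_B, OVL_INODE_LOCK, OTHER_LOCK };

/* Simplifying assumption: one held lock and one wanted lock per thread. */
struct blocked_thread {
	enum lock_id holds;
	enum lock_id wants;
};

/* Index of the thread holding 'lock', or -1 if nobody in the set holds it. */
static int holder_of(const struct blocked_thread *t, size_t n,
		     enum lock_id lock)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].holds == lock)
			return (int)i;
	return -1;
}

/* True if following "wants -> holder" edges leads back to the start. */
static bool has_wait_cycle(const struct blocked_thread *t, size_t n)
{
	for (size_t start = 0; start < n; start++) {
		int cur = (int)start;

		for (size_t step = 0; step < n; step++) {
			cur = holder_of(t, n, t[cur].wants);
			if (cur < 0)
				break;		/* chain ends, no cycle */
			if ((size_t)cur == start)
				return true;	/* back where we started */
		}
	}
	return false;
}

/* Writer and llseek share one struct file, hence one of->mutex. */
static const struct blocked_thread same_file[] = {
	{ OF_MUTEX_A,     OTHER_LOCK     },	/* write, stuck in ->store() */
	{ OVL_INODE_LOCK, OF_MUTEX_A     },	/* llseek via overlayfs */
	{ OTHER_LOCK,     OVL_INODE_LOCK },	/* closes the cycle */
};

/* llseek goes through a different struct file, hence another of->mutex. */
static const struct blocked_thread separate_files[] = {
	{ OF_MUTEX_A,     OTHER_LOCK     },
	{ OVL_INODE_LOCK, OF_MUTEX_B     },
	{ OTHER_LOCK,     OVL_INODE_LOCK },
};
```

Under these assumptions has_wait_cycle() reports a cycle only for
same_file: once the llseek side waits on a mutex that no writer holds,
the chain has nowhere to go.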

>
> If overlayfs never writes to that sucker, it can't participate in that
> cycle.  Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER*
> struct file opened for the same sysfs file.  Since it's not the same
> struct file and since each struct file there gets a separate kernfs_open_file
> instance, the mutex won't be the same.
>
> Unless I'm missing something else, that can't deadlock.  For a quick and
> dirty experiment, try to give of->mutex on r/o opens a class separate from
> that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see
> if lockdep warnings persist.
>
> Something like
>
>         if (has_mmap)
>                 mutex_init(&of->mutex);
>         else if (file->f_mode & FMODE_WRITE)
>                 mutex_init(&of->mutex);
>         else
>                 mutex_init(&of->mutex);

Why only a quick experiment?
Why not a permanent kludge?

It is no better or worse than the already existing has_mmap
subclass annotation, is it?
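
For context, the three identical-looking mutex_init() calls in the snippet
above do differ as far as lockdep is concerned: mutex_init() is a macro
that declares a static struct lock_class_key at each expansion, so every
call site becomes its own lock class. A minimal userspace sketch of that
mechanism follows; toy_mutex, toy_mutex_init() and toy_open() are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's key type; only its address matters here. */
struct lock_class_key { char dummy; };

struct toy_mutex {
	const struct lock_class_key *key;	/* identifies the lock class */
};

static void toy_mutex_init_key(struct toy_mutex *m,
			       const struct lock_class_key *key)
{
	assert(m != NULL && key != NULL);
	m->key = key;
}

/* Each expansion gets its own static key, so each call site is a class. */
#define toy_mutex_init(m)					\
	do {							\
		static struct lock_class_key key_;		\
		toy_mutex_init_key((m), &key_);			\
	} while (0)

enum { OPEN_MMAP = 1, OPEN_WRITE = 2 };

/* Mirrors the shape of the proposed if/else chain: same call, three sites. */
static void toy_open(struct toy_mutex *m, int flags)
{
	if (flags & OPEN_MMAP)
		toy_mutex_init(m);	/* class 1: files with mmap */
	else if (flags & OPEN_WRITE)
		toy_mutex_init(m);	/* class 2: opened for write */
	else
		toy_mutex_init(m);	/* class 3: read-only opens */
}
```

Two read-only opens end up with the same key, while a read-only open and
a writable open get different ones, which is all the annotation is meant
to tell lockdep.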

Thanks,
Amir.