lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZPUIQzsCSNlnBFHB@dread.disaster.area>
Date:   Mon, 4 Sep 2023 08:27:15 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Mateusz Guzik <mjguzik@...il.com>
Cc:     syzbot <syzbot+e245f0516ee625aaa412@...kaller.appspotmail.com>,
        brauner@...nel.org, djwong@...nel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-xfs@...r.kernel.org, llvm@...ts.linux.dev, nathan@...nel.org,
        ndesaulniers@...gle.com, syzkaller-bugs@...glegroups.com,
        trix@...hat.com, viro@...iv.linux.org.uk
Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

On Sun, Sep 03, 2023 at 10:33:57AM +0200, Mateusz Guzik wrote:
> On Sun, Sep 03, 2023 at 03:25:28PM +1000, Dave Chinner wrote:
> > On Sat, Sep 02, 2023 at 09:11:34PM -0700, syzbot wrote:
> > > Hello,
> > > 
> > > syzbot found the following issue on:
> > > 
> > > HEAD commit:    b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > 
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > Been happening for months, apparently, yet for some reason it now
> > thinks a locking hang in __fdget_pos() is an XFS issue?
> > 
> > #syz set subsystems: fs
> > 
> 
> The report does not have info necessary to figure this out -- no
> backtrace for whichever thread which holds f_pos_lock. I clicked on a
> bunch of other reports and it is the same story.

That's true, but there's nothing that points at XFS in *any* of the
bug reports. Indeed, log from the most recent report doesn't have
any of the output from the time stuff hung. i.e. the log starts
at kernel time 669.487771 seconds, and the hung task report is at:

684.588608][   T28] INFO: task syz-executor.0:19830 blocked for more than 143 seconds

About 25 seconds later. So the hung tasks were running at about
540s, and that's just not in the logs.

Every report has a different combination of filesystems being
exercised, and a couple of them didn't even have XFS in them.

So at this point, there is no single filesystem that the reports
actually indicate is the cause, the reports don't contain the actual
operations that hung, and there's basically nothing to go on so far.
Hence putting it in the "fs" bucket (which encompasses all things
filesystems!) is the rigth thing to do.

The only commonality I kinda see is that secondary processes that
are hung seem mostly to be in directory operations waiting on inode
locks - either lookup or readdir, so it's entirely possible that a
filesystem has screwed up ->iterate_shared locking in some way...

> Can the kernel be configured to dump backtraces from *all* threads?

It already is (sysrq-t), but I'm not sure that will help - if it is
a leaked unlock then nothing will show up at all.

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ