linux-kernel - Re: [syzbot] [xfs?] INFO: task hung in __fdget


Message-ID: <20230904030233.GP3390869@ZenIV>
Date:   Mon, 4 Sep 2023 04:02:33 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Dave Chinner <david@...morbit.com>
Cc:     Mateusz Guzik <mjguzik@...il.com>,
        syzbot <syzbot+e245f0516ee625aaa412@...kaller.appspotmail.com>,
        brauner@...nel.org, djwong@...nel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-xfs@...r.kernel.org, llvm@...ts.linux.dev, nathan@...nel.org,
        ndesaulniers@...gle.com, syzkaller-bugs@...glegroups.com,
        trix@...hat.com
Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

On Mon, Sep 04, 2023 at 11:45:03AM +1000, Dave Chinner wrote:

> > thread B: write()
> > 	finds file
> > 	grabs ->f_pos_lock
> > 	calls into filesystem
> > 	blocks on fs lock held by A
> > thread C: read()/write()/lseek() on the same file
> > 	blocks on ->f_pos_lock
> 
> Yes, that's exactly what I said in a followup email - we need to
> know what happened to thread A, because that might be where we are
> stuck on a leaked lock.
> 
> I saw quite a few reports where lookup/readdir are also stuck trying
> to get an inode lock - those are the "thread B"s in the above example
> - but there's no indication left of what happened with thread A.
> 
> If thread A was blocked all that time on something, then the hung
> task timer should fire on it, too.  If it is running in a tight
> loop, the NMI would have dumped a stack trace from it.
> 
> But neither of those things happened, so it's either leaked
> something or it's in a loop with a short term sleep so doesn't
> trigger the hung task timer. sysrq-w output will capture that
> without all the noise of sysrq-t....
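
For illustration only, the A/B/C chain quoted above can be modelled in
userspace.  This is a rough sketch, not kernel code: the two
threading.Lock objects merely stand in for the fs lock and ->f_pos_lock,
and run_chain() and its timeouts are hypothetical names made up for the
example.  It shows why the report only ever mentions B and C, while the
thread that matters (A) leaves no trace.

```python
import threading


def run_chain(b_timeout=0.5, c_timeout=0.2):
    """Model the quoted scenario with plain userspace locks (assumption)."""
    fs_lock = threading.Lock()      # stand-in for the filesystem/inode lock
    f_pos_lock = threading.Lock()   # stand-in for ->f_pos_lock
    a_holds = threading.Event()
    b_holds = threading.Event()
    release_a = threading.Event()
    report = {}                     # only waiters that give up show up here

    def thread_a():
        # A holds the fs lock and never reports anything about itself.
        with fs_lock:
            a_holds.set()
            release_a.wait()

    def thread_b():
        # B: finds file, grabs f_pos_lock, then blocks on the fs lock.
        with f_pos_lock:
            b_holds.set()
            got = fs_lock.acquire(timeout=b_timeout)
            report["B"] = "done" if got else "blocked on fs lock"
            if got:
                fs_lock.release()

    def thread_c():
        # C: same file, so it blocks on f_pos_lock held by B.
        got = f_pos_lock.acquire(timeout=c_timeout)
        report["C"] = "done" if got else "blocked on f_pos_lock"
        if got:
            f_pos_lock.release()

    a = threading.Thread(target=thread_a)
    a.start()
    a_holds.wait()
    b = threading.Thread(target=thread_b)
    b.start()
    b_holds.wait()
    c = threading.Thread(target=thread_c)
    c.start()
    c.join()
    b.join()
    release_a.set()                 # let A go so the demo terminates
    a.join()
    return report
```

The returned dict has entries for B and C only; nothing in it says what
A was doing, which is the missing piece in the reports.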

Here's what brought sysrq-t:

| > The report does not have info necessary to figure this out -- no
| > backtrace for whichever thread which holds f_pos_lock. I clicked on a
| > bunch of other reports and it is the same story.
| > 
| > Can the kernel be configured to dump backtraces from *all* threads?
| > 
| > If there is no feature like that I can hack it up.
|
| <break>t
|
| over serial console, or echo t >/proc/sysrq-trigger would do it...

A question specifically about getting the stack traces...
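
For anyone wanting to script that, a minimal sketch of the same thing as
the echo above.  The helper name sysrq() and its path parameter are made
up for this example; the only real interface used is a one-character
write to /proc/sysrq-trigger, which needs root and a kernel built with
CONFIG_MAGIC_SYSRQ.  't' dumps all tasks, 'w' only the blocked
(uninterruptible) ones; the output lands in the kernel log.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq(key, path=SYSRQ_TRIGGER):
    """Write a single sysrq command character to the trigger file.

    The path argument exists only so the helper can be pointed at a
    scratch file; it is an assumption of this sketch.
    """
    if len(key) != 1:
        raise ValueError("sysrq command must be a single character")
    with open(path, "w") as f:
        f.write(key)


# Usage (as root), then read the result with dmesg:
#   sysrq("t")   # backtraces from all tasks
#   sysrq("w")   # backtraces from blocked tasks only
```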