linux-ext4 - Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201110194623.GC2951190@mit.edu>
Date:   Tue, 10 Nov 2020 14:46:23 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Chris Friesen <chris.friesen@...driver.com>
Cc:     Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org
Subject: Re: looking for assistance with jbd2 (and other processes) hung
 trying to write to disk

On Tue, Nov 10, 2020 at 09:57:39AM -0600, Chris Friesen wrote:
> No, there are quite a few of them.  I've included them below.  I agree, it's
> not clear who's holding the lock.  Is there a way to find that out?
> 
> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
> buffer_head "b_state" field, right?  I don't see any ownership field the way
> we have for mutexes.  Is there some way to find out who would have locked
> the buffer?

It's quite possible that the buffer was locked as part of doing I/O,
and we are just waiting for the I/O to complete.  An example of this
is in journal_submit_commit_record(), where we lock the buffer using
lock_buffer(), and then call submit_bh() to submit the buffer for I/O.
When the I/O is completed, the buffer head will be unlocked, and we
can check the buffer_uptodate flag to see if the I/O completed
successfully.  (See journal_wait_on_commit_record() for an example of
this.)

So the first thing I'd suggest doing is looking at the console output
or dmesg output from the crashdump to see if there are any clues in
terms of kernel messages from the device driver before things locked
up.  This could be as simple as the device falling off the bus, in
which case there might be some kernel error messages from the block
layer or device driver that would give some insight.

Good luck,

					- Ted