Date: Wed, 11 Nov 2020 10:24:35 -0600
From: Chris Friesen <chris.friesen@...driver.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org
Subject: Re: looking for assistance with jbd2 (and other processes) hung trying to write to disk
On 11/11/2020 9:57 AM, Jan Kara wrote:
> On Tue 10-11-20 09:57:39, Chris Friesen wrote:
>> Just to be sure, I'm looking for whoever has the BH_Lock bit set on the
>> buffer_head "b_state" field, right? I don't see any ownership field the way
>> we have for mutexes. Is there some way to find out who would have locked
>> the buffer?
>
> Buffer lock is a bitlock so there's no owner field. If you can reproduce
> the problem at will and can use debug kernels, then it's easiest to add
> code to lock_buffer() (and fields to struct buffer_head) to track lock
> owner and then see who locked the buffer. Without this, the only way is to
> check stack traces of all UN processes and see whether some stacktrace
> looks suspicious like it could hold the buffer locked (e.g. recursing into
> memory allocation and reclaim while holding buffer locked or something like
> that)...
That's what I thought. :) Debug kernels are doable, but unfortunately
we can't (yet) reproduce the problem at will. Naturally it's only shown
up in a couple of customer sites so far and not in any test labs.
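For reference, the owner-tracking idea can be sketched in user space roughly as follows. This is only a model under stated assumptions, not a kernel patch: struct dbg_buffer_head, the lock_owner_pid and lock_site fields, the BH_LOCK_BIT value, and both helper functions are hypothetical names invented for illustration, and a real change would go into lock_buffer()/unlock_buffer() and struct buffer_head in a debug kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative bit position for the lock flag (assumption for this model). */
#define BH_LOCK_BIT 2UL

/* Hypothetical stand-in for struct buffer_head with added debug fields. */
struct dbg_buffer_head {
    atomic_ulong b_state;     /* bit flags; one bit acts as the lock */
    int lock_owner_pid;       /* hypothetical: pid that took the bitlock */
    const char *lock_site;    /* hypothetical: label of the locking call site */
};

/*
 * Try to take the bitlock. Returns 1 on success, 0 if already locked.
 * The owner is recorded only after the bit has been won, so the debug
 * fields always describe the current holder.
 */
static int dbg_trylock_buffer(struct dbg_buffer_head *bh, int pid,
                              const char *site)
{
    unsigned long mask = 1UL << BH_LOCK_BIT;
    unsigned long old = atomic_fetch_or(&bh->b_state, mask);

    if (old & mask)
        return 0;             /* someone else holds the lock */

    bh->lock_owner_pid = pid;
    bh->lock_site = site;
    return 1;
}

/* Clear the owner information first, then release the bit. */
static void dbg_unlock_buffer(struct dbg_buffer_head *bh)
{
    bh->lock_owner_pid = 0;
    bh->lock_site = NULL;
    atomic_fetch_and(&bh->b_state, ~(1UL << BH_LOCK_BIT));
}
```

With fields like these in place, a crash dump of a hung system would show the recorded owner directly in the stuck buffer instead of requiring a search through the stacks of all UN processes.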
> As Ted wrote the buffer is indeed usually locked because the IO is running
> and that would be the expected situation with the jbd2 stacktrace you
> posted. So it could also be the IO got stuck somewhere in the block layer
> or NVME (frankly, AFAIR NVME was pretty rudimentary with 3.10). To see
> whether that's the case, you need to find 'bio' pointing to the buffer_head
> (through bi_private field), possibly also struct request for that bio and see
> what state they are in... Again, if you can run debug kernels, you can
> write code to simplify this search for you...
Thanks, that's helpful.
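To make sure I have the search right, here is a small user-space model of it: walk a set of candidate bios and pick the one whose bi_private points at the stuck buffer_head. Everything here is an assumption for illustration only: struct model_bio and find_bio_for_bh() are hypothetical, and in a real dump the candidate bios would have to be located first (for example by scanning memory or walking the driver's queues).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct bio. */
struct model_bio {
    void *bi_private;         /* for buffer I/O, points to the buffer_head */
    unsigned long bi_flags;   /* state bits to inspect once the bio is found */
};

/*
 * Return the first bio in bios[0..n) whose bi_private equals bh,
 * or NULL if no candidate references that buffer.
 */
static struct model_bio *find_bio_for_bh(struct model_bio *bios, size_t n,
                                         const void *bh)
{
    for (size_t i = 0; i < n; i++) {
        if (bios[i].bi_private == bh)
            return &bios[i];
    }
    return NULL;
}
```

A NULL result in this model would mean none of the candidates references the buffer, which by itself does not show whether the I/O was ever submitted or whether the candidate set was incomplete.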
Chris