linux-kernel - Re: [PATCH 2/2] nvme-fc: Wait with a timeout for queue to freeze

Message-ID: <YOQGRwLfLaFGqlVA@T590>
Date:   Tue, 6 Jul 2021 15:29:11 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Daniel Wagner <dwagner@...e.de>
Cc:     linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        James Smart <james.smart@...adcom.com>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>
Subject: Re: [PATCH 2/2] nvme-fc: Wait with a timeout for queue to freeze

On Mon, Jul 05, 2021 at 06:34:00PM +0200, Daniel Wagner wrote:
> On Tue, Jun 29, 2021 at 09:39:30AM +0800, Ming Lei wrote:
> > Can you investigate a bit why it hangs? FC shouldn't use
> > managed IRQ, so the interrupt won't be shut down.
> 
> So far, I was not able to figure out why this hangs. In my test setup I
> don't have to do any I/O, I just toggle the remote port.
> 
>   grep busy /sys/kernel/debug/block/*/hctx*/tags | grep -v busy=0
> 
> and this seems to confirm, no I/O in flight.

What is the output of the following command after the hang is triggered?

(cd /sys/kernel/debug/block/nvme0n1 && find . -type f -exec grep -aH . {} \;)

This assumes the hung disk is nvme0n1.

> 
> So I started to look at the q_usage_counter. The obvious observation
> is that the counter is not 0. The least bit is set, thus we are in atomic
> mode.
> 
> (gdb) p/x *((struct request_queue*)0xffff8ac992fbef20)->q_usage_counter->data
> $10 = {
>   count = {
>     counter = 0x8000000000000001
>   }, 
>   release = 0xffffffffa02e78b0, 
>   confirm_switch = 0x0, 
>   force_atomic = 0x0, 
>   allow_reinit = 0x1, 
>   rcu = {
>     next = 0x0, 
>     func = 0x0
>   }, 
>   ref = 0xffff8ac992fbef30
> }
> 
> I am a bit confused about the percpu-refcount API. My naive
> interpretation is that when we are in atomic mode percpu_ref_is_zero()
> can't be used. But this seems rather strange. I must be missing something.

No, it is fine to call percpu_ref_is_zero() in atomic mode.
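
To illustrate, here is a rough userspace model of that check, not the
kernel code itself. struct model_ref, model_ref_is_zero() and
model_bias_present() are made-up names for this sketch, and COUNT_BIAS is
assumed to match the kernel's PERCPU_COUNT_BIAS (the top bit of the counter)
on a 64-bit build. As far as I read lib/percpu-refcount.c, the check only
returns false unconditionally while the ref is in percpu mode; in atomic
mode it compares the atomic counter against 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors PERCPU_COUNT_BIAS on a 64-bit kernel. */
#define COUNT_BIAS (UINT64_C(1) << 63)

/* Hypothetical, heavily simplified stand-in for struct percpu_ref. */
struct model_ref {
	bool percpu_mode;	/* true while the per-CPU fast path is active */
	uint64_t count;		/* the atomic counter, as printed by gdb */
};

/*
 * Model of the zero check: a ref in percpu mode is never reported
 * as zero; in atomic mode the atomic counter alone decides.
 */
bool model_ref_is_zero(const struct model_ref *ref)
{
	if (ref->percpu_mode)
		return false;
	return ref->count == 0;
}

/* True if the top (bias) bit is still set in a dumped counter value. */
bool model_bias_present(uint64_t count)
{
	return (count & COUNT_BIAS) != 0;
}
```

With the dumped value 0x8000000000000001, model_bias_present() returns true
and model_ref_is_zero() returns false in this model, whichever mode is
assumed.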


Thanks,
Ming