Message-Id: <4f039923-19cb-4268-c897-145d5d9f46d2@de.ibm.com>
Date:   Tue, 27 Mar 2018 17:22:32 +0200
From:   Christian Borntraeger <borntraeger@...ibm.com>
To:     Ming Lei <ming.lei@...hat.com>,
        Stefan Haberland <sth@...ux.vnet.ibm.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        Christoph Hellwig <hch@....de>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400
 __blk_mq_delay_run_hw_queue



On 03/27/2018 02:01 PM, Ming Lei wrote:
> Hi Stefan,
> 
> On Tue, Mar 27, 2018 at 12:04:20PM +0200, Stefan Haberland wrote:
>> Hi,
>>
>> I get the following warning in __blk_mq_delay_run_hw_queue when the
>> scheduler is set to mq-deadline for DASD devices on s390.
>>
>> What I see is that for whatever reason there is a hctx nr 0 which has no
>> hctx->tags pointer set.
>> From my observation it is always hctx nr 0 which has a tags NULL pointer in
>> it and I see other hctx which have the hctx->tags pointer set correctly.
>>
>>
>> [    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402
>> __blk_mq_delay_run_hw_queue+0xe8/0x118
>> [    2.170007] Modules linked in:
>> [    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147
>> [    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
>> [    2.170024] Krnl PSW : 0000000076fd6c7f 00000000c244c24d
>> (__blk_mq_delay_run_hw_queue+0xe8/0x118)
>> [    2.170035]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
>> RI:0 EA:3
>> [    2.170041] Krnl GPRS: 00000000599ec58a 0000000002a94000 0000000002a94000
>> 0000000000000001
>> [    2.170047]            0000000000000000 000000006e761e98 0000000002a96980
>> 0000000002a96800
>> [    2.170052]            0000000002d87ce0 00000000737fbda8 0000000000000000
>> 0000000000000001
>> [    2.170058]            0000000000000001 0000000000aedd10 00000000737fbc38
>> 00000000737fbc00
>> [    2.170069] Krnl Code: 00000000006ea3c8: ebaff0a00004 lmg    
>> %r10,%r15,160(%r15)
>>                           00000000006ea3ce: c0f4ffff5e0d brcl    15,6d5fe8
>>                          #00000000006ea3d4: a7f40001 brc     15,6ea3d6
>>                          >00000000006ea3d8: e340f0c00004        lg     
>> %r4,192(%r15)
>>                           00000000006ea3de: ebaff0a00004 lmg    
>> %r10,%r15,160(%r15)
>>                           00000000006ea3e4: 07f4 bcr     15,%r4
>>                           00000000006ea3e6: 41b01100 la      %r11,256(%r1)
>>                           00000000006ea3ea: 182a lr      %r2,%r10
>> [    2.170158] Call Trace:
>> [    2.170205] ([<0000000002a96800>] 0x2a96800)
>> [    2.170248]  [<00000000006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
>> [    2.170262]  [<00000000006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
>> [    2.170295]  [<00000000006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
>> [    2.170300]  [<00000000006e9e30>] blk_mq_complete_request+0x80/0xa0
>> [    2.170307]  [<000000000087fad0>] dasd_block_tasklet+0x218/0x480
>> [    2.170415]  [<000000000017c3f8>] tasklet_hi_action+0xa0/0x138
>> [    2.170434]  [<0000000000a91c10>] __do_softirq+0xc8/0x540
>> [    2.170471]  [<000000000017bd4e>] irq_exit+0x136/0x140
>> [    2.170478]  [<000000000010c912>] do_IRQ+0x8a/0xb8
>> [    2.170518]  [<0000000000a90ee0>] io_int_handler+0x138/0x2e0
>> [    2.170524]  [<0000000000102cd0>] enabled_wait+0x58/0x128
>> [    2.170562] ([<0000000000102cb8>] enabled_wait+0x40/0x128)
>> [    2.170577]  [<000000000010319a>] arch_cpu_idle+0x32/0x48
>> [    2.170604]  [<0000000000a8f636>] default_idle_call+0x3e/0x58
>> [    2.170613]  [<00000000001cd5d2>] do_idle+0xda/0x190
>> [    2.170621]  [<00000000001cd93e>] cpu_startup_entry+0x3e/0x48
>> [    2.170633]  [<0000000000e5ebf4>] start_kernel+0x47c/0x490
>> [    2.170641]  [<0000000000100020>] _stext+0x20/0x80
>> [    2.170650] 2 locks held by swapper/0/0:
>> [    2.170658]  #0:  (&(&dq->lock)->rlock){..-.}, at: [<00000000b45eaf9e>]
>> dasd_block_tasklet+0x1cc/0x480
>> [    2.170676]  #1:  (rcu_read_lock){....}, at: [<00000000bc7fa045>]
>> hctx_lock+0x34/0x110
>> [    2.170750] Last Breaking-Event-Address:
>> [    2.170758]  [<00000000006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
>> [    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---
> 
> This warning is harmless, please try the following patch:
> 
> --
> From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@...hat.com>
> Date: Tue, 27 Mar 2018 19:54:23 +0800
> Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
> 
> From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU") on, it is easier to end up with an unmapped hctx
> on some CPU topologies, i.e. an hctx that is not mapped to any CPU.
> 
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
> checking if the hctx is mapped in blk_mq_run_hw_queues().
> 
> Reported-by: Stefan Haberland <sth@...ux.vnet.ibm.com>
> Cc: Christoph Hellwig <hch@....de>
> Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
> Signed-off-by: Ming Lei <ming.lei@...hat.com>

Can we get this into 4.16?

> ---
>  block/blk-mq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..48f25a63833b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  	int i;
> 
>  	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (blk_mq_hctx_stopped(hctx))
> +		if (blk_mq_hctx_stopped(hctx) || !blk_mq_hw_queue_mapped(hctx))
>  			continue;
> 
>  		blk_mq_run_hw_queue(hctx, async);
>
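
For anyone following along, here is a minimal user-space sketch of what the
one-line change does. It is not kernel code: struct model_hctx, the model_*
functions and the model_warnings counter are hypothetical stand-ins, and the
mapped check is assumed to mean "has software queues and a tags pointer",
which matches the NULL hctx->tags seen in the report. The skip_unmapped flag
switches between the old loop and the patched one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct blk_mq_hw_ctx. */
struct model_hctx {
	bool stopped;        /* models the result of blk_mq_hctx_stopped() */
	unsigned int nr_ctx; /* software queues mapped to this hctx */
	void *tags;          /* NULL for an unmapped hctx, as in the report */
	unsigned int runs;   /* how often this hctx was actually run */
};

/* Counts the cases where the kernel would have hit the warning. */
static unsigned int model_warnings;

/* Assumption: "mapped" means the hctx has software queues and tags. */
static bool model_hw_queue_mapped(const struct model_hctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/* Models the run path: an unmapped hctx warns and is not run. */
static void model_run_hw_queue(struct model_hctx *hctx)
{
	if (!model_hw_queue_mapped(hctx)) {
		model_warnings++;
		return;
	}
	hctx->runs++;
}

/* Models blk_mq_run_hw_queues(); skip_unmapped selects the patched loop. */
static void model_run_hw_queues(struct model_hctx *hctxs, size_t n,
				bool skip_unmapped)
{
	assert(hctxs != NULL);

	for (size_t i = 0; i < n; i++) {
		if (hctxs[i].stopped)
			continue;
		/* The patch adds this check, so the run path never sees it. */
		if (skip_unmapped && !model_hw_queue_mapped(&hctxs[i]))
			continue;
		model_run_hw_queue(&hctxs[i]);
	}
}
```

With hctx 0 unmapped and hctx 1 mapped, calling model_run_hw_queues() with
skip_unmapped set to false bumps model_warnings once, while the patched
variant leaves it untouched. In both cases only the mapped hctx is run, which
fits the statement that the warning itself is harmless.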