linux-kernel - WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw

Message-Id: <1502906852.3305.33.camel@abdul.in.ibm.com>
Date:   Wed, 16 Aug 2017 23:37:32 +0530
From:   Abdul Haleem <abdhalee@...ux.vnet.ibm.com>
To:     linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        sachinp <sachinp@...ux.vnet.ibm.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Jens Axboe <axboe@...nel.dk>
Subject: WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111
 __blk_mq_run_hw_queue+0x1d8/0x1f0

Hi,

linux-next booted with the warnings below on powerpc.

Test: Reboot
Machine Type : Power 8 bare-metal
Kernel version : 4.13.0-rc4-next-20170808
gcc : 4.8.5
config: Tul-NV-config file attached
The issue is rare (hit once in 3 reboot attempts)

A WARN_ON_ONCE() is triggered in __blk_mq_run_hw_queue() at
block/blk-mq.c line 1111, which is the line marked with >>> here:

static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
{
        int srcu_idx;

        /*
         * We should be running this queue from one of the CPUs that
         * are mapped to it.
         */
        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
                cpu_online(hctx->next_cpu));

        /*
         * We can't run the queue inline with ints disabled. Ensure that
         * we catch bad users of this early.
         */
   >>>  WARN_ON_ONCE(in_interrupt());

        if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
                rcu_read_lock();

boot warnings:
--------------
kvm: exiting hardware virtualization
------------[ cut here ]------------
WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge
stp llc kvm_hv kvm iptable_filter vmx_crypto ipmi_powernv leds_powernv
led_class powernv_rng ipmi_devintf ipmi_msghandler rng_core
powernv_op_panel binfmt_misc nfsd ip_tables x_tables autofs4
CPU: 15 PID: 0 Comm: swapper/15 Not tainted 4.13.0-rc4-next-20170808 #4
task: c0000007f8439000 task.stack: c0000007f84e0000
NIP: c00000000088f4d8 LR: c00000000088f7b0 CTR: c000000000dafcc0
REGS: c000000ffff376d0 TRAP: 0700   Not tainted  (4.13.0-rc4-next-20170808)
MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42004022  XER: 00000000
CFAR: c00000000088f3c4 SOFTE: 1
GPR00: c00000000088f7b0 c000000ffff37950 c000000001d8ff00 c0000007eb707c00
GPR04: 0000000000000000 0000000000000000 c0000000022fff00 c00000000224ff00
GPR08: c00000000224ff00 0000000000000001 0000000000000100 9000000000001003
GPR12: 0000000000004400 c00000000fd45280 c0000007f84e3f90 0000000000200042
GPR16: 0000000100009ad1 c000000ffff34000 0000000000000000 c0000000015c4e80
GPR20: c000000001dc3b00 c0000000015c4e80 000000000000000a c000000ffff34000
GPR24: 0000000000000000 c0000007eb3e1818 c000000ffff37a70 0000000000000001
GPR28: c0000007eb3e0000 0000000000000000 c0000007eb707c00 c0000007eb707c00
NIP [c00000000088f4d8] __blk_mq_run_hw_queue+0x1d8/0x1f0
LR [c00000000088f7b0] __blk_mq_delay_run_hw_queue+0x1f0/0x210
Call Trace:
[c000000ffff37990] [c00000000088f7b0] __blk_mq_delay_run_hw_queue+0x1f0/0x210
[c000000ffff379d0] [c00000000088fcb8] blk_mq_start_hw_queue+0x58/0x80
[c000000ffff379f0] [c00000000088fd40] blk_mq_start_hw_queues+0x60/0xb0
[c000000ffff37a30] [c000000000ae2b54] scsi_kick_queue+0x34/0xa0
[c000000ffff37a50] [c000000000ae2f70] scsi_run_queue+0x3b0/0x660
[c000000ffff37ac0] [c000000000ae7ed4] scsi_run_host_queues+0x64/0xc0
[c000000ffff37b00] [c000000000ae7f64] scsi_unblock_requests+0x34/0x60
[c000000ffff37b20] [c000000000b14998] ipr_ioa_bringdown_done+0xf8/0x3a0
[c000000ffff37bc0] [c000000000b12528] ipr_reset_ioa_job+0xd8/0x170
[c000000ffff37c00] [c000000000b18790] ipr_reset_timer_done+0x110/0x160
[c000000ffff37c50] [c00000000024db50] call_timer_fn+0xa0/0x3a0
[c000000ffff37ce0] [c00000000024e058] expire_timers+0x1b8/0x350
[c000000ffff37d50] [c00000000024e2f0] run_timer_softirq+0x100/0x3e0
[c000000ffff37df0] [c000000000162edc] __do_softirq+0x20c/0x620
[c000000ffff37ee0] [c000000000163a80] irq_exit+0x230/0x290
[c000000ffff37f10] [c00000000001d770] __do_irq+0x170/0x410
[c000000ffff37f90] [c00000000003ea20] call_do_irq+0x14/0x24
[c0000007f84e3a70] [c00000000001dae0] do_IRQ+0xd0/0x190
[c0000007f84e3ac0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
--- interrupt: 501 at .L1^B42+0x0/0x4
    LR = arch_local_irq_restore+0x124/0x160
[c0000007f84e3db0] [c00000000001c9c8] arch_local_irq_restore+0xa8/0x160 (unreliable)
[c0000007f84e3dd0] [c000000000db5038] cpuidle_enter_state+0x238/0x6e0
[c0000007f84e3e30] [c000000000db5588] cpuidle_enter+0x38/0x60
[c0000007f84e3e50] [c0000000001f22e4] call_cpuidle+0x74/0xe0
[c0000007f84e3e70] [c0000000001f2a78] do_idle+0x4b8/0x5a0
[c0000007f84e3ee0] [c0000000001f2f64] cpu_startup_entry+0x74/0x90
[c0000007f84e3f20] [c000000000068c14] start_secondary+0x4a4/0x550
[c0000007f84e3f90] [c00000000000b16c] start_secondary_prolog+0x10/0x14
Instruction dump:
e9280e58 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 39290001 f9280e58 4e800020
3ce2004c e9270e38 39290001 f9270e38 <0fe00000> 3d02004c e9280e40 39290001
---[ end trace 5632db71d3bf5b30 ]---
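
Reading the trace: the queue run is reached from run_timer_softirq()
(ipr_reset_timer_done is a timer callback), so it executes in softirq
context, and that is what in_interrupt() reports. Below is a minimal
userspace sketch of that check. It assumes the preempt_count bit layout
from include/linux/preempt.h for this kernel era (8 preempt bits, 8
softirq bits, 4 hardirq bits, 1 NMI bit). model_in_interrupt() is a
hypothetical name used for illustration only; it is not a kernel function.

```c
#include <stdbool.h>

/* Assumed preempt_count layout (mirrors include/linux/preempt.h). */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   4
#define NMI_BITS       1

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT      (HARDIRQ_SHIFT + HARDIRQ_BITS)

/* Build a mask of 'bits' ones starting at bit 'shift'. */
#define FIELD_MASK(bits, shift)  (((1UL << (bits)) - 1) << (shift))

#define SOFTIRQ_MASK   FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK       FIELD_MASK(NMI_BITS, NMI_SHIFT)

/* Amount added to the counter on entry to each context. */
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)

/*
 * Hypothetical userspace model of in_interrupt(): true when any
 * softirq, hardirq or NMI bit is set in the given counter value.
 * Plain preemption-disable counts (the low 8 bits) do not count.
 */
static bool model_in_interrupt(unsigned long count)
{
        return (count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}
```

With this model, model_in_interrupt(SOFTIRQ_OFFSET) is true, which
corresponds to the timer softirq path in the trace above, while
model_in_interrupt(0) (plain task context) is false.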

WARN_ON_ONCE(in_interrupt()) was introduced by this commit:

commit b7a71e66d4d274d627cabc17c5e41330bcf47c2d
Author: Jens Axboe <axboe@...nel.dk>
Date:   Tue Aug 1 09:28:24 2017 -0600

    blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled
    
    We recently had a bug in the IPR SCSI driver, where it would end up
    making the SCSI mid layer run the mq hardware queue with interrupts
    disabled. This isn't legal, since the software queue locking relies
    on never being grabbed from interrupt context. Additionally, drivers
    that set BLK_MQ_F_BLOCKING may schedule from this context.
    
    Add a WARN_ON_ONCE() to catch bad users up front.
    
    Signed-off-by: Jens Axboe <axboe@...nel.dk>

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre



View attachment "Tul-NV-config" of type "text/plain" (91816 bytes)