Message-ID: <CACVXFVN_NjC814BqjmAb48do2qMy9EjQ+n3OD2nLz8AngopJXg@mail.gmail.com>
Date: Sat, 18 Apr 2015 00:23:10 -0400
From: Ming Lei <ming.lei@...onical.com>
To: Dongsu Park <dongsu.park@...fitbricks.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: panic with CPU hotplug + blk-mq + scsi-mq
Hi Dongsu,
On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
<dongsu.park@...fitbricks.com> wrote:
> Hi,
>
> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq.
> Every time a CPU is offlined, some arbitrary range of kernel memory
> seems to get corrupted. Then after a while, the kernel panics at random
> places when block IOs are issued (for example, see the call traces below).
Thanks for the report.
>
> This bug is easily reproducible with a Qemu VM running virtio-scsi,
> when the guest kernel is 3.19-rc1 or higher, and when scsi-mq is loaded
> with blk-mq enabled. And yes, the 4.0 release is still affected, as well as
> Jens' for-4.1/core. How to reproduce:
>
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> (and issue some block IOs, that's it.)
>
> Bisecting between 3.18 and 3.19-rc1, it looks like this bug had been hidden
> until commit ccbedf117f01 ("virtio_scsi: support multi hw queue of blk-mq"),
> which started to allow virtio-scsi to map virtqueues to hardware queues of
> blk-mq. Reverting that commit makes the bug go away. However, I suppose
> reverting it would not be a correct solution.
I agree, and that patch only enables multiple hw queues.
>
> More precisely, every time a CPU hotplug event gets triggered,
> a call graph is like the following:
>
> blk_mq_queue_reinit_notify()
> -> blk_mq_queue_reinit()
> -> blk_mq_map_swqueue()
> -> blk_mq_free_rq_map()
> -> scsi_exit_request()
>
> From that point, as soon as any address in the request gets modified, an
> arbitrary range of memory gets corrupted. My first guess was that
> the exit routine might try to deallocate tags->rqs[] entries holding
> invalid addresses. But actually that doesn't look like the case,
> and cmd->sense_buffer also looks valid.
> It's not obvious to me exactly what could go wrong.
>
> Does anyone have an idea?
As far as I can see, at least two problems exist:
- a race between timeout handling and CPU hotplug
- in the shared-tags case, a race in setting and checking hctx->tags
during CPU online handling
So could you please test the attached two patches to see if they fix your issue?
I ran them in my VM, and the oops does seem to disappear.
Thanks,
Ming Lei
>
> Regards,
> Dongsu
>
> ---- [beginning of call traces] ----
> [ 47.274292] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 47.275013] IP: [<ffffffff8140b31d>] __bt_get.isra.5+0x7d/0x1e0
> [ 47.275013] PGD 79c55067 PUD 7ba17067 PMD 0
> [ 47.275013] Oops: 0000 [#1] SMP
> [ 47.275013] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin loop dm_multipath 9pnet_virtio rtc_cmos 9pnet acpi_cpufreq serio_raw i2c_piix4 virtio_net
> [ 47.275013] CPU: 3 PID: 6232 Comm: blkid Not tainted 4.0.0 #303
> [ 47.275013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [ 47.275013] task: ffff88003dfbc020 ti: ffff880079bac000 task.ti: ffff880079bac000
> [ 47.275013] RIP: 0010:[<ffffffff8140b31d>] [<ffffffff8140b31d>] __bt_get.isra.5+0x7d/0x1e0
> [ 47.275013] RSP: 0018:ffff880079baf898 EFLAGS: 00010246
> [ 47.275013] RAX: 000000000000003c RBX: ffff880079198400 RCX: 0000000000000078
> [ 47.275013] RDX: ffff88007fddbb80 RSI: 0000000000000010 RDI: ffff880079198400
> [ 47.275013] RBP: ffff880079baf8e8 R08: ffff88007fddbb80 R09: 0000000000000000
> [ 47.275013] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010
> [ 47.275013] R13: 0000000000000010 R14: ffff880079baf9e8 R15: ffff88007fddbb80
> [ 47.275013] FS: 00002b270c049800(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [ 47.275013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 47.275013] CR2: 0000000000000018 CR3: 000000007ca8d000 CR4: 00000000001407e0
> [ 47.275013] Stack:
> [ 47.275013] ffff880079baf978 ffff88007fdd58c0 0000000000000078 ffffffff814071ff
> [ 47.275013] ffff880079baf8d8 ffff880079198400 0000000000000010 0000000000000010
> [ 47.275013] ffff880079baf9e8 ffff88007fddbb80 ffff880079baf968 ffffffff8140b4e5
> [ 47.275013] Call Trace:
> [ 47.275013] [<ffffffff814071ff>] ? blk_mq_queue_enter+0x9f/0x2d0
> [ 47.275013] [<ffffffff8140b4e5>] bt_get+0x65/0x1e0
> [ 47.275013] [<ffffffff814071ff>] ? blk_mq_queue_enter+0x9f/0x2d0
> [ 47.275013] [<ffffffff810c9b40>] ? wait_woken+0xa0/0xa0
> [ 47.275013] [<ffffffff8140ba07>] blk_mq_get_tag+0xa7/0xd0
> [ 47.275013] [<ffffffff8140630b>] __blk_mq_alloc_request+0x1b/0x200
> [ 47.275013] [<ffffffff81408736>] blk_mq_map_request+0xd6/0x4e0
> [ 47.275013] [<ffffffff8140a53e>] blk_mq_make_request+0x6e/0x2d0
> [ 47.275013] [<ffffffff813fb844>] ? generic_make_request_checks+0x674/0x6a0
> [ 47.275013] [<ffffffff813f23ae>] ? bio_add_page+0x5e/0x70
> [ 47.275013] [<ffffffff813fb930>] generic_make_request+0xc0/0x110
> [ 47.275013] [<ffffffff813fb9e8>] submit_bio+0x68/0x150
> [ 47.275013] [<ffffffff811b0c6c>] ? lru_cache_add+0x1c/0x50
> [ 47.275013] [<ffffffff8125972a>] mpage_bio_submit+0x2a/0x40
> [ 47.275013] [<ffffffff8125a81c>] mpage_readpages+0x10c/0x130
> [ 47.275013] [<ffffffff81254040>] ? I_BDEV+0x10/0x10
> [ 47.275013] [<ffffffff81254040>] ? I_BDEV+0x10/0x10
> [ 47.275013] [<ffffffff8119e417>] ? __page_cache_alloc+0x137/0x160
> [ 47.275013] [<ffffffff8125486d>] blkdev_readpages+0x1d/0x20
> [ 47.275013] [<ffffffff811ae43f>] __do_page_cache_readahead+0x29f/0x320
> [ 47.275013] [<ffffffff811ae305>] ? __do_page_cache_readahead+0x165/0x320
> [ 47.275013] [<ffffffff811aea14>] force_page_cache_readahead+0x34/0x60
> [ 47.275013] [<ffffffff811aea86>] page_cache_sync_readahead+0x46/0x50
> [ 47.275013] [<ffffffff811a094c>] generic_file_read_iter+0x52c/0x640
> [ 47.275013] [<ffffffff81254c17>] blkdev_read_iter+0x37/0x40
> [ 47.275013] [<ffffffff81211a0e>] new_sync_read+0x7e/0xb0
> [ 47.275013] [<ffffffff81212c88>] __vfs_read+0x18/0x50
> [ 47.275013] [<ffffffff81212d4d>] vfs_read+0x8d/0x150
> [ 47.275013] [<ffffffff81212e59>] SyS_read+0x49/0xb0
> [ 47.275013] [<ffffffff817063b2>] system_call_fastpath+0x12/0x17
> [ 47.275013] Code: 97 18 03 00 00 bf 04 00 00 00 41 f7 f1 83 f8 04 0f 43 f8 b8 ff ff ff ff 44 39 d7 0f 86 c1 00 00 00 41 8b 00 48 89 4d c0 49 89 f5 <8b> 4e 08 8b 56 0c 4c 89 45 b0 c7 45 c8 00 00 00 00 41 89 c4 89
> [ 47.275013] RIP [<ffffffff8140b31d>] __bt_get.isra.5+0x7d/0x1e0
> [ 47.275013] RSP <ffff880079baf898>
> [ 47.275013] CR2: 0000000000000018
> [ 47.275013] ---[ end trace 9a650b674f0fae74 ]---
> [ 47.701261] note: kworker/3:2[225] exited with preempt_count 1
> [ 47.815398] BUG: unable to handle kernel paging request at ffffffffffffff98
> [ 47.816324] IP: [<ffffffff810a00d0>] kthread_data+0x10/0x20
> [ 47.816324] PGD 1c0c067 PUD 1c0e067 PMD 0
> [ 47.816324] Oops: 0000 [#3] SMP
> [ 47.816324] Modules linked in: fuse cpufreq_stats binfmt_misc 9p fscache dm_round_robin loop dm_multipath 9pnet_virtio rtc_cmos 9pnet acpi_cpufreq serio_raw i2c_piix4 virtio_net
> [ 47.816324] CPU: 3 PID: 225 Comm: kworker/3:2 Tainted: G D W 4.0.0 #303
> [ 47.816324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [ 47.816324] task: ffff88007ac90000 ti: ffff88007906c000 task.ti: ffff88007906c000
> [ 47.816324] RIP: 0010:[<ffffffff810a00d0>] [<ffffffff810a00d0>] kthread_data+0x10/0x20
> [ 47.816324] RSP: 0018:ffff88007906f5e8 EFLAGS: 00010092
> [ 47.816324] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000000f
> [ 47.816324] RDX: 000000000000000f RSI: 0000000000000003 RDI: ffff88007ac90000
> [ 47.816324] RBP: ffff88007906f5e8 R08: ffff88007ac90090 R09: 0000000000000000
> [ 47.816324] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007fdd4dc0
> [ 47.816324] R13: ffff88007ac90000 R14: 0000000000000003 R15: 0000000000000000
> [ 47.816324] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [ 47.816324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 47.816324] CR2: 0000000000000028 CR3: 0000000001c0b000 CR4: 00000000001407e0
> [ 47.816324] Stack:
> [ 47.816324] ffff88007906f608 ffffffff81099f35 ffff88007906f608 00000000001d4dc0
> [ 47.816324] ffff88007906f678 ffffffff816ff757 ffffffff8107cfc6 0000000000000292
> [ 47.816324] ffffffff8107cffa ffff88007ac90000 ffff88007906f678 0000000000000296
> [ 47.816324] Call Trace:
> [ 47.816324] [<ffffffff81099f35>] wq_worker_sleeping+0x15/0xa0
> [ 47.816324] [<ffffffff816ff757>] __schedule+0xa77/0x1080
> [ 47.816324] [<ffffffff8107cfc6>] ? do_exit+0x756/0xbf0
> [ 47.816324] [<ffffffff8107cffa>] ? do_exit+0x78a/0xbf0
> [ 47.816324] [<ffffffff816ffd97>] schedule+0x37/0x90
> [ 47.816324] [<ffffffff8107d0d6>] do_exit+0x866/0xbf0
> [ 47.816324] [<ffffffff810ec14e>] ? kmsg_dump+0xfe/0x200
> [ 47.816324] [<ffffffff810068ad>] oops_end+0x8d/0xd0
> [ 47.816324] [<ffffffff81047849>] no_context+0x119/0x370
> [ 47.816324] [<ffffffff810ce795>] ? cpuacct_charge+0x5/0x1c0
> [ 47.816324] [<ffffffff810b4a25>] ? sched_clock_local+0x25/0x90
> [ 47.816324] [<ffffffff81047b25>] __bad_area_nosemaphore+0x85/0x210
> [ 47.816324] [<ffffffff81047cc3>] bad_area_nosemaphore+0x13/0x20
> [ 47.816324] [<ffffffff81047fb6>] __do_page_fault+0xb6/0x490
> [ 47.816324] [<ffffffff8104839c>] do_page_fault+0xc/0x10
> [ 47.816324] [<ffffffff817080c2>] page_fault+0x22/0x30
> [ 47.816324] [<ffffffff8140b31d>] ? __bt_get.isra.5+0x7d/0x1e0
> [ 47.816324] [<ffffffff8140b4e5>] bt_get+0x65/0x1e0
> [ 47.816324] [<ffffffff810c9b40>] ? wait_woken+0xa0/0xa0
> [ 47.816324] [<ffffffff8140ba07>] blk_mq_get_tag+0xa7/0xd0
> [ 47.816324] [<ffffffff8140630b>] __blk_mq_alloc_request+0x1b/0x200
> [ 47.816324] [<ffffffff81407f91>] blk_mq_alloc_request+0xa1/0x250
> [ 47.816324] [<ffffffff813fc74c>] blk_get_request+0x2c/0xf0
> [ 47.816324] [<ffffffff810a6acd>] ? __might_sleep+0x4d/0x90
> [ 47.816324] [<ffffffff815747dd>] scsi_execute+0x3d/0x1f0
> [ 47.816324] [<ffffffff815763be>] scsi_execute_req_flags+0x8e/0x100
> [ 47.816324] [<ffffffff81576a43>] scsi_test_unit_ready+0x83/0x130
> [ 47.816324] [<ffffffff8158672e>] sd_check_events+0x14e/0x1b0
> [ 47.816324] [<ffffffff8140e731>] disk_check_events+0x51/0x170
> [ 47.816324] [<ffffffff8140e86c>] disk_events_workfn+0x1c/0x20
> [ 47.816324] [<ffffffff81099128>] process_one_work+0x1e8/0x800
> [ 47.816324] [<ffffffff8109909d>] ? process_one_work+0x15d/0x800
> [ 47.816324] [<ffffffff8109981a>] ? worker_thread+0xda/0x470
> [ 47.816324] [<ffffffff81099793>] worker_thread+0x53/0x470
> [ 47.816324] [<ffffffff81099740>] ? process_one_work+0x800/0x800
> [ 47.816324] [<ffffffff81099740>] ? process_one_work+0x800/0x800
> [ 47.816324] [<ffffffff8109f652>] kthread+0xf2/0x110
> [ 47.816324] [<ffffffff810d3d4d>] ? trace_hardirqs_on+0xd/0x10
> [ 47.816324] [<ffffffff8109f560>] ? kthread_create_on_node+0x230/0x230
> [ 47.816324] [<ffffffff81706308>] ret_from_fork+0x58/0x90
> [ 47.816324] [<ffffffff8109f560>] ? kthread_create_on_node+0x230/0x230
> [ 47.816324] Code: 00 48 89 e5 5d 48 8b 40 88 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 20 04 00 00 55 48 89 e5 <48> 8b 40 98 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> [ 47.816324] RIP [<ffffffff810a00d0>] kthread_data+0x10/0x20
> [ 47.816324] RSP <ffff88007906f5e8>
> [ 47.816324] CR2: ffffffffffffff98
> [ 47.816324] ---[ end trace 9a650b674f0fae76 ]---
> [ 47.816324] Fixing recursive fault but reboot is needed!
> ---- [end of call traces] ----
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
View attachment "0001-block-blk-mq-fix-race-between-timeout-and-CPU-hotplu.patch" of type "text/x-patch" (1821 bytes)
View attachment "0002-blk-mq-fix-CPU-hotplug-handling.patch" of type "text/x-patch" (1824 bytes)