linux-kernel - Re: NULL pointer dereference at blk_drain

Date:	Fri, 15 Jun 2012 11:06:52 +0800
From:	Asias He <asias@...hat.com>
To:	Jens Axboe <axboe@...nel.dk>
CC:	Jiri Slaby <jslaby@...e.cz>, Tejun Heo <tj@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jiri Slaby <jirislaby@...il.com>
Subject: Re: NULL pointer dereference at blk_drain_queue

On 06/14/2012 09:25 PM, Jens Axboe wrote:
> On 06/14/2012 03:20 PM, Asias He wrote:
>> On 06/14/2012 05:42 PM, Jiri Slaby wrote:
>>> On 06/14/2012 11:16 AM, Jens Axboe wrote:
>>>> On 06/14/2012 11:04 AM, Jiri Slaby wrote:
>>>>> Hi,
>>>>>
>>>>> with today's -next I'm (reproducibly) getting this while updating packages:
>>>>> BUG: unable to handle kernel NULL pointer dereference at           (null)
>>>>> IP: [<ffffffff8108cd16>] __wake_up_common+0x26/0x90
>>>>> PGD 463f1067 PUD 463f2067 PMD 0
>>>>> Oops: 0000 [#1] SMP
>>>>> CPU 1
>>>>> Modules linked in:
>>>>> Pid: 2711, comm: kworker/1:0 Not tainted 3.5.0-rc2-next-20120614_64+ #1752 Bochs Bochs
>>>>> RIP: 0010:[<ffffffff8108cd16>]  [<ffffffff8108cd16>] __wake_up_common+0x26/0x90
>>>>> RSP: 0018:ffff880047221cb0  EFLAGS: 00010082
>>>>> RAX: 0000000000000086 RBX: ffff880046350888 RCX: 0000000000000000
>>>>> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff880046350888
>>>>> RBP: ffff880047221cf0 R08: 0000000000000000 R09: 00000001000c0009
>>>>> R10: ffff880047804480 R11: 0000000000000000 R12: ffff880046350890
>>>>> R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
>>>>> FS:  0000000000000000(0000) GS:ffff880049700000(0000) knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> CR2: 0000000000000000 CR3: 0000000045ced000 CR4: 00000000000006e0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> Process kworker/1:0 (pid: 2711, threadinfo ffff880047220000, task ffff8800435bc5c0)
>>>>> Stack:
>>>>>    000000004628da68 0000000000000000 ffff88004970d340 ffff880046350888
>>>>>    0000000000000086 0000000000000003 0000000000000000 0000000000000000
>>>>>    ffff880047221d30 ffffffff8108d9a3 ffff88004970d340 ffff880046350848
>>>>> Call Trace:
>>>>>    [<ffffffff8108d9a3>] __wake_up+0x43/0x70
>>>>>    [<ffffffff81267f96>] blk_drain_queue+0xf6/0x120
>>>>>    [<ffffffff8126803f>] blk_cleanup_queue+0x7f/0xd0
>>>>>    [<ffffffff814a9a80>] md_free+0x50/0x70
>>>>>    [<ffffffff8127b3c2>] kobject_cleanup+0x82/0x1b0
>>>>>    [<ffffffff8127b24b>] kobject_put+0x2b/0x60
>>>>>    [<ffffffff814a97ef>] mddev_delayed_delete+0x2f/0x40
>>>>>    [<ffffffff8107e1ab>] process_one_work+0x11b/0x3f0
>>>>>    [<ffffffff814a97c0>] ? restart_array+0xc0/0xc0
>>>>>    [<ffffffff8107f94e>] worker_thread+0x12e/0x340
>>>>>    [<ffffffff8107f820>] ? manage_workers.isra.29+0x1f0/0x1f0
>>>>>    [<ffffffff81084e1e>] kthread+0x8e/0xa0
>>>>>    [<ffffffff8160add4>] kernel_thread_helper+0x4/0x10
>>>>>    [<ffffffff81084d90>] ? flush_kthread_worker+0x70/0x70
>>>>>    [<ffffffff8160add0>] ? gs_change+0xb/0xb
>>>>> Code: 80 00 00 00 00 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41
>>>>> 54 4c 8d 67 08 53 48 83 ec 18 89 55 c4 48 8b 57 08 4c 89 45 c8 <4c> 8b
>>>>> 2a 48 8d 42 e8 49 83 ed 18 49 39 d4 75 0d eb 40 0f 1f 84
>>>>
>>>> It's a bug in local commit bc85cf83: for stacked devices, we have not
>>>> initialized the wait queues. The patch below should fix it, as would always
>>>> initializing all queue structures, even for the partial use case.
>>>>
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index b477fa0..93eb3e4 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -415,10 +415,12 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
>>>>         * allocation path, so the wakeup chaining is lost and we're
>>>>         * left with hung waiters. We need to wake up those waiters.
>>>>         */
>>>> -    spin_lock_irq(q->queue_lock);
>>>> -    for (i = 0; i < ARRAY_SIZE(q->rq.wait); i++)
>>>> -        wake_up_all(&q->rq.wait[i]);
>>>> -    spin_unlock_irq(q->queue_lock);
>>>> +    if (q->request_fn) {
>>>> +        spin_lock_irq(q->queue_lock);
>>>> +        for (i = 0; i < ARRAY_SIZE(q->rq.wait); i++)
>>>> +            wake_up_all(&q->rq.wait[i]);
>>>> +        spin_unlock_irq(q->queue_lock);
>>>> +    }
>>>
>>> Yes, that fixed it.
>>
>> Jiri, good to hear this fixes it for you. BTW, how do you trigger this
>> issue?
>>
>> Jens, do you prefer to fix it up in your tree yourself, or to wait for a
>> patch from me?
>
> I will fix up the existing patch, so we don't have this problem at a
> bisection point after it's merged.

OK. Thanks.

-- 
Asias
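
A minimal userspace sketch of the failure discussed in this thread, assuming a
simplified model rather than the kernel's real structures: a zero-filled wait
queue head has a NULL list pointer, so walking it faults, which is consistent
with the NULL dereference in __wake_up_common in the oops above. All names here
(struct list_node, struct wait_head, struct queue, wake_up_all_model,
drain_unguarded, drain_guarded, queue_init_all, model_self_check) are
hypothetical stand-ins, and the model returns -1 at the point where the kernel
would oops. It contrasts the two fixes Jens mentions: guarding on
q->request_fn, and always initializing the wait queues.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
struct list_node { struct list_node *next, *prev; };

struct wait_head {
    int lock;                /* placeholder for the spinlock */
    struct list_node tasks;  /* circular list of waiters */
};

#define NR_WAIT 2

struct queue {
    void (*request_fn)(struct queue *q); /* NULL models a stacked device */
    struct wait_head wait[NR_WAIT];
};

/* An initialized, empty head points at itself. */
static void wait_head_init(struct wait_head *w)
{
    w->lock = 0;
    w->tasks.next = &w->tasks;
    w->tasks.prev = &w->tasks;
}

/* Link one waiter in right after the head. */
static void add_waiter(struct wait_head *w, struct list_node *n)
{
    n->next = w->tasks.next;
    n->prev = &w->tasks;
    w->tasks.next->prev = n;
    w->tasks.next = n;
}

/* Returns the number of waiters visited, or -1 if the head was never
 * initialized (the point where a real list walk would dereference NULL). */
static int wake_up_all_model(const struct wait_head *w)
{
    const struct list_node *p;
    int visited = 0;

    if (w->tasks.next == NULL)
        return -1;
    for (p = w->tasks.next; p != &w->tasks; p = p->next)
        visited++;
    return visited;
}

/* Drain without any check: walks every wait head unconditionally. */
static int drain_unguarded(const struct queue *q)
{
    int total = 0;
    size_t i;

    for (i = 0; i < NR_WAIT; i++) {
        int r = wake_up_all_model(&q->wait[i]);
        if (r < 0)
            return -1;
        total += r;
    }
    return total;
}

/* Same shape as the patch: only touch the wait heads if request_fn is set. */
static int drain_guarded(const struct queue *q)
{
    return q->request_fn ? drain_unguarded(q) : 0;
}

/* The alternative fix: always initialize every wait head. */
static void queue_init_all(struct queue *q, void (*fn)(struct queue *))
{
    size_t i;

    memset(q, 0, sizeof(*q));
    q->request_fn = fn;
    for (i = 0; i < NR_WAIT; i++)
        wait_head_init(&q->wait[i]);
}

static void dummy_request_fn(struct queue *q) { (void)q; }

/* Exercises the three cases; returns 0 if every assertion holds. */
static int model_self_check(void)
{
    struct queue q;
    struct list_node waiter;

    memset(&q, 0, sizeof(q));          /* stacked queue, heads never set up */
    assert(drain_unguarded(&q) == -1); /* the modeled oops */
    assert(drain_guarded(&q) == 0);    /* guard skips the wait heads */

    queue_init_all(&q, NULL);          /* stacked queue, heads initialized */
    assert(drain_unguarded(&q) == 0);

    queue_init_all(&q, dummy_request_fn);
    add_waiter(&q.wait[0], &waiter);
    assert(drain_guarded(&q) == 1);    /* one waiter visited */
    return 0;
}
```

Calling model_self_check() from a main() exercises all three cases. Either
fix avoids the modeled fault. The guard leaves stacked queues untouched,
while unconditional initialization makes the walk safe for every queue.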

