Message-ID: <571E8F8D.9040701@linaro.org>
Date: Mon, 25 Apr 2016 14:43:41 -0700
From: "Shi, Yang" <yang.shi@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: tj@...nel.org, mingo@...hat.com, lizefan@...wei.com,
linux-kernel@...r.kernel.org, linux-next@...r.kernel.org,
linaro-kernel@...ts.linaro.org
Subject: Re: [linux-next PATCH] sched: cgroup: enable interrupt before calling
threadgroup_change_begin
On 4/25/2016 10:35 AM, Shi, Yang wrote:
> On 4/23/2016 2:14 AM, Peter Zijlstra wrote:
>> On Fri, Apr 22, 2016 at 08:56:28PM -0700, Yang Shi wrote:
>>> When a kernel oops happens in a kernel thread (kcompactd in this test),
>>> the bug below might be triggered by the oops handler:
>>
>> What are you trying to fix? You already oopsed; the thing is wrecked.
>
> Actually, I ran into the below kernel BUG:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8119d2f8>] release_freepages+0x18/0xa0
> PGD 0
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 6 PID: 110 Comm: kcompactd0 Not tainted 4.6.0-rc4-next-20160420 #4
> Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009
> task: ffff880361732680 ti: ffff88036173c000 task.ti: ffff88036173c000
> RIP: 0010:[<ffffffff8119d2f8>] [<ffffffff8119d2f8>] release_freepages+0x18/0xa0
> RSP: 0018:ffff88036173fcf8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff88036ffde7c0 RCX: 0000000000000009
> RDX: 0000000000001bf1 RSI: 000000000000000f RDI: ffff88036173fdd0
> RBP: ffff88036173fd20 R08: 0000000000000007 R09: 0000160000000000
> R10: ffff88036ffde7c0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff88036173fdd0 R14: ffff88036173fdc0 R15: ffff88036173fdb0
> FS: 0000000000000000(0000) GS:ffff880363cc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002206000 CR4: 00000000000006e0
> Stack:
> ffff88036ffde7c0 0000000000000000 0000000000001a00 ffff88036173fdc0
> ffff88036173fdb0 ffff88036173fda0 ffffffff8119f13d ffffffff81196239
> 0000000000000000 ffff880361732680 0000000000000001 0000000000100000
> Call Trace:
> [<ffffffff8119f13d>] compact_zone+0x55d/0x9f0
> [<ffffffff81196239>] ? fragmentation_index+0x19/0x70
> [<ffffffff8119f92f>] kcompactd_do_work+0x10f/0x230
> [<ffffffff8119fae0>] kcompactd+0x90/0x1e0
> [<ffffffff810a3a40>] ? wait_woken+0xa0/0xa0
> [<ffffffff8119fa50>] ? kcompactd_do_work+0x230/0x230
> [<ffffffff810801ed>] kthread+0xdd/0x100
> [<ffffffff81be5ee2>] ret_from_fork+0x22/0x40
> [<ffffffff81080110>] ? kthread_create_on_node+0x180/0x180
> Code: c1 fa 06 31 f6 e8 a9 9b fd ff eb 98 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 53 48 8b 07 <48> 8b 10 48 8d 78 e0 49 39 c5 4c 8d 62 e0 74 70 49 be 00 00 00
> RIP [<ffffffff8119d2f8>] release_freepages+0x18/0xa0
> RSP <ffff88036173fcf8>
> CR2: 0000000000000000
> ---[ end trace 2e96d09e0ba6342f ]---
>
> Then the "schedule in atomic context" bug is triggered, which causes the
> system to hang. Without the "schedule in atomic context" bug, the system
> stays alive: the earlier NULL pointer dereference does not bring the
> system down; it only kills the kcompactd kthread.
BTW, I don't have "panic on oops" set. So, the kernel doesn't panic.
Thanks,
Yang
>
> Thanks,
> Yang
>
>>
>