Message-ID: <50DD973B.8000101@iskon.hr>
Date: Fri, 28 Dec 2012 13:57:31 +0100
From: Zlatko Calusic <zlatko.calusic@...on.hr>
To: Zhouping Liu <zliu@...hat.com>
CC: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Johannes Weiner <jweiner@...hat.com>, mgorman@...e.de,
hughd@...gle.com, Andrea Arcangeli <aarcange@...hat.com>,
Hillf Danton <dhillf@...il.com>, sedat.dilek@...il.com
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000500

On 28.12.2012 03:45, Zhouping Liu wrote:
>>
>> Thank you for the report Zhouping!
>>
>> Would you be so kind to test the following patch and report results?
>> Apply the patch to the latest mainline.
>
> Hello Zlatko,
>
> I have tested the patch below (applied directly to mainline),
> but IMO it may not fix the issue completely.
>
> I ran the reproducer[1] on two machines, one with 2 NUMA nodes (8 GB RAM) and
> the other with 4 NUMA nodes (8 GB RAM), after which the system hung all the time, as in this dmesg log:
>
> [ 713.066937] Killed process 6085 (oom01) total-vm:18880768kB, anon-rss:7915612kB, file-rss:4kB
> [ 959.555269] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 959.562144] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1079.382018] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 1079.388872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1199.209709] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 1199.216562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1319.036939] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 1319.043794] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1438.864797] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 1438.871649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1558.691611] INFO: task kworker/13:2:147 blocked for more than 120 seconds.
> [ 1558.698466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ......
>
> I'm not sure whether your patch is what triggers the hung task, but with cda73a10eb3 reverted,
> the reproducer (oom01) PASSes with neither the 'NULL pointer dereference at 0000000000000500' nor the hung task issue.
>
> However, the reproducer (oom01) can sometimes also cause a hung task on a box with large RAM (100 GB+), so I can't judge it...
>
Thanks for the test.
Yes, close to OOM things get quite unstable and it's hard to get
reliable test results. Maybe you could run it a few times, with and
without the patch, and see whether any meaningful statistics come out
of those runs. I need to check oom.c myself and see what it's doing.
Thanks for the link.
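
Repeated runs could be tallied with something like the sketch below. It
is only an illustration: run_trials, the 600-second timeout and the
pass/fail/timeout labels are all assumptions, not part of the reproducer.
It also only catches a hung test process; if the whole machine locks up,
nothing gets recorded.

```python
import subprocess
from collections import Counter


def run_trials(cmd, runs=5, timeout_s=600):
    """Run cmd `runs` times and count how each run ended.

    Hypothetical helper: outcomes are "pass" (exit status 0), "fail"
    (non-zero exit status, or killed by a signal such as the OOM
    killer's SIGKILL) and "timeout" (still running after timeout_s).
    """
    tally = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            tally["timeout"] += 1
            continue
        tally["pass" if result.returncode == 0 else "fail"] += 1
    return tally
```

For example, run_trials(["./oom01"], runs=10) on each kernel, where
"./oom01" is a placeholder path for wherever the reproducer binary lives,
gives two tallies that can be compared side by side.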
Regards,
--
Zlatko