Message-ID: <20161220165940.GC17367@csclub.uwaterloo.ca>
Date: Tue, 20 Dec 2016 11:59:40 -0500
From: lsorense@...lub.uwaterloo.ca (Lennart Sorensen)
To: linux-kernel@...r.kernel.org
Subject: Re: Debug hints for fpu state NULL pointer dereference on context
switch during core dump in 3.0.101
On Mon, Dec 19, 2016 at 01:09:39PM -0500, Lennart Sorensen wrote:
> I am trying to debug a problem that has been happening occasionally for
> years on some of our systems running 3.0.101 kernel (yes I know it is
> old, we are moving to 4.9 at the moment but I would like older releases
> to be fixed too, assuming 4.9 makes this problem disappear).
>
> What is happening is that once in a while a process does something wrong
> and segfaults, and dumps core. We have a handler to process the core dump
> to name it and compress it and make sure we don't keep too many around,
> so the core_pattern uses the pipe option to pipe the dump to a shell
> script that saves it with the pid and current timestamp and gzips it.
>
> Once in a while when this happens, the kernel hits a null pointer
> dereference in fpu.state->xsave while doing __switch_to.
>
> The system is x86_64 with dual E5-2620 CPUs (6 cores each with
> hyperthreading). Some people think they have seen it on other systems,
> but are not sure. I have not been able to trigger it on other systems
> yet.
>
> It used to take about a week of running tests to trigger it, but I have
> now managed to hit it in a few minutes pretty reliably.
If the core_pattern is not set to use a pipe, but just saves the dump
as core.%e.%p, then the problem does not happen.
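
For anyone who wants to try to reproduce this, here is a rough sketch of
the two setups being compared. This is not our actual handler: the
script path, CORE_DIR, MAX_CORES and the save_core function name are all
made up for illustration. The only parts taken from the kernel are the
leading '|' in core_pattern (pipe the dump to a program on stdin) and
the %e, %p and %t specifiers (executable name, pid, time of dump).

```shell
#!/bin/sh
# Hypothetical core dump handler (illustrative only, not the real script).
#
# Pipe setup (the one that triggers the problem), assuming the script is
# installed at /usr/local/sbin/core-handler:
#   echo '|/usr/local/sbin/core-handler %e %p %t' > /proc/sys/kernel/core_pattern
#
# Plain file setup (the one that does not trigger it):
#   echo 'core.%e.%p' > /proc/sys/kernel/core_pattern

# Assumed locations and limits; override through the environment.
CORE_DIR=${CORE_DIR:-/var/cores}
MAX_CORES=${MAX_CORES:-10}

# save_core EXE PID TIMESTAMP
# Reads the core dump from stdin, stores it gzip-compressed, and removes
# the oldest dumps beyond MAX_CORES. Prints the path that was written.
save_core() {
    exe=$1
    pid=$2
    ts=$3
    out="$CORE_DIR/core.$exe.$pid.$ts.gz"

    mkdir -p "$CORE_DIR" || return 1

    # The kernel feeds the dump on stdin when core_pattern starts with '|'.
    gzip -c > "$out" || return 1

    # Keep only the newest MAX_CORES dumps. Parsing ls output is fragile
    # with unusual file names; good enough for a sketch.
    ls -1t "$CORE_DIR"/core.*.gz 2>/dev/null |
        tail -n +$((MAX_CORES + 1)) |
        while IFS= read -r old; do
            rm -f -- "$old"
        done

    printf '%s\n' "$out"
}

# When run by the kernel with the three arguments from core_pattern.
if [ "$#" -eq 3 ]; then
    save_core "$1" "$2" "$3" > /dev/null
fi
```

With the pipe setup the kernel runs the handler as a user mode helper
while the crashing process is still dumping, which is the main
difference from the plain file case.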
--
Len Sorensen