Message-ID: <CALAqxLXMUcucGd7xQ8JD120LdiOr9qbb-RybwJYx95c=V+9p7w@mail.gmail.com>
Date: Wed, 20 Jan 2016 09:32:36 -0800
From: John Stultz <john.stultz@...aro.org>
To: Jeff Merkey <linux.mdb@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 1/20/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>> On 1/20/16, Thomas Gleixner <tglx@...utronix.de> wrote:
>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>> Nasty bug but trivial fix for this. What happens here is RAX (nsecs)
>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>
>>> And how exactly does that happen?
>>>
>>> 0x17AE7F57C671EA7D = 1.70644e+18 nsec
>>>                    = 1.70644e+09 sec
>>>                    = 2.84407e+07 min
>>>                    = 474011 hrs
>>>                    = 19750.5 days
>>>                    = 54.1109 years
>>>
>>> That's the real issue, not what you are trying to 'fix' in
>>> timespec_add_ns()
>>>
>
> I guess I am going to have to become an expert on the timekeeper and
> learn this subsystem backwards and forwards to code a touch function
> to keep it from crashing the system.
>
> On the 2.6 series kernels (and 2.2) this problem did not exist. I
> noticed a lot of these changes came in in the late 2.6 cycles. Before
> that time, I could leave the debugger spinning for days and linux
> worked fine.
>
> For people who have to pay developers to develop code on Linux a
> debugger is almost an essential tool since it saves hundreds of
> thousands of dollars in
> development costs. Not everyone wants to spend money for their
> employees and engineers to sit around and code review every problem -
> customers just want their problems fixed -- and fast. That being
> said, I am having no lack of people who download and use this debugger
> and I'm certain kgdb is heavily used by folks doing development. If
> kernel development is too hard, people move to something else based on
> simple economics.
>
> That being said, I need to get this fixed. There is no good reason a
> debugger shouldn't be able to stop the system and leave it suspended
> for days if necessary to run down a bug. I wrote a debugger on SMP
> Netware that worked that way. The earliest versions of MDB worked
> that way.
>
> kgdb is broken right now because of this. I am not certain it affects
> all systems out there, but it needs to be fixed.
>
> If you have any ideas on how to code a touch function please send me a
> patch or suggest how it could be done non-obtrusively, otherwise I'll
> have to dive into the timekeeper and fix it myself and learn yet
> another subsystem of Linux and fix its bugs. A code subsystem that
> crashes because the timer tick is skewed or returns garbage is poorly
> designed IMHO.

Ehrm. A more productive route in solving this might be to cap the
cycle delta we return from timekeeping_get_delta().

We already do this in the CONFIG_DEBUG_TIMEKEEPING case, but adding a
simple check to the non-debug case should be doable w/o adding too
much overhead to this very hot path.

Something like:

	if (delta > tkr->clock->max_cycles)
		delta = tkr->clock->max_cycles;
	return delta;

thanks
-john
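
For readers following the thread, the capping idea above can be tried
outside the kernel. The following is only a minimal userspace sketch,
not the actual timekeeping code: struct clock_sketch, struct tkr_sketch
and capped_delta() are hypothetical stand-ins, and the only fields
borrowed from the snippet above are clock->max_cycles plus an assumed
counter mask and last-read cycle value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a clocksource (not the kernel struct). */
struct clock_sketch {
	uint64_t mask;        /* counter width mask, e.g. 0xffffffff for 32 bits */
	uint64_t max_cycles;  /* largest delta assumed safe to convert to ns */
};

/* Hypothetical stand-in for the timekeeper read-base state. */
struct tkr_sketch {
	const struct clock_sketch *clock;
	uint64_t cycle_last;  /* counter value at the last update */
};

/*
 * Compute the cycles elapsed since cycle_last, handling counter
 * wrap-around via the mask, then clamp the result to max_cycles so
 * that a very long stall (e.g. sitting in a debugger) cannot produce
 * an arbitrarily large delta.
 */
uint64_t capped_delta(const struct tkr_sketch *tkr, uint64_t now)
{
	uint64_t delta;

	assert(tkr && tkr->clock);

	delta = (now - tkr->cycle_last) & tkr->clock->mask;
	if (delta > tkr->clock->max_cycles)
		delta = tkr->clock->max_cycles;
	return delta;
}
```

With max_cycles set to 1000 and cycle_last at 100, a read of 600 yields
a delta of 500, while a read of 1000000 is clamped to 1000. The clamp
trades accuracy after a long stall for a bounded value in the later
cycles-to-nanoseconds conversion.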