Message-ID: <CAO6TR8VZF5NRPFTYJBNEyYCCrrGrytGNY0otSGfGzLm+_dYbJg@mail.gmail.com>
Date:	Wed, 20 Jan 2016 10:36:02 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	John Stultz <john.stultz@...aro.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

On 1/20/16, John Stultz <john.stultz@...aro.org> wrote:
> On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@...il.com> wrote:
>> On 1/20/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>>> On 1/20/16, Thomas Gleixner <tglx@...utronix.de> wrote:
>>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>>
>>>> And how exactly does that happen?
>>>>
>>>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>>>>                 = 1.70644e+09  sec
>>>>                 = 2.84407e+07  min
>>>>                 = 474011       hrs
>>>>                 = 19750.5      days
>>>>                 = 54.1109      years
>>>>
>>>> That's the real issue, not what you are trying to 'fix' in
>>>> timespec_add_ns()
>>>>
>>
>> I guess I am going to have to become an expert on the timekeeper and
>> learn this subsystem backwards and forwards to code a touch function
>> to keep it from crashing the system.
>>
>> On the 2.6 series kernels (and 2.2) this problem did not exist.  I
>> noticed a lot of these changes came in in the late 2.6 cycles.  Before
>> that time, I could leave the debugger spinning for days and linux
>> worked fine.
>>
>> For people who have to pay developers to develop code on Linux a
>> debugger is almost
>> an essential tool since it saves hundreds of thousands of dollars in
>> development costs.  Not everyone wants to spend money for their
>> employees and engineers to sit around and code review every problem -
>> customers just want their problems fixed -- and fast.  That being
>> said, I am having no lack of people who download and use this debugger
>> and I'm certain kgdb is heavily used by folks doing development.   If
>> kernel development is too hard, people move to something else based on
>> simple economics.
>>
>> That being said, I need to get this fixed.  There is no good reason a
>> debugger shouldn't be able to stop the system and leave it suspended
>> for days if necessary to run down a bug.  I wrote a debugger on SMP
>> Netware that worked that way.  The earliest versions of MDB worked
>> that way.
>>
>> kgdb is broken right now because of this.  I am not certain it affects
>> all systems out there, but it needs to be fixed.
>>
>> If you have any ideas on how to code a touch function please send me a
>> patch or suggest how it could be done unobtrusively, otherwise I'll
>> have to dive into the timekeeper and fix it myself and learn yet
>> another subsystem of Linux and fix its bugs.  A code subsystem that
>> crashes because the timer tick is skewed or returns garbage is poorly
>> designed IMHO.
>
> Ehrm.  A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
>
> We already do this in the CONFIG_DEBUG_TIMEKEEPING case, but adding a
> simple check to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
>
> Something like:
> if (delta > tkr->clock->max_cycles)
>     delta = tkr->clock->max_cycles;
>
> return delta;
>
> thanks
> -john
>


Thank you John.  This is helpful.  Can you send me a patch for this?
I'll test it.  That way I am not touching this code and you guys can
put it in.

Jeff
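
For readers following the thread, the cap John describes can be sketched
in user space roughly as below.  This is only an illustration under
stated assumptions, not the kernel's code: the struct names
(clocksource_sketch, tk_read_base_sketch) and the helper
get_delta_capped() are hypothetical stand-ins, and only the max_cycles
check itself is taken from the snippet quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a clocksource; only the fields this
 * sketch needs are modeled. */
struct clocksource_sketch {
    uint64_t mask;        /* counter width mask (assumed) */
    uint64_t max_cycles;  /* largest delta considered safe to convert */
};

/* Hypothetical stand-in for the timekeeper read base ("tkr"). */
struct tk_read_base_sketch {
    const struct clocksource_sketch *clock;
    uint64_t cycle_last;  /* counter value at the last update (assumed) */
};

/* Compute the cycle delta since the last update and cap it, so a
 * long stall (e.g. sitting in a debugger) cannot hand an enormous
 * value to the nanosecond conversion further down the path. */
static inline uint64_t
get_delta_capped(const struct tk_read_base_sketch *tkr, uint64_t now)
{
    uint64_t delta = (now - tkr->cycle_last) & tkr->clock->mask;

    /* The check suggested in the quoted mail. */
    if (delta > tkr->clock->max_cycles)
        delta = tkr->clock->max_cycles;

    return delta;
}
```

With max_cycles set to 1000 and cycle_last at 100, a read at 600 yields
a delta of 500, while a read at 5000 is capped to 1000.  The trade-off
is that time appears to advance by at most max_cycles per read after a
long stop, rather than by the true elapsed amount.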