linux-kernel - Re: [BUG REPORT] ktime_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO6TR8UuS-hG8-4b7QPx=BiQ-RBvcRN+UuPM457gR1gA7aKu2Q@mail.gmail.com>
Date:	Thu, 21 Jan 2016 09:57:06 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	John Stultz <john.stultz@...aro.org>
Subject: Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

One other item worth mentioning is the effect of this bug on the user
space daemons.  Since this bug is in all kernels from 3.6-v4.4, folks
will see the following behavior of systemd when this bug fires off and
the value wraps:

id 0a0c809bcdc0812145d9c35456e46335650601b7
reason:         systemd-journald killed by SIGABRT
time:           Sun 10 Jan 2016 11:56:16 AM MST
cmdline:        /usr/lib/systemd/systemd-journald
package:        systemd-219-19.el7
uid:            0 (root)
count:          2
Directory:      /var/tmp/abrt/ccpp-2016-01-10-11:56:16-6281

id 38f7bb372d0c14cb912e9d225fb474b931889e77
reason:         __epoll_wait_nocancel(): systemd-logind killed by SIGABRT
time:           Fri 08 Jan 2016 01:33:42 PM MST
cmdline:        /usr/lib/systemd/systemd-logind
package:        systemd-219-19.el7
uid:            0 (root)
count:          39
Directory:      /var/tmp/abrt/ccpp-2016-01-08-13:33:42-630
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/307b26a77cc6d5005ce2fdf18ff010fe3dc94401

So if folks see aborts in systemd that look like the above, it's this
bug.    I traced the systemd crash as well and its caused by the
system returning garbage via system calls to systemd when that rax
value wraps.

Jeff.



On 1/21/16, Jeff Merkey <linux.mdb@...il.com> wrote:
> Hi Thomas
>
> That patch works and so does the patch currently in linux-next.  Yeah !
>
> The code on your machine is apparently from the linux-next tree. In
> 4.4 is busted and that was the build I was using since that's the last
> build I have MDB released on.  If you compare v4.4 vs. linux-next you
> will see someone already has patched this problem -- its just not in
> Linus tree yet.
>
> v4.4
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;  << causes sar to be used
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
>
> linux-next
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	u64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used
> here
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return (s64)(nsec + arch_gettimeoffset());
> }
>
>
> This bug goes way back in the linux revs, so I have to retro apply
> this patch back to v3.6 in the MDB patch series.  This bug was
> introduced in the late linux 3.6 kernels and exists in all of them up
> to v4.4.
>
> Jeff
>
>
> On 1/21/16, Thomas Gleixner <tglx@...utronix.de> wrote:
>> Jeff,
>>
>> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t delta;
>>> 	s64 nsec;
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>
>>> 	/* If arch requires, add in get_arch_timeoffset() */
>>> 	return nsec + arch_gettimeoffset();
>>> }
>>>
>>> You only have 64 bits of register and the numbers being calculated
>>> here are big.   By way of example, I observed the following during
>>> normal operations:
>>>
>>> delta  (RAX)   |     tkr->mult (RDX)
>>>
>>> 0x157876            0x65ee27
>>> 0xf1855               0x65f158
>>> 0x16cf05             0x65f408
>>> 303bc3                0x65f154
>>>
>>> When this bug occurs different story.
>>>
>>> delta  (RAX)    |     tkr->mult (RDX)
>>>
>>> 0x243283994b8     0x65233
>>>
>>> So it goes like this:
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>         0x243283994b8 * 0x65233
>>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>>
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>>
>> That SAR is siomply wrong here. It must be an SHR and it is at least when
>> I'm
>> looking at the assembly of my machine.
>>
>>> the sar instruction doesn't just shift, it backfills the signedness of
>>> the value, so this instruction is not doing what the C code is asking
>>> it to do.  I am guessing that somewhere in this mass of macros,
>>> something may have gotten declared wrong or incomplete (declared
>>> signed ?).
>>
>> There is no macro involved.
>>
>> timekeeping_get_ns
>> {
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>> }
>>
>>> The assembler output for this section that calls the macro to
>>> calculate nsecs shows the sar instruction:
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>>> <ktime_get_ts64+0xc6>
>>> 	nsec >>= tkr->shift;
>>>      2a6:	48 d3 f8             	sar    %cl,%rax
>>
>> And this is fundamentally wrong. Why is the compiler emitting SAR instead
>> of
>> SHR here? Here is the assembly output from my kernel:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>      27e:	48 0f af c5          	imul   %rbp,%rax
>>      282:	48 01 d8             	add    %rbx,%rax
>>      285:	48 d3 e8             	shr    %cl,%rax
>>
>> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>>
>>
>> So the first thing which needs to be figured out is WHY this results in a
>> SAR
>> on your compiler.
>>
>>> There is another problem with the tkr->read returning an unchanging,
>>> unclearable number when this bug occurs for the delta value.  I
>>> appears for whatever reason the clock has gone to sleep or gone away
>>> and is no longer updating its counters.
>>>
>>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t cycle_now, delta;
>>>
>>> 	/* read clocksource */
>>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>>> this bug happens
>>>
>>> 	/* calculate the delta since the last update_wall_time */
>>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>>> cycle last is also the same value.
>>>
>>> 	return delta;
>>> }
>>
>> If that value does not change, then the timekeeping update is not
>> running. That might happen because the timer interrupt is not happening
>> or
>> whatever got wreckaged.
>>
>>> I would check how these structs are defined and the vars in them to
>>> see if somewhere they are declared as signed values to the compiler,
>>> because that's what it thinks it was given to compile.
>>
>> Sure. Here you go:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>
>> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
>> nsec.
>>
>> Does that issue go away if you apply the patch below?
>>
>> Thanks,
>>
>> 	tglx
>>
>> 8<-----------
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 34b4cedfa80d..d405bcdf9d40 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return
>> 0;
>> }
>>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>  {
>>  	cycle_t delta;
>> -	s64 nsec;
>> +	u64 nsec;
>>
>>  	delta = timekeeping_get_delta(tkr);
>>
>>
>