lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 21 Jan 2016 08:46:46 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	John Stultz <john.stultz@...aro.org>
Subject: Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

Hi Thomas

That patch works and so does the patch currently in linux-next.  Yeah !

The code on your machine is apparently from the linux-next tree. In
4.4 is busted and that was the build I was using since that's the last
build I have MDB released on.  If you compare v4.4 vs. linux-next you
will see someone already has patched this problem -- its just not in
Linus tree yet.

v4.4

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	s64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;  << causes sar to be used

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

linux-next

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	u64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used here

	/* If arch requires, add in get_arch_timeoffset() */
	return (s64)(nsec + arch_gettimeoffset());
}


This bug goes way back in the linux revs, so I have to retro apply
this patch back to v3.6 in the MDB patch series.  This bug was
introduced in the late linux 3.6 kernels and exists in all of them up
to v4.4.

Jeff


On 1/21/16, Thomas Gleixner <tglx@...utronix.de> wrote:
> Jeff,
>
> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>> {
>> 	cycle_t delta;
>> 	s64 nsec;
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>> 	nsec >>= tkr->shift;    << wrap caused here
>>
>> 	/* If arch requires, add in get_arch_timeoffset() */
>> 	return nsec + arch_gettimeoffset();
>> }
>>
>> You only have 64 bits of register and the numbers being calculated
>> here are big.   By way of example, I observed the following during
>> normal operations:
>>
>> delta  (RAX)   |     tkr->mult (RDX)
>>
>> 0x157876            0x65ee27
>> 0xf1855               0x65f158
>> 0x16cf05             0x65f408
>> 303bc3                0x65f154
>>
>> When this bug occurs different story.
>>
>> delta  (RAX)    |     tkr->mult (RDX)
>>
>> 0x243283994b8     0x65233
>>
>> So it goes like this:
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>         0x243283994b8 * 0x65233
>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>
>> 	nsec >>= tkr->shift;    << wrap caused here
>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>
> That SAR is siomply wrong here. It must be an SHR and it is at least when
> I'm
> looking at the assembly of my machine.
>
>> the sar instruction doesn't just shift, it backfills the signedness of
>> the value, so this instruction is not doing what the C code is asking
>> it to do.  I am guessing that somewhere in this mass of macros,
>> something may have gotten declared wrong or incomplete (declared
>> signed ?).
>
> There is no macro involved.
>
> timekeeping_get_ns
> {
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
> }
>
>> The assembler output for this section that calls the macro to
>> calculate nsecs shows the sar instruction:
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>> <ktime_get_ts64+0xc6>
>> 	nsec >>= tkr->shift;
>>      2a6:	48 d3 f8             	sar    %cl,%rax
>
> And this is fundamentally wrong. Why is the compiler emitting SAR instead
> of
> SHR here? Here is the assembly output from my kernel:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>      27e:	48 0f af c5          	imul   %rbp,%rax
>      282:	48 01 d8             	add    %rbx,%rax
>      285:	48 d3 e8             	shr    %cl,%rax
>
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>
>
> So the first thing which needs to be figured out is WHY this results in a
> SAR
> on your compiler.
>
>> There is another problem with the tkr->read returning an unchanging,
>> unclearable number when this bug occurs for the delta value.  I
>> appears for whatever reason the clock has gone to sleep or gone away
>> and is no longer updating its counters.
>>
>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>> {
>> 	cycle_t cycle_now, delta;
>>
>> 	/* read clocksource */
>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>> this bug happens
>>
>> 	/* calculate the delta since the last update_wall_time */
>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>> cycle last is also the same value.
>>
>> 	return delta;
>> }
>
> If that value does not change, then the timekeeping update is not
> running. That might happen because the timer interrupt is not happening or
> whatever got wreckaged.
>
>> I would check how these structs are defined and the vars in them to
>> see if somewhere they are declared as signed values to the compiler,
>> because that's what it thinks it was given to compile.
>
> Sure. Here you go:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>
> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
> nsec.
>
> Does that issue go away if you apply the patch below?
>
> Thanks,
>
> 	tglx
>
> 8<-----------
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34b4cedfa80d..d405bcdf9d40 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0;
> }
>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>  {
>  	cycle_t delta;
> -	s64 nsec;
> +	u64 nsec;
>
>  	delta = timekeeping_get_delta(tkr);
>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ