Message-ID: <alpine.DEB.2.11.1601201350230.3575@nanos>
Date:	Wed, 20 Jan 2016 15:26:58 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Jeff Merkey <linux.mdb@...il.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	John Stultz <john.stultz@...aro.org>
Subject: Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

Jeff,

On Wed, 20 Jan 2016, Thomas Gleixner wrote:
> On Tue, 19 Jan 2016, Jeff Merkey wrote:
> > Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> > gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> 
> And how exactly does that happen?
> 
> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> 		   = 1.70644e+09  sec
> 		   = 2.84407e+07  min
> 		   = 474011	  hrs
> 		   = 19750.5	  days
> 		   = 54.1109	  years
> 
> That's the real issue, not what you are trying to 'fix' in timespec_add_ns()

And that's caused by stopping the whole machine for 20 minutes. It violates
the assumption of the timekeeping core that the maximum time between two
updates of the core is < 5-10min. So that insanely large number is caused by a
multiplication overflow when converting the time delta to nanoseconds.
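
To illustrate the overflow, here is a minimal sketch. It assumes the
conversion is roughly (delta * mult) >> shift with the product held in 64
bits. The MULT and SHIFT values and all function names are made up for
illustration; they are not the parameters of the machine in question, and
this is not the kernel code itself.

```python
MASK64 = (1 << 64) - 1  # emulate u64 wraparound

# Hypothetical clocksource parameters, chosen only for illustration.
MULT = 8_000_000
SHIFT = 24


def cyc_to_ns_u64(delta, mult=MULT, shift=SHIFT):
    """Convert a cycle delta to ns with the product truncated to 64 bits."""
    return ((delta * mult) & MASK64) >> shift


def cyc_to_ns_exact(delta, mult=MULT, shift=SHIFT):
    """Same conversion with unbounded integers, i.e. without overflow."""
    return (delta * mult) >> shift


def overflows(delta, mult=MULT):
    """True if delta * mult no longer fits in 64 bits."""
    return delta * mult > MASK64


# Largest cycle delta whose product with MULT still fits in 64 bits.
max_safe_cycles = MASK64 // MULT

if __name__ == "__main__":
    for delta in (max_safe_cycles, 3 * max_safe_cycles):
        print(delta, overflows(delta),
              cyc_to_ns_u64(delta), cyc_to_ns_exact(delta))
```

Up to max_safe_cycles both functions agree. Beyond that, the truncated
product yields a nanosecond value unrelated to the time that really elapsed.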

You can find that limit via:

# dmesg | grep tsc | grep max_idle_ns
[    5.242683] clocksource tsc: mask: 0xffffffffffffffff max_cycles: 0x21139a22526, max_idle_ns: 440795252169 ns

So on that machine the limit is:

   440795252169 nsec
   440.795	sec
   7.34659	min
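
The same arithmetic can be scripted. This is a small sketch that pulls
max_idle_ns out of the dmesg line quoted above and checks a stop duration
against it. The helper names are hypothetical, and the regular expression
only assumes the line format shown in this mail.

```python
import re

# The dmesg line quoted above, reproduced verbatim.
LINE = ("[    5.242683] clocksource tsc: mask: 0xffffffffffffffff "
        "max_cycles: 0x21139a22526, max_idle_ns: 440795252169 ns")


def max_idle_ns(line):
    """Return the max_idle_ns value from a clocksource dmesg line, or None."""
    m = re.search(r"max_idle_ns:\s*(\d+)\s*ns", line)
    return int(m.group(1)) if m else None


def exceeds_limit(stop_seconds, line):
    """True if stopping the machine for stop_seconds exceeds max_idle_ns."""
    limit = max_idle_ns(line)
    if limit is None:
        raise ValueError("no max_idle_ns found in line")
    return stop_seconds * 1_000_000_000 > limit


limit_ns = max_idle_ns(LINE)
limit_sec = limit_ns / 1e9
limit_min = limit_sec / 60

if __name__ == "__main__":
    print(f"{limit_ns} nsec = {limit_sec:.3f} sec = {limit_min:.5f} min")
    print("20 min stop exceeds limit:", exceeds_limit(20 * 60, LINE))
```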

And before you ask or come up with patches: No, we are not going to add
anything to the core timekeeping code to work around this limitation, simply
because it's going to add overhead to a performance-sensitive code path for
very limited value.

Keeping a machine stopped for 20 minutes will make a lot of other things
unhappy, so introducing a 'fix' for that particular issue is just silly.

Thanks,

	tglx
