linux-kernel - Re: [PATCH] clocksource, prevent overflow in clocksource


Message-ID: <4F7D7B4B.7050203@redhat.com>
Date:	Thu, 05 Apr 2012 07:00:27 -0400
From:	Prarit Bhargava <prarit@...hat.com>
To:	John Stultz <johnstul@...ibm.com>
CC:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Salman Qazi <sqazi@...gle.com>, stable@...nel.org
Subject: Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns



On 04/04/2012 09:08 PM, John Stultz wrote:
> On 04/04/2012 11:33 AM, Prarit Bhargava wrote:
>>> One idea might be to replace the cyc2ns w/ mult_frac in only the watchdog code.
>>> I need to think on that some more (and maybe have you provide some debug output)
>>> to really understand how that's solving the issue for you, but it would be able
>>> to be done w/o affecting the other assumptions of the timekeeping core.
>>>
>> Hey John,
>>
>> After reading the initial part of your reply I was thinking about calling
>> mult_frac() directly from the watchdog code as well.
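
For readers following along, here is a minimal userspace sketch of the idea behind using mult_frac() for the conversion. It is not the patch under discussion. The function names cyc2ns_naive and cyc2ns_split are made up for illustration, and the split uses a shift and mask because the denominator here is a power of two (1 << shift).

```c
#include <stdint.h>

/*
 * Straightforward conversion: (cycles * mult) >> shift.
 * The 64-bit product wraps once cycles * mult reaches 2^64.
 */
static uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Split conversion in the style of mult_frac(x, numer, denom):
 * quot * numer + (rem * numer) / denom, with denom = 1 << shift.
 * Assumes 0 < shift < 32, so rem < 2^32 and rem * mult fits in 64 bits.
 * The result still has to fit in 64 bits, but the intermediate
 * product no longer wraps first.
 */
static uint64_t cyc2ns_split(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	uint64_t quot = cycles >> shift;
	uint64_t rem = cycles & ((1ULL << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}
```

With the tsc values from the debug output (mult 797281036, shift 31), both functions agree for small cycle counts, while for a delta of 2^40 cycles only the split version returns the exact 408207890432 ns.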
>>
>> Here's some debug output I cobbled together to get an idea of how quickly the
>> overflow was happening.
>>
>> [    5.435323] clocksource_watchdog: {0} cs tsc csfirst 227349443638728 mask
>> 0xFFFFFFFFFFFFFFFF mult 797281036 shift 31
>> [    5.444930] clocksource_watchdog: {0} wd hpet wdfirst 78332535 mask
>> 0xFFFFFFFF mult 292935555 shift 22
>>
>> These, of course, are just the basic data from the clocksources tsc and hpet.
> 
> If I'm doing the math right, these are ~2.7 GHz cpus?

Yes.
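
The frequency estimate can be rechecked from the mult/shift pairs in the debug output, assuming the usual relation ns = (cycles * mult) >> shift, so that one cycle lasts mult / 2^shift nanoseconds. The helper name cs_freq_khz below is hypothetical and only meant as a sketch.

```c
#include <stdint.h>

/*
 * Approximate clocksource frequency in kHz from its mult/shift pair.
 * One cycle is mult / 2^shift ns, so freq_khz = 10^6 * 2^shift / mult.
 * Assumes shift <= 32 so that 1000000 << shift fits in 64 bits.
 */
static uint64_t cs_freq_khz(uint32_t mult, uint32_t shift)
{
	return (1000000ULL << shift) / mult;
}
```

For the tsc line (mult 797281036, shift 31) this gives roughly 2.69 million kHz, i.e. about 2.7 GHz, and for the hpet line (mult 292935555, shift 22) roughly 14.3 MHz.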

> 
> So what kernel version are you using?

I was on an earlier version of Fedora (F16) ... but I'll jump forward and see if
I can still hit it.

> 
> In trying to reproduce this locally against Linus' HEAD on a much smaller system
> (single core + HT 1.6 GHz), I got:
> [    6.611366] clocksource_watchdog: {0} cs tsc csfirst 36177888648 mask
> ffffffffffffffff mult 10485747 shift 24
> [    6.611596] clocksource_watchdog: {0} wd hpet wdfirst 169168400 mask ffffffff
> mult 2684354560 shift 26
> 
> Note the smaller shift values. Not too long ago the shift calculation was
> adjusted to allow for longer periods between interrupts,  so I suspect you're on
> an older kernel.
> 
> Further, using your debug patch on my system, it was well beyond 10 minutes
> before the debug overflow occurred.  And similarly I couldn't trip the watchdog
> trigger using sysrq-t (but again, only two threads here, so not nearly as much
> data to print as you have).
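
To put numbers on the effect of the shift value, here is a rough sketch that assumes the conversion is an unsigned 64-bit (cycles * mult) >> shift. Under that assumption the product wraps once the converted interval reaches about 2^(64 - shift) ns. The helper names are hypothetical, and this is plain arithmetic rather than anything measured by the debug patch.

```c
#include <stdint.h>

/* Largest cycle delta whose product with mult still fits in 64 bits. */
static uint64_t max_cycles_before_wrap(uint32_t mult)
{
	return UINT64_MAX / mult;
}

/*
 * Approximate interval, in ns, at which cycles * mult reaches 2^64.
 * Since ns = (cycles * mult) >> shift, this is 2^(64 - shift).
 * Valid for 0 < shift < 64.
 */
static uint64_t wrap_horizon_ns(uint32_t shift)
{
	return 1ULL << (64 - shift);
}
```

With shift 31 the horizon is 2^33 ns, about 8.6 seconds, whereas with shift 24 it is 2^40 ns, about 1100 seconds or a little over 18 minutes.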

I'm going to try this on a 32-cpu system (running the previously mentioned test)
with linux.git HEAD.

> 
> Could you verify that the issue you're seeing is still present w/ current
> mainline?  Please don't take this as me dismissing your problem!  As I mentioned

Absolutely :)  I didn't take it that way at all. .... when I get in this AM I'll
bang out a test and see if I can cause this to happen with sysrq-t.  Keep in
mind that 10000 threads is the *minimum* I was able to cause this with, which is
only ~315 threads/cpu, which isn't a lot :/.  At that number of threads the dump
takes about 6 mins.  Doubling it, IIRC, exceeded 10 mins.

> earlier there are some known issues w/ the clocksource watchdog code. But I want
> to narrow down if your problem is currently present in mainline or only in
> older kernels, as that will help us find the proper fix.

Thanks John,

P.

> 
> thanks
> -john
> 