Message-ID: <CAD=FV=WkbGQ_F93rA6y7nPHcJ44UboH9vi_LiToCFeyL2b6btA@mail.gmail.com>
Date:	Thu, 8 May 2014 21:41:27 -0700
From:	Doug Anderson <dianders@...omium.org>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Nicolas Pitre <nicolas.pitre@...aro.org>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Will Deacon <will.deacon@....com>,
	John Stultz <john.stultz@...aro.org>,
	David Riley <davidriley@...omium.org>,
	"olof@...om.net" <olof@...om.net>,
	Sonny Rao <sonnyrao@...omium.org>,
	Richard Zhao <richard.zhao@...aro.org>,
	Santosh Shilimkar <santosh.shilimkar@...com>,
	Shawn Guo <shawn.guo@...aro.org>,
	Stephen Boyd <sboyd@...eaurora.org>,
	Marc Zyngier <marc.zyngier@....com>,
	Stephen Warren <swarren@...dia.com>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Mark Brown <broonie@...aro.org>
Subject: Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems

Russell,

On Thu, May 8, 2014 at 5:23 PM, Russell King - ARM Linux
<linux@....linux.org.uk> wrote:
> On Thu, May 08, 2014 at 05:02:02PM -0700, Doug Anderson wrote:
>> Russell,
>>
>> On Thu, May 8, 2014 at 1:55 PM, Russell King - ARM Linux
>> <linux@....linux.org.uk> wrote:
>> > On Thu, May 08, 2014 at 11:06:24AM -0700, Doug Anderson wrote:
>> >> I guess I would say that my patch is unhacking this code.  The
>> >> code after my patch is simpler.  I would perhaps argue that (ec971ea
>> >> ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp)
>> >> should never have landed to begin with.
>> >
>> > That depends on your point of view.  As I've already pointed out through
>> > the examples of why udelay() is inaccurate, for driver authors, they
>> > should assume that udelay() just gives you an "approximate" delay and
>> > it has no accuracy.
>>
>> That disagrees with what Thomas Gleixner says at
>> <http://lkml.iu.edu//hypermail/linux/kernel/1203.1/01034.html>.  It
>> also seems like perhaps the regulator core is broken, then...  If a
>> udelay(30) can end up as a udelay(20) then we may return from a
>> regulator code 10us earlier than we should and we'll assume that a
>> regulator is ramped before it really is...
>>
>> I'm out tomorrow but I can confirm on Monday that I was really seeing
>> udelay(30) be a udelay(20) without this patch.
>
> Thomas is wrong - when I researched this topic, I ended up finding
> that udelay() does delay _less_ than requested, and I mailed Linus
> about it...  This is the way udelay() is - it's an approximate delay,
> it's *not* accurate.
>
> It's also fairly obvious when you stop and consider how it's calibrated.
>
> Take a moment to wonder when we used to recalibrate each CPU individually,
> why the boot CPU bogomips would be slightly lower than the secondary
> CPU bogomips.  This is all down to the boot CPU having to run timer
> interrupts while the secondary CPUs weren't at the time they calibrated.
>
> Linus doesn't give a damn about udelay() being slightly short in this
> way.  Neither do I.

Can you define what you mean by "slightly"?  I'm personally not super
concerned by udelay(50) becoming udelay(49), but...

I've got a test that hammers on udelay and tests it against ktime.  It
runs for _multiple minutes_ and gives nearly spot-on udelay(50) values.
Then, it gets unlucky and gives something like this (values reported
in nanoseconds):

[  467.034522] 0. want=50000, got=308708
[  467.034522] 1. want=50000, got=51917
[  467.034522] 2. want=50000, got=51167
[  467.034522] 3. want=50000, got=76291
[  467.034522] 4. want=50000, got=149959
[  467.034522] 5. want=50000, got=76458
[  467.034522] 6. want=50000, got=39583
[  467.034522] 7. want=50000, got=19459
[  467.034522] 8. want=50000, got=20625
[  467.034522] 9. want=50000, got=20375

It's things like "want 50000, got 19459" that concern me.  It's even
more concerning that it only happens after multiple minutes, meaning
that it will lead to random, impossible to reproduce bugs.
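
To put numbers on that, here is a minimal C sketch that only does the
arithmetic on the ten samples above.  The array and function names are
made up for illustration; this is not the test code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Requested delay in nanoseconds (udelay(50)). */
#define WANT_NS 50000L

/* Measured durations in nanoseconds, copied from the log above. */
const long got_ns[] = {
	308708, 51917, 51167, 76291, 149959,
	76458, 39583, 19459, 20625, 20375,
};

/* Count the samples that returned earlier than requested. */
size_t count_short(const long *got, size_t n, long want)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (got[i] < want)
			count++;
	return count;
}

/* Return the shortest measured duration. */
long shortest_ns(const long *got, size_t n)
{
	size_t i;
	long min;

	assert(n > 0);
	min = got[0];
	for (i = 1; i < n; i++)
		if (got[i] < min)
			min = got[i];
	return min;
}
```

On these samples, four of the ten delays returned early, and the
shortest one (19459 ns) lasted roughly 39% of the requested 50 us.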

NOTE: I haven't traced through to figure out exactly what scenario
causes the above to happen.  In my case I'm simply looping over some
test code using
<https://chromium-review.googlesource.com/#/c/189760/8/init/calibrate.c>.
 ...but if we stop screwing with loops_per_jiffy it definitely doesn't
happen.
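
As a rough illustration of why a stale loops_per_jiffy matters (an
idealised model, not a trace of what happened in the log above): if the
loop count is derived from a loops_per_jiffy value that corresponds to
one CPU frequency while the CPU is actually running at another, the
elapsed time scales by the ratio of the two.  The function name and the
example frequencies below are hypothetical.

```c
#include <assert.h>

/*
 * Idealised model of a loop-based delay (an assumption for illustration):
 *   want_ns  - delay the caller asked for
 *   lpj_khz  - CPU frequency that loops_per_jiffy currently reflects
 *   cpu_khz  - CPU frequency the core is really running at
 * The loop count is sized for lpj_khz but executes at cpu_khz, so the
 * elapsed time is want_ns * lpj_khz / cpu_khz.
 */
long long elapsed_ns(long long want_ns, long long lpj_khz, long long cpu_khz)
{
	assert(lpj_khz > 0 && cpu_khz > 0);
	return want_ns * lpj_khz / cpu_khz;
}
```

For example, elapsed_ns(50000, 1000000, 2000000) gives 25000: in this
model, a loops_per_jiffy scaled for 1 GHz on a core really running at
2 GHz halves the delay, while matching frequencies give the full 50000.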


> Let me repeat again: use a timer.

Totally agree.  We are using a timer going forward.  I was posting
this patch to help those less fortunate who happen to be running on a
system where the timer isn't available for whatever reason.

-Doug