Date:   Tue, 7 Jul 2020 09:48:57 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dave Jones <davej@...emonkey.org.uk>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Linux Kernel <linux-kernel@...r.kernel.org>, mingo@...nel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        paul.gortmaker@...driver.com, valentin.schneider@....com
Subject: Re: weird loadavg on idle machine post 5.7

On Mon, Jul 06, 2020 at 05:20:57PM -0400, Dave Jones wrote:
> On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote:
>  > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote:
>  > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
>  > >  
>  > > looked promising the first few hours, but as soon as it hit four hours
>  > > of uptime, loadavg spiked and is now pinned to at least 1.00
>  > 
>  > OK, lots of cursing later, I now have the below...
>  > 
>  > The TL;DR is that while schedule() doesn't change p->state once it
>  > starts, it does read it quite a bit, and ttwu() will actually change it
>  > to TASK_WAKING. So if ttwu() changes it to WAKING before schedule()
>  > reads it to do loadavg accounting, things go sideways.
>  > 
>  > The below is extra complicated by the fact that I've had to scrounge up
>  > a bunch of load-store ordering without actually adding barriers. It adds
>  > yet another control dependency to ttwu(), so take that C standard :-)
> 
> Man this stuff is subtle. I could've read this a hundred times and not
> even come close to approaching this.
> 
> Basically me reading scheduler code:
> http://www.quickmeme.com/img/96/9642ed212bbced00885592b39880ec55218e922245e0637cf94db2e41857d558.jpg

Heh, that one made me nearly spill my tea, much funnies :-)

But yes, Dave Chinner also complained about this for the previous fix.
I've written this:

  https://lore.kernel.org/lkml/20200703133259.GE4781@hirez.programming.kicks-ass.net/

to help with that. But clearly I'll need to update that patch again
after this little adventure.

>  > I've booted it, built a few kernels with it and checked that loadavg
>  > drops to 0 after each build, so from that pov all is well. But since
>  > I'm not confident I can reproduce the issue, I can't tell whether this
>  > actually fixes anything, except maybe phantoms of my imagination.
> 
> Five hours in, looking good so far.  I think you nailed it.

\o/ hooray! Thanks for testing Dave!
