linux-kernel - Re: [PATCH 0/1] RFC: sched/fair: skip select_idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190109180225.GA28624@redhat.com>
Date:   Wed, 9 Jan 2019 13:02:25 -0500
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Mike Galbraith <efault@....de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in
 presence of sync wakeups

Hello Mike,

On Wed, Jan 09, 2019 at 05:19:48AM +0100, Mike Galbraith wrote:
> On Tue, 2019-01-08 at 22:49 -0500, Andrea Arcangeli wrote:
> > Hello,
> > 
> > we noticed some unexpected performance regressions in the scheduler by
> > switching the guest CPU topology from "-smp 2,sockets=2,cores=1" to
> > "-smp 2,sockets=1,cores=2".
*snip*
> > To test I used this trivial program.
> 
> Which highlights the problem.  That proggy really is synchronous, but

Note that I wrote the program only after the guest scheduler
regression was reported, purely in order to test the patch and to
reproduce the customer issue more easily (so I could see the effect by
just running top). The regression was reported by a real life customer
workload AFIK and it was caused by the idle balancing dropping the
sync information.

If it was just the lat_ctx type of workload like the program I
attached I wouldn't care either, but this was a localhost udp and tcp
(both bandwidth and latency) test that showed improvement by not
dropping to the sync information through idle core balancing during
wakeups.

There is no tuning to allow people to test the sync information with
real workloads, the only way is to rebuild the kernel with SCHED_MC=n
(which nobody should be doing because it has other drawbacks) or by
altering the vCPU topology. So for now we're working to restore the
standard only-sockets topology to shut off the idle balancing without
having to patch the guest scheduler, but this looked like a more
general problem that has room for improvement.

Ideally we should detect when the sync information is worth keeping
instead of always dropping it. Alternatively a sched_feat could be
added to achieve it manually.

Thanks,
Andrea