Date:   Mon, 17 Sep 2018 15:37:03 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Jan H. Schönherr <jschoenh@...zon.de>
Cc:     Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        Paul Turner <pjt@...gle.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [RFC 00/60] Coscheduling for Linux

On Fri, Sep 14, 2018 at 06:25:44PM +0200, Jan H. Schönherr wrote:
> On 09/14/2018 01:12 PM, Peter Zijlstra wrote:

> >> 1. Execute parallel applications that rely on active waiting or synchronous
> >>    execution concurrently with other applications.
> >>
> >>    The prime example in this class are probably virtual machines. Here,
> >>    coscheduling is an alternative to paravirtualized spinlocks, pause loop
> >>    exiting, and other techniques with its own set of advantages and
> >>    disadvantages over the other approaches.
> > 
> > Note that in order to avoid PLE and paravirt spinlocks and paravirt
> > tlb-invalidate you have to gang-schedule the _entire_ VM, not just SMT
> > siblings.
> > 
> > Now explain to me how you're going to gang-schedule a VM with a good
> > number of vCPU threads (say spanning a number of nodes) while preserving
> > the rest of CFS, without it turning into a massive trainwreck?
> 
> You probably don't -- for the same reason that it is a bad idea to give
> an endless loop realtime priority. It's just a bad idea. As I said in the
> text you quoted: coscheduling comes with its own set of advantages and
> disadvantages. Just because you find one example where it is a bad idea
> doesn't make it a bad thing in general.
> 
> 
> > Such things (gang scheduling VMs) _are_ possible, but not within the
> > confines of something like CFS; they are also fairly inefficient
> > because, as you do note, you will have to explicitly schedule idle time
> > for idle vCPUs.
> 
> With gang scheduling as defined by Feitelson and Rudolph [6], you'd have to
> explicitly schedule idle time. With coscheduling as defined by Ousterhout [7],
> you don't. In this patch set, the scheduling of idle time is "merely" a quirk
> of the implementation. And even with this implementation, there's nothing
> stopping you from down-sizing the width of the coscheduled set to take out
> the idle vCPUs dynamically, cutting down on fragmentation.

The thing is, if you drop the full width gang scheduling, you instantly
require the paravirt spinlock / tlb-invalidate stuff again.

Of course, the constraints of L1TF themselves require the explicit
scheduling of idle time under a bunch of conditions.

I did not read your [7] in much detail (it is also a very bad quality
scan :-/), but I don't get how they leap from 'thrashing' to
co-scheduling. Their initial problem, where A generates data that B
needs, and the 3 scenarios:

 1) A has to wait for B
 2) B has to wait for A
 3) the data gets buffered

That seems fairly straightforward and is indeed quite common; that we
need co-scheduling for it, I'm not convinced.
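
To make the buffering case concrete, here is a minimal userspace sketch
of scenario 3 (my own illustration, not code from the patch set or from
[7]): a bounded ring buffer between A and B. Since items are queued, A
and B make progress without ever having to be on a CPU at the same
instant, so nothing here needs co-scheduling for correctness:

	#include <pthread.h>

	#define RING_SIZE 64	/* power of two, so the index math below stays consistent */

	struct ring {
		int slots[RING_SIZE];
		unsigned int head, tail;	/* head: next free slot, tail: next full slot */
		pthread_mutex_t lock;
		pthread_cond_t not_empty, not_full;
	};

	static struct ring ring = {
		.lock      = PTHREAD_MUTEX_INITIALIZER,
		.not_empty = PTHREAD_COND_INITIALIZER,
		.not_full  = PTHREAD_COND_INITIALIZER,
	};

	/* A (producer) side: ring_put(&ring, v); only blocks when the buffer is full. */
	static void ring_put(struct ring *r, int v)
	{
		pthread_mutex_lock(&r->lock);
		while (r->head - r->tail == RING_SIZE)	/* full: scenario 1, A waits */
			pthread_cond_wait(&r->not_full, &r->lock);
		r->slots[r->head++ % RING_SIZE] = v;
		pthread_cond_signal(&r->not_empty);
		pthread_mutex_unlock(&r->lock);
	}

	/* B (consumer) side: v = ring_get(&ring); only blocks when the buffer is empty. */
	static int ring_get(struct ring *r)
	{
		int v;

		pthread_mutex_lock(&r->lock);
		while (r->head == r->tail)		/* empty: scenario 2, B waits */
			pthread_cond_wait(&r->not_empty, &r->lock);
		v = r->slots[r->tail++ % RING_SIZE];
		pthread_cond_signal(&r->not_full);
		pthread_mutex_unlock(&r->lock);
		return v;
	}

Whether A and B happen to run at the same time only changes throughput
and latency here, not whether the data gets across.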

We have of course added all sorts of adaptive wait loops in the kernel
to deal with just that issue.
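
The rough shape of such an adaptive wait, as a self-contained
illustration (not actual kernel code; the spin bound and the yield are
stand-ins for the real heuristics and sleep/wakeup paths):

	#include <stdatomic.h>
	#include <sched.h>

	#define SPIN_LIMIT 1000			/* made-up bound, stands in for the real heuristic */

	static _Atomic int data_ready;		/* flipped by the producer side */

	static void wait_for_data(void)
	{
		int spins = SPIN_LIMIT;

		while (!atomic_load_explicit(&data_ready, memory_order_acquire)) {
			if (spins > 0) {
				spins--;	/* busy-wait: cheap if the producer is running right now */
				continue;
			}
			sched_yield();		/* give the CPU to other runnable work; in the kernel
						 * this would be a proper sleep with a wakeup from the
						 * producer */
		}
	}

The point being: when spinning does not pay off quickly, we stop burning
the CPU and let it do other work, instead of insisting the other side be
made to run.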

With co-scheduling you 'ensure' B is running when A is, but that doesn't
mean you can actually make more progress; you could just be burning a
lot of CPU cycles (which could've been spent doing other work).

I'm also not convinced co-scheduling makes _any_ sense outside SMT --
does one of the many papers you cite make a good case for !SMT
co-scheduling? It just doesn't make sense to co-schedule the LLC domain;
that's 16+ cores on recent chips.
