Message-ID: <6c2a697a-cb2b-2061-b791-21b768bea3db@redhat.com>
Date:   Thu, 7 Mar 2019 23:06:33 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>, Greg Kerr <greg@...rnel.com>
Cc:     Greg Kerr <kerrnel@...gle.com>, mingo@...nel.org,
        tglx@...utronix.de, Paul Turner <pjt@...gle.com>,
        tim.c.chen@...ux.intel.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling

On 22/02/19 15:10, Peter Zijlstra wrote:
>> I agree on not bike shedding about the API, but can we agree on some of
>> the high level properties? For example, who generates the core
>> scheduling ids, what properties about them are enforced, etc.?
> It's an opaque cookie; the scheduler really doesn't care. All it does is
> ensure that tasks match or force idle within a core.
> 
> My previous patches got the cookie from a modified
> preempt_notifier_register/unregister() which passed the vcpu->kvm
> pointer into it from vcpu_load/put.
> 
> This auto-grouped VMs. It was also found to be somewhat annoying because
> apparently KVM does a lot of userspace assist for all sorts of nonsense
> and it would leave/re-join the cookie group for every single assist.
> Causing tons of rescheduling.
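(To make the semantics concrete: the cookie matching described above boils
down to roughly the sketch below.  All names here are made up for
illustration; they are not the ones used in the actual patches.)

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative only.  Each task carries an opaque cookie; two tasks may
 * run concurrently on SMT siblings of the same core only if their
 * cookies are equal, otherwise the sibling is forced idle.
 */
struct task_like {
	unsigned long core_cookie;	/* opaque group id, 0 = untagged */
};

static bool cookies_match(const struct task_like *a, const struct task_like *b)
{
	return a->core_cookie == b->core_cookie;
}

/* Returns the task to run on an SMT sibling, or NULL to force idle. */
static struct task_like *pick_for_sibling(struct task_like *running,
					  struct task_like *candidate)
{
	if (!running || cookies_match(running, candidate))
		return candidate;
	return NULL;	/* mismatch: idle the sibling instead */
}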

KVM doesn't do _that much_ userspace exiting in practice when VMs are
properly configured (if they're not, you probably don't care about core
scheduling).

However, note that KVM needs core scheduling groups to be defined at the
thread level; one group per process is not enough.  A VM has a bunch of
I/O threads and vCPU threads, and we want to set up core scheduling like
this:

+--------------------------------------+
| VM 1   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   |   vCPU2  vCPU3  | |
| +----------------+-----------------+ |
+--------------------------------------+

+--------------------------------------+
| VM 2   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   |   vCPU2  vCPU3  | |
| +----------------+-----------------+ |
| | vCPU4  vCPU5   |   vCPU6  vCPU7  | |
| +----------------+-----------------+ |
+--------------------------------------+

where the iothreads need not be subject to core scheduling but the vCPUs
do.  If you don't place guest-sibling vCPUs in the same core scheduling
group, bad things happen.
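Whatever the eventual interface looks like, the point is that the VMM has
to be able to tag individual threads, with one cookie per guest core.
Roughly like this (core_sched_set_cookie() below is a made-up placeholder
for that interface, not an existing or proposed API):

/*
 * Hypothetical sketch only: core_sched_set_cookie() stands in for
 * whatever per-thread interface (cgroup tag, prctl, ...) ends up
 * being merged; it does not exist today.
 */
#include <sys/types.h>

extern int core_sched_set_cookie(pid_t tid, unsigned long cookie);

static void tag_vcpus(unsigned long vm_id, const pid_t *vcpu_tids,
		      int nr_vcpus, int threads_per_guest_core)
{
	int i;

	/* iothreads are deliberately left untagged */
	for (i = 0; i < nr_vcpus; i++) {
		/*
		 * One cookie per guest core, shared by its sibling vCPUs
		 * and made unique across VMs by folding in vm_id.
		 */
		unsigned long guest_core = i / threads_per_guest_core;

		core_sched_set_cookie(vcpu_tids[i],
				      (vm_id << 16) | (guest_core + 1));
	}
}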

The reason is that the guest might also be running a core scheduler, so
you could have:

- guest process 1 registering two threads A and B in the same group

- guest process 2 registering two threads C and D in the same group

- guest core scheduler placing thread A on vCPU0, thread B on vCPU1,
thread C on vCPU2, thread D on vCPU3

- host core scheduler deciding the four vCPU threads can be placed on
physical cores 0-1, but physical core 0 gets the vCPUs running A+C and
physical core 1 gets the vCPUs running B+D

- now guest process 2 shares a cache with guest process 1, despite the
guest's own core scheduling. :(
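
Concretely, the resulting placement is:

  physical core 0:  vCPU0 (guest thread A) + vCPU2 (guest thread C)
  physical core 1:  vCPU1 (guest thread B) + vCPU3 (guest thread D)

i.e. threads from the two guest groups end up as SMT siblings on the host,
which is exactly what the guest's core scheduler was trying to prevent.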

Paolo
