Message-ID: <6c2a697a-cb2b-2061-b791-21b768bea3db@redhat.com>
Date:   Thu, 7 Mar 2019 23:06:33 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>, Greg Kerr <greg@...rnel.com>
Cc:     Greg Kerr <kerrnel@...gle.com>, mingo@...nel.org,
        tglx@...utronix.de, Paul Turner <pjt@...gle.com>,
        tim.c.chen@...ux.intel.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling

On 22/02/19 15:10, Peter Zijlstra wrote:
>> I agree on not bike shedding about the API, but can we agree on some of
>> the high level properties? For example, who generates the core
>> scheduling ids, what properties about them are enforced, etc.?
> It's an opaque cookie; the scheduler really doesn't care. All it does is
> ensure that tasks match or force idle within a core.
> 
> My previous patches got the cookie from a modified
> preempt_notifier_register/unregister() which passed the vcpu->kvm
> pointer into it from vcpu_load/put.
> 
> This auto-grouped VMs. It was also found to be somewhat annoying because
> apparently KVM does a lot of userspace assist for all sorts of nonsense
> and it would leave/re-join the cookie group for every single assist.
> Causing tons of rescheduling.
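(To make the semantics concrete: the cookie matching described above boils
down to roughly the sketch below.  All names here are made up for
illustration; they are not the ones used in the actual patches.)

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative only.  Each task carries an opaque cookie; two tasks may
 * run concurrently on SMT siblings of the same core only if their
 * cookies are equal, otherwise the sibling is forced idle.
 */
struct task_like {
	unsigned long core_cookie;	/* opaque group id, 0 = untagged */
};

static bool cookies_match(const struct task_like *a, const struct task_like *b)
{
	return a->core_cookie == b->core_cookie;
}

/* Returns the task to run on an SMT sibling, or NULL to force idle. */
static struct task_like *pick_for_sibling(struct task_like *running,
					  struct task_like *candidate)
{
	if (!running || cookies_match(running, candidate))
		return candidate;
	return NULL;	/* mismatch: idle the sibling instead */
}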

KVM doesn't do _that much_ userspace exiting in practice when VMs are
properly configured (if they're not, you probably don't care about core
scheduling).

However, note that KVM needs core scheduling groups to be defined at the
thread level; one group per process is not enough.  A VM has a bunch of
I/O threads and vCPU threads, and we want to set up core scheduling like
this:

+--------------------------------------+
| VM 1   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   |   vCPU2  vCPU3  | |
| +----------------+-----------------+ |
+--------------------------------------+

+--------------------------------------+
| VM 2   iothread1  iothread2          |
| +----------------+-----------------+ |
| | vCPU0  vCPU1   |   vCPU2  vCPU3  | |
| +----------------+-----------------+ |
| | vCPU4  vCPU5   |   vCPU6  vCPU7  | |
| +----------------+-----------------+ |
+--------------------------------------+

where the iothreads need not be subject to core scheduling but the vCPUs
do.  If you don't place guest-sibling vCPUs in the same core scheduling
group, bad things happen.
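Whatever the eventual interface looks like, the point is that the VMM has
to be able to tag individual threads, with one cookie per guest core.
Roughly like this (core_sched_set_cookie() below is a made-up placeholder
for that interface, not an existing or proposed API):

/*
 * Hypothetical sketch only: core_sched_set_cookie() stands in for
 * whatever per-thread interface (cgroup tag, prctl, ...) ends up
 * being merged; it does not exist today.
 */
#include <sys/types.h>

extern int core_sched_set_cookie(pid_t tid, unsigned long cookie);

static void tag_vcpus(unsigned long vm_id, const pid_t *vcpu_tids,
		      int nr_vcpus, int threads_per_guest_core)
{
	int i;

	/* iothreads are deliberately left untagged */
	for (i = 0; i < nr_vcpus; i++) {
		/*
		 * One cookie per guest core, shared by its sibling vCPUs
		 * and made unique across VMs by folding in vm_id.
		 */
		unsigned long guest_core = i / threads_per_guest_core;

		core_sched_set_cookie(vcpu_tids[i],
				      (vm_id << 16) | (guest_core + 1));
	}
}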

The reason is that the guest might also be running a core scheduler, so
you could have:

- guest process 1 registering two threads A and B in the same group

- guest process 2 registering two threads C and D in the same group

- guest core scheduler placing thread A on vCPU0, thread B on vCPU1,
thread C on vCPU2, thread D on vCPU3

- host core scheduler deciding the four vCPU threads can be placed on
physical cores 0-1, but physical core 0 gets the vCPUs running A+C and
physical core 1 gets the vCPUs running B+D

- now guest process 2 shares a cache with guest process 1, despite the
guest's own core scheduling. :(
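
Concretely, the resulting placement is:

  physical core 0:  vCPU0 (guest thread A) + vCPU2 (guest thread C)
  physical core 1:  vCPU1 (guest thread B) + vCPU3 (guest thread D)

i.e. threads from the two guest groups end up as SMT siblings on the host,
which is exactly what the guest's core scheduler was trying to prevent.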

Paolo
