linux-kernel - Re: [RFC] KVM: x86: Allow userspace exit on HLT and MWAIT, else yield on MWAIT

Message-ID: <d3e0c3e9-4994-4808-a8df-3d23487ff9c4@amazon.de>
Date:   Sat, 23 Sep 2023 18:43:29 +0200
From:   Alexander Graf <graf@...zon.de>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        David Woodhouse <dwmw2@...radead.org>
CC:     <kvm@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>,
        "Sean Christopherson" <seanjc@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, <linux-kernel@...r.kernel.org>,
        Nicolas Saenz Julienne <nsaenz@...zon.es>,
        "Griffoul, Fred" <fgriffo@...zon.com>
Subject: Re: [RFC] KVM: x86: Allow userspace exit on HLT and MWAIT, else yield
 on MWAIT

On 23.09.23 11:24, Paolo Bonzini wrote:
>
> On 9/23/23 09:22, David Woodhouse wrote:
>> On Fri, 2023-09-22 at 14:00 +0200, Paolo Bonzini wrote:
>>> To avoid races you need two flags though; there needs to be also a
>>> kernel->userspace communication of whether the vCPU is currently in
>>> HLT or MWAIT, using the "flags" field for example. If it was HLT only,
>>> moving the mp_state in kvm_run would seem like a good idea; but not if
>>> MWAIT or PAUSE are also included.
>>
>> Right. When work is added to an empty workqueue, the VMM will want to
>> hunt for a vCPU which is currently idle and then signal it to exit.
>>
>> As you say, for HLT it's simple enough to look at the mp_state, and we
>> can move that into kvm_run so it doesn't need an ioctl...
>
> Looking at it again: not so easy because the mpstate is changed in the
> vCPU thread by vcpu_block() itself.
>
>> although it
>> would also be nice to get an *event* on an eventfd when the vCPU
>> becomes runnable (as noted, we want that for VSM anyway). Or perhaps
>> even to be able to poll() on the vCPU fd.
>
> Why do you need it?  You can just use KVM_RUN to go to sleep, and if you
> get another job you kick out the vCPU with pthread_kill.  (I also didn't
> get the VSM reference).

With the original VSM patches, we used to make a vCPU aware of the fact
that it can morph into one of many VTLs. That approach turned out to be
insanely intrusive and fragile, so we're currently reimplementing
everything with each VTL modeled as its own vCPU. That allows us to move
the majority of VSM functionality to user space. Everything we've seen so
far suggests there is no real performance loss with that approach.

One small problem with that is that user space is now responsible for
switching between VTLs: it determines which VTL is currently running and
leaves all others (read: all other vCPUs) stopped. That means if you
are running happily in KVM_RUN in VTL0 and VTL1 gets an interrupt, user
space needs to stop VTL0 and unpause VTL1. VTL1 then runs until it
triggers VTL_RETURN, at which point VTL1 stops execution and VTL0 runs
again.
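
As a rough model of that switching logic, the user space bookkeeping could
look something like the sketch below. All names here (struct vtl_switcher,
vtl_on_interrupt(), vtl_on_return()) are hypothetical and not taken from
the actual patches, and the sketch assumes only two VTLs, with the higher
one taking priority when it has an interrupt pending.

```c
#include <stdbool.h>

/* Hypothetical model: one vCPU per VTL, exactly one of them unpaused. */
enum vtl { VTL0 = 0, VTL1 = 1, NUM_VTLS };

struct vtl_switcher {
    enum vtl active;            /* VTL whose vCPU is allowed into KVM_RUN */
    bool paused[NUM_VTLS];      /* every other VTL's vCPU stays stopped */
};

/* Make 'next' the only runnable VTL and pause all the others. */
static void vtl_set_active(struct vtl_switcher *s, enum vtl next)
{
    for (int i = 0; i < NUM_VTLS; i++)
        s->paused[i] = (i != (int)next);
    s->active = next;
}

/* Guests start out running in VTL0. */
void vtl_switcher_init(struct vtl_switcher *s)
{
    vtl_set_active(s, VTL0);
}

/*
 * User space learned that 'target' has a pending interrupt. Assumption:
 * a higher VTL preempts a lower one; otherwise nothing changes.
 */
enum vtl vtl_on_interrupt(struct vtl_switcher *s, enum vtl target)
{
    if (target > s->active)
        vtl_set_active(s, target);
    return s->active;
}

/* The active VTL issued VTL_RETURN: stop it and resume the VTL below. */
enum vtl vtl_on_return(struct vtl_switcher *s)
{
    if (s->active > VTL0)
        vtl_set_active(s, (enum vtl)(s->active - 1));
    return s->active;
}
```

In a real VMM, vtl_set_active() would also have to kick the previously
active vCPU out of KVM_RUN and let the newly active one enter it.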

Nicolas built a patch that exposes "interrupt on vCPU is pending" as an
ioeventfd that user space can request. That way, user space knows whenever
a currently paused vCPU has a pending interrupt and can act accordingly.
You could use the same mechanism if you wanted to implement HLT in user
space, but still use an in-kernel LAPIC.
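
On the user space side, consuming such a notification could look roughly
like the sketch below. It only uses plain eventfd(2) and poll(2); how the
fd gets registered with KVM is part of the patch and deliberately not shown
here, and wait_pending_irq() is a hypothetical helper name.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until the eventfd for a paused vCPU is signalled.
 * Returns 1 if an interrupt was announced, 0 on timeout, -1 on error.
 */
int wait_pending_irq(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;
    int ret;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret;             /* 0: timed out, -1: poll() failed */

    /* Reading resets the eventfd counter so the next event is seen again. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1;
}
```

A VMM event loop would typically poll one such fd per paused vCPU and
unpause the matching vCPU (or switch VTLs) when the call returns 1.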

Alex