Message-ID: <ccad240e-d604-60e2-b12d-c7c3ca530887@grsecurity.net>
Date:   Fri, 27 Jan 2023 17:15:44 +0100
From:   Mathias Krause <minipli@...ecurity.net>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH 3/3] KVM: x86: do not unload MMU roots when only toggling
 CR0.WP

On 18.01.23 11:17, Mathias Krause wrote:
> On 17.01.23 22:29, Sean Christopherson wrote:
>> On Tue, Jan 17, 2023, Mathias Krause wrote:
>>> [...] 
>>> Change kvm_mmu_reset_context() to get passed the need for unloading MMU
>>> roots and explicitly avoid it if only CR0.WP was toggled on a CR0 write
>>> caused VMEXIT.
>>
>> One thing we should explore on top of this is not intercepting CR0.WP (on Intel)
>> when TDP is enabled.  It could even trigger after toggling CR0.WP N times, e.g.
>> to optimize the grsecurity use case without negatively impacting workloads with
>> a static CR0.WP, as walking guest memory would require an "extra" VMREAD to get
>> CR0.WP in that case.
> 
> That would be even better, agreed. I'll look into it and will try to
> come up with something.
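
For illustration, the "only CR0.WP was toggled" condition from the quoted
changelog could be sketched roughly as below. This is a minimal standalone
sketch, not the code from the patch: the names CR0_WP and
only_cr0_wp_toggled() are made up for this example, and only the bit
position of CR0.WP (bit 16) is architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0.WP (Write Protect) is bit 16 of CR0 on x86. */
#define CR0_WP (UINT64_C(1) << 16)

/*
 * Hypothetical helper (assumption, not from the series): returns true
 * when the old and new CR0 values differ in the WP bit and nothing else.
 */
static bool only_cr0_wp_toggled(uint64_t old_cr0, uint64_t new_cr0)
{
	/* XOR leaves exactly the bits that changed between the two values. */
	return (old_cr0 ^ new_cr0) == CR0_WP;
}
```

A caller would compare the CR0 value before and after the guest's write and
could skip the expensive path when the helper returns true.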

I looked into it and we can gain quite a few more cycles from this, e.g.
the runtime of the 'ssdd 10 50000' test running with TDP MMU drops
further, from 7.31s down to 4.89s. That's overall 2.8 times faster
than the 13.91s we started with. :)

I'll cook up a patch next week and send a v3 series with some more
cleanups I collected in the meantime.

>> Unfortunately, AMD doesn't provide per-bit controls.

Meanwhile I got my hands on an AMD system and it gains from this series
as well, not as much as my Intel system, though. We go down from 5.8s to
4.12s for the 'ssdd 10 50000' test with TDP MMU enabled -- a nearly 30%
runtime reduction.
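
To make the arithmetic behind these figures easy to re-check, here is a
small sketch. The helper names are made up for this example; the inputs
are the runtimes quoted in this thread.

```c
/*
 * Relative runtime reduction in percent, e.g. 5.8s -> 4.12s.
 * Hypothetical helper, for illustration only.
 */
static double runtime_reduction_pct(double before_s, double after_s)
{
	return (before_s - after_s) / before_s * 100.0;
}

/*
 * Speedup factor, e.g. 13.91s -> 4.89s.
 * Hypothetical helper, for illustration only.
 */
static double speedup_factor(double before_s, double after_s)
{
	return before_s / after_s;
}
```

With these, runtime_reduction_pct(5.8, 4.12) comes out just under 29 and
speedup_factor(13.91, 4.89) a little over 2.8, matching the numbers above.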

>>> This change brings a huge performance gain as the following micro-
>>> benchmark running 'ssdd 10 50000' from rt-tests[1] on a grsecurity L1 VM
>>> shows (runtime in seconds, lower is better):
>>>
>>>                       legacy MMU   TDP MMU
>>> kvm.git/queue             11.55s    13.91s
>>> kvm.git/queue+patch        7.44s     7.94s
>>>
>>> For legacy MMU this is ~35% faster, for TDP MMU ~43% faster.
>>>
>>> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
>>>
>>> Signed-off-by: Mathias Krause <minipli@...ecurity.net>
>>> ---
>>
