linux-kernel - Re: [Fwd: [RFC PATCH 2/4] KVM: x86: Reduce retpoline performance impact in slot_handle_level

Date:   Thu, 8 Feb 2018 12:28:05 +0000
From:   "Sironi, Filippo" <sironi@...zon.de>
To:     David Woodhouse <dwmw2@...radead.org>
CC:     "tglx@...utronix.de" <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "x86@...nel.org" <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
        "dave.hansen@...el.com" <dave.hansen@...el.com>,
        "arjan.van.de.ven@...el.com" <arjan.van.de.ven@...el.com>,
        KVM list <kvm@...r.kernel.org>
Subject: Re: [Fwd: [RFC PATCH 2/4] KVM: x86: Reduce retpoline performance
 impact in slot_handle_level_range()]


> On 8. Feb 2018, at 13:17, David Woodhouse <dwmw2@...radead.org> wrote:
> 
> 
> From: David Woodhouse <dwmw@...zon.co.uk>
> Subject: [RFC PATCH 2/4] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range()
> Date: 7. February 2018 at 01:03:12 GMT+1
> To: tglx@...utronix.de, torvalds@...ux-foundation.org, x86@...nel.org, linux-kernel@...r.kernel.org, bp@...en8.de, peterz@...radead.org, tim.c.chen@...ux.intel.com, dave.hansen@...el.com, arjan.van.de.ven@...el.com
> 
> 
> With retpoline, tight loops of "call this function for every XXX" are
> very much pessimised by taking a prediction miss *every* time. This one
> showed up very high in our early testing.
> 
> By marking the iterator slot_handle_…() functions always_inline, we can
> ensure that the indirect function call can be optimised away into a
> direct call and it actually generates slightly smaller code because
> some of the other conditionals can get optimised away too.
> 
> Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
> ---
> arch/x86/kvm/mmu.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2b8eb4d..cc83bdc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
> typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
> 
> /* The caller should hold mmu-lock before calling this function. */
> -static bool
> +static __always_inline bool
> slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 			slot_level_handler fn, int start_level, int end_level,
> 			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
> @@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 	return flush;
> }
> 
> -static bool
> +static __always_inline bool
> slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 		  slot_level_handler fn, int start_level, int end_level,
> 		  bool lock_flush_tlb)
> @@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 			lock_flush_tlb);
> }
> 
> -static bool
> +static __always_inline bool
> slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 		      slot_level_handler fn, bool lock_flush_tlb)
> {
> @@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
> }
> 
> -static bool
> +static __always_inline bool
> slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 			slot_level_handler fn, bool lock_flush_tlb)
> {
> @@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
> }
> 
> -static bool
> +static __always_inline bool
> slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
> 		 slot_level_handler fn, bool lock_flush_tlb)
> {
> -- 
> 2.7.4

+kvm@...r.kernel.org

With this patch, launch times of "large instances" are very close to what we see
when booting with nospectre_v2 (within tens of milliseconds).
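
For readers who want to see the mechanism outside the kernel, here is a minimal
user-space sketch of the pattern the quoted patch relies on. It is not the KVM
code: struct rmap_head, level_handler, handle_range(), clear_entry() and
clear_all() are hypothetical stand-ins, and ALWAYS_INLINE assumes the GCC/Clang
attribute spelling that the kernel wraps as __always_inline. The idea is that
once the iterator is forced inline into a caller that passes a known handler,
the compiler can see the target of fn and, with optimisation enabled, is free
to emit a direct call (or inline the handler) instead of an indirect call that
would go through a retpoline thunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_rmap_head; not the kernel's type. */
struct rmap_head {
	unsigned long val;
};

/* Same idea as slot_level_handler in the patch, without the kvm pointer. */
typedef bool (*level_handler)(struct rmap_head *head);

/* Assumes GCC or Clang; the kernel spells this __always_inline. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

/* Iterator: calls fn on every entry and ORs the results together. */
static ALWAYS_INLINE bool
handle_range(struct rmap_head *heads, size_t n, level_handler fn)
{
	bool flush = false;
	size_t i;

	for (i = 0; i < n; i++)
		flush |= fn(&heads[i]);

	return flush;
}

/* Hypothetical handler: clears an entry, reports whether it was set. */
static bool clear_entry(struct rmap_head *head)
{
	bool was_set = head->val != 0;

	head->val = 0;
	return was_set;
}

/*
 * The handler is a compile-time constant here. Because handle_range() is
 * inlined into this function, the compiler knows fn == clear_entry and
 * does not have to emit an indirect call inside the loop.
 */
bool clear_all(struct rmap_head *heads, size_t n)
{
	return handle_range(heads, n, clear_entry);
}
```

One way to check the effect, assuming a GCC or Clang toolchain, is to compile
this with optimisation enabled, once as written and once with ALWAYS_INLINE
replaced by a noinline attribute, and look in the generated assembly for an
indirect call inside the loop.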

Reviewed-by: Filippo Sironi <sironi@...zon.de>
Tested-by: Filippo Sironi <sironi@...zon.de>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B