linux-kernel - Re: [PATCH 6/9] KVM: x86: Provide paravirtualized flush_tlb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <A52332CE-80A2-4705-ABB0-3CDDB7AEC889@vmware.com>
Date:   Wed, 26 Jun 2019 06:30:50 +0000
From:   Nadav Amit <namit@...are.com>
To:     Andy Lutomirski <luto@...nel.org>
CC:     Dave Hansen <dave.hansen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH 6/9] KVM: x86: Provide paravirtualized flush_tlb_multi()

> On Jun 25, 2019, at 8:56 PM, Andy Lutomirski <luto@...nel.org> wrote:
> 
> On Tue, Jun 25, 2019 at 8:41 PM Nadav Amit <namit@...are.com> wrote:
>>> On Jun 25, 2019, at 8:35 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>> 
>>> On Tue, Jun 25, 2019 at 7:39 PM Nadav Amit <namit@...are.com> wrote:
>>>>> On Jun 25, 2019, at 2:40 PM, Dave Hansen <dave.hansen@...el.com> wrote:
>>>>> 
>>>>> On 6/12/19 11:48 PM, Nadav Amit wrote:
>>>>>> Support the new interface of flush_tlb_multi, which also flushes the
>>>>>> local CPU's TLB, instead of flush_tlb_others that does not. This
>>>>>> interface is more performant since it parallelize remote and local TLB
>>>>>> flushes.
>>>>>> 
>>>>>> The actual implementation of flush_tlb_multi() is almost identical to
>>>>>> that of flush_tlb_others().
>>>>> 
>>>>> This confused me a bit.  I thought we didn't support paravirtualized
>>>>> flush_tlb_multi() from reading earlier in the series.
>>>>> 
>>>>> But, it seems like that might be Xen-only and doesn't apply to KVM and
>>>>> paravirtualized KVM has no problem supporting flush_tlb_multi().  Is
>>>>> that right?  It might be good to include some of that background in the
>>>>> changelog to set the context.
>>>> 
>>>> I’ll try to improve the change-logs a bit. There is no inherent reason for
>>>> PV TLB-flushers not to implement their own flush_tlb_multi(). It is left
>>>> for future work, and here are some reasons:
>>>> 
>>>> 1. Hyper-V/Xen TLB-flushing code is not very simple
>>>> 2. I don’t have a proper setup
>>>> 3. I am lazy
>>> 
>>> In the long run, I think that we're going to want a way for one CPU to
>>> do a remote flush and then, with appropriate locking, update the
>>> tlb_gen fields for the remote CPU.  Getting this right may be a bit
>>> nontrivial.
>> 
>> What do you mean by “do a remote flush”?
> 
> I mean a PV-assisted flush on a CPU other than the CPU that started
> it.  If you look at flush_tlb_func_common(), it's doing some work that
> is rather fancier than just flushing the TLB.  By replacing it with
> just a pure flush on Xen or Hyper-V, we're losing the potential CR3
> switch and this bit:
> 
>        /* Both paths above update our state to mm_tlb_gen. */
>        this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
> 
> Skipping the former can hurt idle performance, although we should
> consider just disabling all the lazy optimizations on systems with PV
> flush.  (And I've asked Intel to help us out here in future hardware.
> I have no idea what the result of asking will be.)  Skipping the
> cpu_tlbstate write means that we will do unnecessary flushes in the
> future, and that's not doing us any favors.
> 
> In principle, we should be able to do something like:
> 
> flush_tlb_multi(...);
> for(each CPU that got flushed) {
>  spin_lock(something appropriate?);
>  per_cpu_write(cpu, cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, f->new_tlb_gen);
>  spin_unlock(...);
> }
> 
> with the caveat that it's more complicated than this if the flush is a
> partial flush, and that we'll want to check that the ctx_id still
> matches, etc.
> 
> Does this make sense?

Thanks for the detailed explanation. Let me check that I got it right. 

You want to optimize cases in which:

1. A virtual machine

2. Which issues mtultiple (remote) TLB shootdowns

2. To remote vCPU which is preempted by the hypervisor

4. And unlike KVM, the hypervisor does not provide facilities for the VM to
know which vCPU is preempted, and atomically request TLB flush when the vCPU
is scheduled.

Right?