linux-kernel - Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

Message-ID: <CALCETrUsc9QaP+kxuQo3CoK_=kf3pwnzBHo19Y7yw0ypYF3xkA@mail.gmail.com>
Date:   Fri, 14 Jul 2017 15:26:44 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:     Andy Lutomirski <luto@...nel.org>, devel@...uxdriverproject.org,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush

On Thu, Jul 13, 2017 at 5:46 AM, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> Andy Lutomirski <luto@...nel.org> writes:
>
>> On Tue, May 23, 2017 at 5:36 AM, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
>>> Andy Lutomirski <luto@...nel.org> writes:
>>>
>>>>
>>>> Also, can you share the benchmark you used for these patches?
>>>
>>> I didn't do much while writing the patchset, mostly I was running the
>>> attached dumb thrasher (32 pthreads doing mmap/munmap). On a 16 vCPU
>>> Hyper-V 2016 guest I get the following (just re-did the test with
>>> 4.12-rc1):
>>>
>>> Before the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    3m33.118s
>>> user    0m3.698s
>>> sys     3m16.624s
>>>
>>> After the patchset:
>>> # time ./pthread_mmap ./randfile
>>>
>>> real    2m19.920s
>>> user    0m2.662s
>>> sys     2m9.948s
>>>
>>> K. Y.'s guys at Microsoft did additional testing for the patchset on
>>> different Hyper-V deployments including Azure, they may share their
>>> findings too.
>>
>> I ran this benchmark on my big TLB patchset, mainly to make sure I
>> didn't regress your test.  I seem to have sped it up by 30% or so
>> instead.  I need to study this a little bit to figure out why to make
>> sure that the reason isn't that I'm failing to do flushes I need to
>> do.
>
> Got back to this and tested everything on WS2016 Hyper-V guest (24
> vCPUs) with my slightly modified benchmark. The numbers are:
>
> 1) pre-patch:
>
> real    1m15.775s
> user    0m0.850s
> sys     1m31.515s
>
> 2) your 'x86/pcid' series (PCID feature is not passed to the guest so this
> is mainly your lazy tlb optimization):
>
> real    0m55.135s
> user    0m1.168s
> sys     1m3.810s
>
> 3) My 'pv tlb shootdown' patchset on top of your 'x86/pcid' series:
>
> real    0m48.891s
> user    0m1.052s
> sys     0m52.591s
>
> As far as I understand I need to add
> 'setup_clear_cpu_cap(X86_FEATURE_PCID)' to my series to make things work
> properly if this feature appears in the guest.
>
> Other than that there is additional room for optimization:
> tlb_single_page_flush_ceiling, I'm not sure that with Hyper-V's PV the
> default value of 33 is optimal. But the investigation can be done
> separately.
>
> AFAIU with your TLB preparatory work which got into 4.13 our series
> become untangled and can go through different trees. I'll rebase mine
> and send it to K. Y. to push through Greg's char-misc tree.
>
> Is there anything blocking your PCID series from going into 4.14? It
> seems to be a huge improvement for some workloads.

No.  All but one patch should land in 4.13.

It would also be nifty if someone were to augment my work to allow one
CPU to tell another CPU that it just flushed on that CPU's behalf.
Basically, a properly atomic and/or locked operation that finds a
given ctx_id in the remote cpu's cpu_tlbstate and, if tlb_gen <= x,
sets tlb_gen to x.  Some read operations might be useful, too.  This
*might* be doable with cmpxchg16b, but spinlocks would be easier.  The
idea would be for paravirt remote flushes to be able to see, for real,
which remote CPUs need flushes, do the flushes, and then update the
remote tlb_gen to record that they've been done.
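
To make the idea concrete, here is a rough user-space sketch of the
spinlock variant.  It is not kernel code: struct cpu_tlb_sketch,
NR_CTXS, remote_needs_flush() and remote_note_flush() are all
hypothetical names made up for illustration, and the real
cpu_tlbstate layout and locking rules would differ.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CTXS 6	/* assumption: small per-CPU context cache */

/* Hypothetical stand-in for one ctx entry in a CPU's cpu_tlbstate. */
struct ctx_slot {
	uint64_t ctx_id;
	uint64_t tlb_gen;
};

/* Hypothetical per-CPU state, guarded by a trivial spinlock. */
struct cpu_tlb_sketch {
	atomic_flag lock;
	struct ctx_slot ctxs[NR_CTXS];
};

static void sketch_lock(struct cpu_tlb_sketch *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void sketch_unlock(struct cpu_tlb_sketch *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Read side: does the remote CPU still need a flush up to 'gen'? */
static bool remote_needs_flush(struct cpu_tlb_sketch *remote,
			       uint64_t ctx_id, uint64_t gen)
{
	bool need = false;

	sketch_lock(remote);
	for (int i = 0; i < NR_CTXS; i++) {
		if (remote->ctxs[i].ctx_id == ctx_id) {
			need = remote->ctxs[i].tlb_gen < gen;
			break;
		}
	}
	sketch_unlock(remote);
	return need;
}

/*
 * Write side: record that a flush up to 'gen' was done on the remote
 * CPU's behalf.  tlb_gen only ever moves forward.  Returns false if
 * ctx_id is not currently cached on that CPU.
 */
static bool remote_note_flush(struct cpu_tlb_sketch *remote,
			      uint64_t ctx_id, uint64_t gen)
{
	bool found = false;

	sketch_lock(remote);
	for (int i = 0; i < NR_CTXS; i++) {
		if (remote->ctxs[i].ctx_id == ctx_id) {
			if (remote->ctxs[i].tlb_gen <= gen)
				remote->ctxs[i].tlb_gen = gen;
			found = true;
			break;
		}
	}
	sketch_unlock(remote);
	return found;
}
```

A paravirt flush path would call remote_needs_flush() to pick its
targets, issue the hypercall, and then call remote_note_flush() on
each target.  The lock is what keeps the ctx_id lookup and the
tlb_gen update from racing with the remote CPU switching contexts;
a cmpxchg16b version would have to cover both fields as one pair.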

FWIW, I read the HV TLB docs, and it's entirely unclear to me how it
interacts with PCID or whether PCID is supported at all.  It would be
real nice to get PCID *and* paravirt flush on the major hypervisor
platforms.

--Andy