lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <817e2efa-6154-a1c6-8ec7-de04324d6df8@oracle.com>
Date:   Wed, 16 Aug 2017 17:41:12 -0400
From:   Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Simon Xiao <sixiao@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        "luto@...nel.org" <luto@...nel.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "andy.shevchenko@...il.com" <andy.shevchenko@...il.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "linux-tip-commits@...r.kernel.org" 
        <linux-tip-commits@...r.kernel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Juergen Gross <jgross@...e.com>
Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB
 flush

On 08/16/2017 12:42 PM, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@...hat.com> writes:
>
>> Peter Zijlstra <peterz@...radead.org> writes:
>>
>>> On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote:
>>>> On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>>>> I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that
>>>>> would make it work again), but this was some years ago and I cannot
>>>>> readily find those emails.
>>>> I think the only time we really talked about HAVE_RCU_TABLE_FREE for
>>>> x86 (at least that I was cc'd on) was not because of RCU freeing, but
>>>> because we just wanted to use the generic page table lookup code on
>>>> x86 *despite* not using RCU freeing.
>>>>
>>>> And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP.
>>>>
>>>> There was only passing mention of maybe making x86 use RCU, but the
>>>> discussion was really about why the IF flag meant that x86 didn't need
>>>> to, iirc.
>>>>
>>>> I don't recall us ever discussing *really* making x86 use RCU.
>>> Google finds me this:
>>>
>>>   https://lwn.net/Articles/500188/
>>>
>>> Which includes:
>>>
>>>   http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html
>>>
>>> which does as was suggested here, selects HAVE_RCU_TABLE_FREE for
>>> PARAVIRT_TLB_FLUSH.
>>>
>>> But yes, this is very much virt specific nonsense, native would never
>>> need this.
>> In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled
>> kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what
>> do you think about the required testing? Any suggestion for a
>> specifically crafted micro benchmark in addition to standard
>> ebizzy/kernbench/...?
> In the meantime I tested HAVE_RCU_TABLE_FREE with kernbench (enablement
> patch I used is attached; I know that it breaks other architectures) on
> bare metal with PARAVIRT enabled in config. The results are:
>
> 6-CPU host:
>
> Average Half load -j 3 Run (std deviation):
> CURRENT					HAVE_RCU_TABLE_FREE
> =======					===================
> Elapsed Time 400.498 (0.179679)		Elapsed Time 399.909 (0.162853)
> User Time 1098.72 (0.278536)		User Time 1097.59 (0.283894)
> System Time 100.301 (0.201629)		System Time 99.736 (0.196254)
> Percent CPU 299 (0)			Percent CPU 299 (0)
> Context Switches 5774.1 (69.2121)	Context Switches 5744.4 (79.4162)
> Sleeps 87621.2 (78.1093)		Sleeps 87586.1 (99.7079)
>
> Average Optimal load -j 24 Run (std deviation):
> CURRENT					HAVE_RCU_TABLE_FREE
> =======					===================
> Elapsed Time 219.03 (0.652534)		Elapsed Time 218.959 (0.598674)
> User Time 1119.51 (21.3284)		User Time 1118.81 (21.7793)
> System Time 100.499 (0.389308)		System Time 99.8335 (0.251423)
> Percent CPU 432.5 (136.974)		Percent CPU 432.45 (136.922)
> Context Switches 81827.4 (78029.5)	Context Switches 81818.5 (78051)
> Sleeps 97124.8 (9822.4)			Sleeps 97207.9 (9955.04)
>
> 16-CPU host:
>
> Average Half load -j 8 Run (std deviation):
> CURRENT					HAVE_RCU_TABLE_FREE
> =======					===================
> Elapsed Time 213.538 (3.7891)		Elapsed Time 212.5 (3.10939)
> User Time 1306.4 (1.83399)		User Time 1307.65 (1.01364)
> System Time 194.59 (0.864378)		System Time 195.478 (0.794588)
> Percent CPU 702.6 (13.5388)		Percent CPU 707 (11.1131)
> Context Switches 21189.2 (1199.4)	Context Switches 21288.2 (552.388)
> Sleeps 89390.2 (482.325)		Sleeps 89677 (277.06)
>
> Average Optimal load -j 64 Run (std deviation):
> CURRENT					HAVE_RCU_TABLE_FREE
> =======					===================
> Elapsed Time 137.866 (0.787928)		Elapsed Time 138.438 (0.218792)
> User Time 1488.92 (192.399)		User Time 1489.92 (192.135)
> System Time 234.981 (42.5806)		System Time 236.09 (42.8138)
> Percent CPU 1057.1 (373.826)		Percent CPU 1057.1 (369.114)
> Context Switches 187514 (175324)	Context Switches 187358 (175060)
> Sleeps 112633 (24535.5)			Sleeps 111743 (23297.6)
>
> As you can see, there's no notable difference. I'll think of a
> microbenchmark though.

FWIW, I was about to send a very similar patch (but with only Xen-PV
enabling RCU-based free by default) and saw similar results with
kernbench, both Xen PV and baremetal.

>> Additionally, I see another option for us: enable 'rcu table free' on
>> boot (e.g. by taking tlb_remove_table to pv_ops and doing boot-time
>> patching for it) so bare metal and other hypervisors are not affected
>> by the change.
> It seems there's no need for that and we can keep things simple...
>
> -- Vitaly
>
> 0001-x86-enable-RCU-based-table-free-when-PARAVIRT.patch
>
>
> >From daf5117706920aebe793d1239fccac2edd4d680c Mon Sep 17 00:00:00 2001
> From: Vitaly Kuznetsov <vkuznets@...hat.com>
> Date: Mon, 14 Aug 2017 16:05:05 +0200
> Subject: [PATCH] x86: enable RCU based table free when PARAVIRT
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> ---
>  arch/x86/Kconfig      |  1 +
>  arch/x86/mm/pgtable.c | 16 ++++++++++++++++
>  mm/memory.c           |  5 +++++
>  3 files changed, 22 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 781521b7cf9e..9c1666ea04c9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -167,6 +167,7 @@ config X86
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select HAVE_REGS_AND_STACK_ACCESS_API
> +	select HAVE_RCU_TABLE_FREE              if SMP && PARAVIRT
>  	select HAVE_RELIABLE_STACKTRACE		if X86_64 && FRAME_POINTER && STACK_VALIDATION
>  	select HAVE_STACK_VALIDATION		if X86_64
>  	select HAVE_SYSCALL_TRACEPOINTS
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 508a708eb9a6..b6aaab9fb3b8 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -56,7 +56,11 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
>  {
>  	pgtable_page_dtor(pte);
>  	paravirt_release_pte(page_to_pfn(pte));
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +	tlb_remove_table(tlb, pte);
> +#else
>  	tlb_remove_page(tlb, pte);
> +#endif
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 2
> @@ -72,21 +76,33 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  	tlb->need_flush_all = 1;
>  #endif
>  	pgtable_pmd_page_dtor(page);
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +	tlb_remove_table(tlb, page);
> +#else
>  	tlb_remove_page(tlb, page);
> +#endif
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
>  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
>  {
>  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +	tlb_remove_table(tlb, virt_to_page(pud));
> +#else
>  	tlb_remove_page(tlb, virt_to_page(pud));
> +#endif
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 4
>  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>  {
>  	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE
> +	tlb_remove_table(tlb, virt_to_page(p4d));
> +#else
>  	tlb_remove_page(tlb, virt_to_page(p4d));
> +#endif

This can probably be factored out.

>  }
>  #endif	/* CONFIG_PGTABLE_LEVELS > 4 */
>  #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
> diff --git a/mm/memory.c b/mm/memory.c
> index e158f7ac6730..18d6671b6ae2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -329,6 +329,11 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
>   * See the comment near struct mmu_table_batch.
>   */
>  
> +static void __tlb_remove_table(void *table)
> +{
> +	free_page_and_swap_cache(table);
> +}
> +

This needs to be a per-arch routine (e.g. see arch/arm64/include/asm/tlb.h).

-boris


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ