Message-Id: <20200307140616.034b3e537ffd46eb757dec73@linux-foundation.org>
Date: Sat, 7 Mar 2020 14:06:16 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Cannon Matthews <cannonmatthews@...gle.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
Matthew Wilcox <willy@...radead.org>,
Michal Hocko <mhocko@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Greg Thelen <gthelen@...gle.com>,
Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86

On Fri, 6 Mar 2020 17:03:53 -0800 Cannon Matthews <cannonmatthews@...gle.com> wrote:
> Reimplement clear_gigantic_page() to clear gigantic pages using the
> non-temporal streaming store instruction movnti, which bypasses the
> cache, since an entire 1GiB region will not fit in the cache anyway.
>
> Doing an mlock() on a 512GiB 1G-hugetlb region previously took 134
> seconds on average, about 260 ms/GiB, which is quite slow. Using
> `movnti` and optimizing the control flow over the constituent small
> pages improves this by roughly 3-4x, with the 512GiB mlock() taking
> only 34 seconds on average, or 67 ms/GiB.
>
> The assembly code for the __clear_page_nt routine is more or less
> taken directly from the output of gcc with -O3 for the function
> below, with some tweaks to support arbitrary sizes and to move the
> memory barriers:
>
> void clear_page_nt_64i(void *page)
> {
> 	for (int i = 0; i < GiB / sizeof(long long int); ++i)
> 	{
> 		_mm_stream_si64(((long long int *)page) + i, 0);
> 	}
> 	_mm_sfence();
> }
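As written above, the loop hard-codes a 1GiB size; a self-contained
user-space sketch of the "arbitrary sizes" tweak might look like the
following (clear_nt, its bytes parameter, and the GiB macro are
illustrative, not the patch's actual code):

	#include <emmintrin.h>	/* _mm_stream_si64(), _mm_sfence() */
	#include <stddef.h>

	#define GiB (1024ULL * 1024 * 1024)

	/*
	 * Clear `bytes` bytes at `dst` (assumed 8-byte aligned and a
	 * multiple of 8) with non-temporal stores.  A single sfence at
	 * the end orders the streaming stores against later accesses.
	 */
	static void clear_nt(void *dst, size_t bytes)
	{
		long long int *p = dst;
		size_t i, n = bytes / sizeof(*p);

		for (i = 0; i < n; ++i)
			_mm_stream_si64(p + i, 0);
		_mm_sfence();
	}

e.g. clear_nt(page, GiB) would clear one 1G page.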
>
> Tested:
> Time to `mlock()` a 512GiB region on a Broadwell CPU:
>
>                     AVG time (s)   % imp.   ms/GiB
>   clear_page_erms        133.584        -      261
>   clear_page_nt           34.154   74.43%       67
>
> An earlier version of this code was sent as an RFC patch in ~July
> 2018 (https://patchwork.kernel.org/patch/10543193/) but was never
> merged.
>
> ...
>
>  MAINTAINERS                        |  1 +
>  arch/x86/Kconfig                   |  4 ++++
>  arch/x86/include/asm/page_64.h     |  1 +
>  arch/x86/lib/Makefile              |  2 +-
>  arch/x86/lib/clear_gigantic_page.c | 28 ++++++++++++++++++++++++++++
>  arch/x86/lib/clear_page_64.S       | 19 +++++++++++++++++++
>  include/linux/mm.h                 |  2 ++
>  mm/memory.c                        |  2 ++
Please cc the x86 maintainers on such things.
>
> ...
>
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -70,6 +70,7 @@ config X86
> select ARCH_HAS_KCOV if X86_64
> select ARCH_HAS_MEM_ENCRYPT
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
> + select ARCH_HAS_CLEAR_GIGANTIC_PAGE if X86_64
> select ARCH_HAS_PMEM_API if X86_64
> select ARCH_HAS_PTE_DEVMAP if X86_64
> select ARCH_HAS_PTE_SPECIAL
> @@ -290,6 +291,9 @@ config ARCH_MAY_HAVE_PC_FDC
> config GENERIC_CALIBRATE_DELAY
> def_bool y
>
> +config ARCH_HAS_CLEAR_GIGANTIC_PAGE
> + bool
Opinions might differ, but I believe the Linus-approved way of doing
this sort of thing is

	#ifndef clear_gigantic_page
	extern void clear_gigantic_page(...);
	#define clear_gigantic_page clear_gigantic_page
	#endif

Alternatively, __weak works nicely.
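To make the __weak option concrete, a minimal sketch (not from the
patch; it assumes the generic version drops its `static` so an arch
definition of the same name can override the weak symbol, and neither
approach needs the new Kconfig symbol):

	/* mm/memory.c: generic fallback, now a weak global symbol.
	 * An arch overrides it simply by providing its own non-weak
	 * clear_gigantic_page(). */
	void __weak clear_gigantic_page(struct page *page,
					unsigned long addr,
					unsigned int pages_per_huge_page)
	{
		/* ... the existing generic clearing loop ... */
	}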
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2856,6 +2856,8 @@ enum mf_action_page_type {
> };
>
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> +extern void clear_gigantic_page(struct page *page, unsigned long addr,
> + unsigned int pages);
Shouldn't this be inside #ifdef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE?
Also, the third argument here is called "pages" but here:
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4706,6 +4706,7 @@ static inline void process_huge_page(
> }
> }
>
> +#ifndef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE
> static void clear_gigantic_page(struct page *page,
> unsigned long addr,
> unsigned int pages_per_huge_page)
it is called "pages_per_huge_page".
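For reference, a sketch of how the include/linux/mm.h declaration
might read with both nits addressed (assuming the Kconfig approach is
kept rather than the #ifndef/__weak alternative above):

	#ifdef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE
	extern void clear_gigantic_page(struct page *page, unsigned long addr,
					unsigned int pages_per_huge_page);
	#endif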
> @@ -4720,6 +4721,7 @@ static void clear_gigantic_page(struct page *page,
> clear_user_highpage(p, addr + i * PAGE_SIZE);
> }
> }
> +#endif /* CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE */
>
> ...