linux-kernel - Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a84836bb-d913-eb58-cd16-b268f479bd8b@huawei.com>
Date:   Tue, 26 Jan 2021 19:48:45 +0800
From:   Ding Tianhong <dingtianhong@...wei.com>
To:     Nicholas Piggin <npiggin@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>
CC:     Christophe Leroy <christophe.leroy@...roup.eu>,
        Christoph Hellwig <hch@...radead.org>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        <linux-arch@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linuxppc-dev@...ts.ozlabs.org>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>
Subject: Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings

On 2021/1/26 17:47, Nicholas Piggin wrote:
> Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
>> On 2021/1/26 12:45, Nicholas Piggin wrote:
>>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>>> supports PMD sized vmap mappings.
>>>
>>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>>> or larger, and fall back to small pages if that was unsuccessful.
>>>
>>> Architectures must ensure that any arch specific vmalloc allocations
>>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>>>
>>> When hugepage vmalloc mappings are enabled in the next patch, this
>>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>>>
>>> This can result in more internal fragmentation and memory overhead for a
>>> given allocation, an option nohugevmalloc is added to disable at boot.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@...il.com>
>>> ---
>>>  arch/Kconfig            |  11 ++
>>>  include/linux/vmalloc.h |  21 ++++
>>>  mm/page_alloc.c         |   5 +-
>>>  mm/vmalloc.c            | 215 +++++++++++++++++++++++++++++++---------
>>>  4 files changed, 205 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/arch/Kconfig b/arch/Kconfig
>>> index 24862d15f3a3..eef170e0c9b8 100644
>>> --- a/arch/Kconfig
>>> +++ b/arch/Kconfig
>>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>  config HAVE_ARCH_HUGE_VMAP
>>>  	bool
>>>  
>>> +#
>>> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>>> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
>>> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>>> +#  can be used to prohibit arch-specific allocations from using hugepages to
>>> +#  help with this (e.g., modules may require it).
>>> +#
>>> +config HAVE_ARCH_HUGE_VMALLOC
>>> +	depends on HAVE_ARCH_HUGE_VMAP
>>> +	bool
>>> +
>>>  config ARCH_WANT_HUGE_PMD_SHARE
>>>  	bool
>>>  
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 99ea72d547dc..93270adf5db5 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -25,6 +25,7 @@ struct notifier_block;		/* in notifier.h */
>>>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>>>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
>>>  #define VM_MAP_PUT_PAGES	0x00000100	/* put pages and free array in vfree */
>>> +#define VM_NO_HUGE_VMAP		0x00000200	/* force PAGE_SIZE pte mapping */
>>>
>>>  /*
>>>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
>>> @@ -59,6 +60,9 @@ struct vm_struct {
>>>  	unsigned long		size;
>>>  	unsigned long		flags;
>>>  	struct page		**pages;
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>>> +	unsigned int		page_order;
>>> +#endif
>>>  	unsigned int		nr_pages;
>>>  	phys_addr_t		phys_addr;
>>>  	const void		*caller;
>> Hi Nicholas:
>>
>> Give a suggestion :)
>>
>> The page order was only used to indicate the huge page flag for vm area, and only valid when
>> size bigger than PMD_SIZE, so can we use the vm flgas to instead of that, just like define the
>> new flag named VM_HUGEPAGE, it would not break the vm struct, and it is easier for me to backport the serious
>> patches to our own branches. (Base on the lts version).
> 
> Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any 
> time soon (or maybe they will for edge case configurations? It would be 
> trivial to add support for).
> 

1GB vmallocs is really crazy, but maybe used for future. :)

> The other concern I have is that Christophe IIRC was asking about 
> implementing a mapping for PPC which used TLB mappings that were 
> different than kernel page table tree size. Although I guess we could 
> deal with that when it comes.
> 

I didn't check the PPC platform, but a agree with you.

> I like the flexibility of page_order though. How hard would it be for 
> you to do the backport with VM_HUGEPAGE yourself?
> 

Yes, i can fix it with VM_HUGEPAGE for my own branch.

> I should also say, thanks for all the review and testing from the Huawei 
> team. Do you have an x86 patch?
I only enable and use it for x86 and aarch64 platform, this serious patches is
really help us a lot. Thanks.

Ding

> Thanks,
> Nick
> .
>