Message-ID: <21d6fc65-d9d1-66bb-9bea-a4bad78c7aac@csgroup.eu>
Date: Sat, 15 Jan 2022 10:11:18 +0000
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Kefeng Wang <wangkefeng.wang@...wei.com>,
Dave Hansen <dave.hansen@...el.com>,
Jonathan Corbet <corbet@....net>,
Andrew Morton <akpm@...ux-foundation.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
CC: Nicholas Piggin <npiggin@...il.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings
On 28/12/2021 at 11:26, Kefeng Wang wrote:
>
> On 2021/12/27 23:56, Dave Hansen wrote:
>> On 12/27/21 6:59 AM, Kefeng Wang wrote:
>>> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
>>> support huge vmalloc mappings.
>> In general, this seems interesting and the diff is simple. But, I don't
>> see _any_ x86-specific data. I think the bare minimum here would be a
>> few kernel compiles and some 'perf stat' data for some TLB events.
>
> When the feature supported on ppc,
>
> commit 8abddd968a303db75e4debe77a3df484164f1f33
> Author: Nicholas Piggin <npiggin@...il.com>
> Date: Mon May 3 19:17:55 2021 +1000
>
> powerpc/64s/radix: Enable huge vmalloc mappings
>
> This reduces TLB misses by nearly 30x on a `git diff` workload on a
> 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
> to vfs hashes being allocated with 2MB pages.
>
> But the data could be different on different machine/arch.
>
>>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>>> index 95fa745e310a..6bf5cb7d876a 100644
>>> --- a/arch/x86/kernel/module.c
>>> +++ b/arch/x86/kernel/module.c
>>> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>>> p = __vmalloc_node_range(size, MODULE_ALIGN,
>>> MODULES_VADDR + get_module_load_offset(),
>>> - MODULES_END, gfp_mask,
>>> - PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>>> + MODULES_END, gfp_mask, PAGE_KERNEL,
>>> + VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>>> __builtin_return_address(0));
>>> if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>>> vfree(p);
>> To figure out what's going on in this hunk, I had to look at the cover
>> letter (which I wasn't cc'd on). That's not great and it means that
>> somebody who stumbles upon this in the code is going to have a really
>> hard time figuring out what is going on. Cover letters don't make it
>> into git history.
> Sorry for that, will add more into arch's patch changelog.
>> This desperately needs a comment and some changelog material in *this*
>> patch.
>>
>> But, even the description from the cover letter is sparse:
>>
>>> There are some disadvantages about this feature[2], one of the main
>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>> also archs must ensure that any arch specific vmalloc allocations that
>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>> That just says that x86 *needs* PAGE_SIZE allocations. But, what
>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)? Will the
>> subsequent permission changes just fragment the 2M mapping?
>> .
>
> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>
> When module alloc with STRICT_MODULE_RWX on x86, it calls
> __change_page_attr() from set_memory_ro/rw/nx which will split large
> page, so there is no need to make module alloc with HUGE_VMALLOC.
>
Maybe there is no need to perform the module alloc with HUGE_VMALLOC,
but at least it would still work if you did so.
Powerpc added VM_NO_HUGE_VMAP only temporarily, for a reason that is
explained in a comment.
If x86 already has the necessary logic to handle it, why add
VM_NO_HUGE_VMAP?
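To make the question concrete, here is a toy userspace model of the
splitting behaviour described in the quoted text. It is only a sketch,
not kernel code: every name in it (toy_region, toy_split, toy_set_ro)
is made up for illustration, and it assumes 2M huge mappings made of
512 4K pages. The real x86 logic is in __change_page_attr(), as
mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES_PER_HUGE 512	/* assumption: 2 MiB / 4 KiB */

enum toy_prot { TOY_RW, TOY_RO };

/* Hypothetical stand-in for one 2M-aligned chunk of a vmalloc area. */
struct toy_region {
	bool huge;				/* still one 2M mapping? */
	enum toy_prot huge_prot;		/* meaningful while huge */
	enum toy_prot pte[TOY_PAGES_PER_HUGE];	/* meaningful once split */
};

/* allow_huge == false models an allocation done with VM_NO_HUGE_VMAP. */
static void toy_region_init(struct toy_region *r, bool allow_huge)
{
	r->huge = allow_huge;
	r->huge_prot = TOY_RW;
	for (size_t i = 0; i < TOY_PAGES_PER_HUGE; i++)
		r->pte[i] = TOY_RW;
}

/* Replace the single huge entry by 512 small ones with the same prot. */
static void toy_split(struct toy_region *r)
{
	for (size_t i = 0; i < TOY_PAGES_PER_HUGE; i++)
		r->pte[i] = r->huge_prot;
	r->huge = false;
}

/*
 * Make pages [first, first + count) read-only.
 * Returns true if a huge mapping had to be split to do so.
 */
static bool toy_set_ro(struct toy_region *r, size_t first, size_t count)
{
	bool did_split = false;

	assert(first <= TOY_PAGES_PER_HUGE);
	assert(count <= TOY_PAGES_PER_HUGE - first);

	if (count == 0)
		return false;

	if (r->huge) {
		/* Whole region covered: no split needed. */
		if (first == 0 && count == TOY_PAGES_PER_HUGE) {
			r->huge_prot = TOY_RO;
			return false;
		}
		toy_split(r);
		did_split = true;
	}

	for (size_t i = first; i < first + count; i++)
		r->pte[i] = TOY_RO;

	return did_split;
}
```

In this model, toy_set_ro(&r, 0, 16) on a region initialised with
allow_huge set splits the mapping and returns true; with allow_huge
cleared nothing is split. In both cases the 16 pages end up read-only,
which is the sense in which it "would still work".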
Christophe