Message-Id: <1607068679.lfd133za4h.astroid@bobo.none>
Date: Fri, 04 Dec 2020 18:12:58 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Cc: "christophe.leroy@...roup.eu" <christophe.leroy@...roup.eu>,
"hch@...radead.org" <hch@...radead.org>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"lizefan@...wei.com" <lizefan@...wei.com>
Subject: Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings
Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
>
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL and so I think this test will not exclude module
> allocations in those cases.
Ah, thanks. I guess archs must additionally ensure that their
PAGE_KERNEL allocations are suitable for huge page mappings before
enabling the option.

If there is interest from those archs in supporting this, I have an
early (un-posted) patch that adds an explicit VM_HUGE flag that could
override the pessimistic arch default. It's not much trouble to add this
to the large system hash allocations. The patch is very out of date now,
but I can at least give what I have to anyone working on arch support
who wants it.
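
To make the idea concrete, here is a rough userspace sketch of the
decision such a flag could feed into. It is not the un-posted patch and
not kernel code: SKETCH_VM_HUGE, struct sketch_request and
sketch_pick_shift() are hypothetical names, and the shift values simply
assume 4k base pages with 2M PMDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only (4k pages, 2M PMD). */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PMD_SHIFT  21u
#define SKETCH_VM_HUGE    0x1u	/* hypothetical explicit opt-in flag */

/* Hypothetical description of one vmalloc request. */
struct sketch_request {
	size_t size;			/* requested size in bytes */
	unsigned int vm_flags;		/* caller-supplied flags */
	bool prot_is_page_kernel;	/* true if prot == PAGE_KERNEL */
	bool arch_default_huge;		/* arch says PAGE_KERNEL is safe for huge */
};

/* Return the mapping shift to try first for this request. */
static unsigned int sketch_pick_shift(const struct sketch_request *req)
{
	size_t pmd_size = (size_t)1 << SKETCH_PMD_SHIFT;

	/* Only PMD-size-or-larger requests are candidates. */
	if (req->size < pmd_size)
		return SKETCH_PAGE_SHIFT;

	/* Non-PAGE_KERNEL prot never gets huge pages, as in the patch text. */
	if (!req->prot_is_page_kernel)
		return SKETCH_PAGE_SHIFT;

	/* The explicit flag overrides a pessimistic arch default. */
	if (req->arch_default_huge || (req->vm_flags & SKETCH_VM_HUGE))
		return SKETCH_PMD_SHIFT;

	return SKETCH_PAGE_SHIFT;
}
```

Under these assumptions a large system hash allocation would pass the
flag and get a PMD-sized attempt, while a module allocation on an arch
with a pessimistic default would stay on small pages unless it opted in.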
>
> [snip]
>
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const struct vm_struct *area,
>>  {
>>      int i;
>>
>> +    /* HUGE_VMALLOC passes small pages to set_direct_map */
>>      for (i = 0; i < area->nr_pages; i++)
>>          if (page_address(area->pages[i]))
>>              set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
>>       * map. Find the start and end range of the direct mappings to make sure
>>       * the vm_unmap_aliases() flush includes the direct map.
>>       */
>> -    for (i = 0; i < area->nr_pages; i++) {
>> +    for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>          unsigned long addr = (unsigned long)page_address(area->pages[i]);
>>          if (addr) {
>> +            unsigned long page_size = PAGE_SIZE << area->page_order;
>>              start = min(addr, start);
>> -            end = max(addr + PAGE_SIZE, end);
>> +            end = max(addr + page_size, end);
>>              flush_dmap = 1;
>>          }
>>      }
>
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission on the direct map and so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
>
> But after this, module vmalloc()'s could have large pages which would
> result in large RO pages on the direct map. Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct map mapping.
>
> I think either module allocations need to be actually excluded from
> having large pages (seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@kernel.org/
> to reset the direct map for a large page range at a time for large
> vmalloc pages.
>
Right, x86 would have to do something about that before enabling
HAVE_ARCH_HUGE_VMALLOC.

A VM_HUGE flag might be quick and easy, but maybe the other options are
not too difficult either.
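
For anyone following along, the range computation in the quoted
vm_remove_mappings() hunk can be modelled in userspace roughly as below.
This is only a sketch of the loop's arithmetic: struct sketch_area,
struct sketch_range and sketch_dmap_range() are made-up names, the array
of addresses stands in for page_address(area->pages[i]), and a 4k base
page size is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ul	/* assumed base page size */

/* Hypothetical stand-in for the fields of vm_struct the loop uses. */
struct sketch_area {
	const uintptr_t *page_addrs;	/* direct-map address per small page, 0 if none */
	unsigned int nr_pages;		/* number of small pages */
	unsigned int page_order;	/* 0 for small pages, e.g. 9 for 2M */
};

struct sketch_range {
	uintptr_t start;
	uintptr_t end;
	int flush_dmap;
};

/* Walk one entry per (possibly huge) page and widen [start, end). */
static struct sketch_range sketch_dmap_range(const struct sketch_area *area)
{
	struct sketch_range r = { UINTPTR_MAX, 0, 0 };
	unsigned int step = 1u << area->page_order;
	uintptr_t page_size = (uintptr_t)SKETCH_PAGE_SIZE << area->page_order;
	unsigned int i;

	for (i = 0; i < area->nr_pages; i += step) {
		uintptr_t addr = area->page_addrs[i];

		if (addr) {
			if (addr < r.start)
				r.start = addr;
			if (addr + page_size > r.end)
				r.end = addr + page_size;
			r.flush_dmap = 1;
		}
	}
	return r;
}
```

With page_order 0 this reduces to the old per-page loop. The sketch
covers only the flush range, not the case where resetting the direct
map has to split a large mapping and fails.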
Thanks,
Nick