[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <440d4a0e-c1ea-864b-54cb-aab74858319a@bytedance.com>
Date: Wed, 26 Jul 2023 16:02:21 +0100
From: Usama Arif <usama.arif@...edance.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-mm@...ck.org, muchun.song@...ux.dev, mike.kravetz@...cle.com,
linux-kernel@...r.kernel.org, fam.zheng@...edance.com,
liangma@...ngbit.com, simon.evans@...edance.com,
punit.agrawal@...edance.com
Subject: Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to
struct memblock_region
On 26/07/2023 12:01, Mike Rapoport wrote:
> On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
>> This propagates the hugepage size from the memblock APIs
>> (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
>> so that it can be stored in struct memblock region. This does not
>> introduce any functional change and hugepage_size is not used in
>> this commit. It is just a setup for the next commit where huge_pagesize
>> is used to skip initialization of struct pages that will be freed later
>> when HVO is enabled.
>>
>> Signed-off-by: Usama Arif <usama.arif@...edance.com>
>> ---
>> arch/arm64/mm/kasan_init.c | 2 +-
>> arch/powerpc/platforms/pasemi/iommu.c | 2 +-
>> arch/powerpc/platforms/pseries/setup.c | 4 +-
>> arch/powerpc/sysdev/dart_iommu.c | 2 +-
>> include/linux/memblock.h | 8 ++-
>> mm/cma.c | 4 +-
>> mm/hugetlb.c | 6 +-
>> mm/memblock.c | 60 ++++++++++++--------
>> mm/mm_init.c | 2 +-
>> mm/sparse-vmemmap.c | 2 +-
>> tools/testing/memblock/tests/alloc_nid_api.c | 2 +-
>> 11 files changed, 56 insertions(+), 38 deletions(-)
>>
>
> [ snip ]
>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f71ff9f0ec81..bb8019540d73 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -63,6 +63,7 @@ struct memblock_region {
>> #ifdef CONFIG_NUMA
>> int nid;
>> #endif
>> + phys_addr_t hugepage_size;
>> };
>>
>> /**
>> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>> phys_addr_t start, phys_addr_t end);
>> phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>> phys_addr_t align, phys_addr_t start,
>> - phys_addr_t end, int nid, bool exact_nid);
>> + phys_addr_t end, int nid, bool exact_nid,
>> + phys_addr_t hugepage_size);
>
> Rather than adding yet another parameter to memblock_phys_alloc_range() we
> can have an API that sets a flag on the reserved regions.
> With this the hugetlb reservation code can set a flag when HVO is
> enabled and memmap_init_reserved_pages() will skip regions with this flag
> set.
>
Hi,
Thanks for the review.
I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw
and not memblock_phys_alloc_range?
My initial approach was to use flags, but I think it looks worse than
what I have done in this RFC (I have pushed the flags prototype at
https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO,
top 4 commits for reference (the main difference is patch 2 and 4 from
RFC)). The major points are (the bigger issue is in patch 4):
- (RFC vs flags patch 2 comparison) In the RFC, hugepage_size is
propagated from memblock_alloc_try_nid_raw through function calls. When
using flags, the "no_init" boolean is propogated from
memblock_alloc_try_nid_raw through function calls until the region flags
are available in memblock_add_range and the new MEMBLOCK_NOINIT flag is
set. I think its a bit more tricky to introduce a new function to set
the flag in the region AFTER the call to memblock_alloc_try_nid_raw has
finished as the memblock_region can not be found.
So something (hugepage_size/flag information) still has to be propagated
through function calls and a new argument needs to be added.
- (RFC vs flags patch 4 comparison) We can't skip initialization of the
whole region, only the tail pages. We still need to initialize the
HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) struct pages for each gigantic
page.
In the RFC, hugepage_size from patch 2 was used in the for loop in
memmap_init_reserved_pages in patch 4 to reserve
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This
looks very simple and not hacky.
If we use a flag, there are 2 ways to initialize the
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:
1. (implemented in github link patch 4) memmap_init_reserved_pages skips
the region for initialization as you suggested, and then we initialize
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage somewhere later
(I did it in gather_bootmem_prealloc). When calling
reserve_bootmem_region in gather_bootmem_prealloc, we need to skip
early_page_uninitialised and this makes it look a bit hacky.
2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per
hugepage in memmap_init_reserved_pages itself. As we have used a flag
and havent passed hugepage_size, we need to get the gigantic page size
somehow. There doesnt seem to be a nice way to determine the gigantic
page size in that function which is architecture dependent. I think
gigantic page size can be given by PAGE_SIZE << (PUD_SHIFT -
PAGE_SHIFT), but not sure if this is ok for all architectures? If we can
use PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT) it will look much better than
point 1.
Both the RFC patches and the github flags implementation work, but I
think RFC patches look much cleaner. If there is a strong preference for
the the github patches I can send it to mailing list?
Thanks,
Usama
>> phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
>>
>> static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
>> @@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>> int nid);
>> void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>> phys_addr_t min_addr, phys_addr_t max_addr,
>> - int nid);
>> + int nid, phys_addr_t hugepage_size);
>> void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>> phys_addr_t min_addr, phys_addr_t max_addr,
>> int nid);
>> @@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
>> {
>> return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
>> MEMBLOCK_ALLOC_ACCESSIBLE,
>> - NUMA_NO_NODE);
>> + NUMA_NO_NODE, 0);
>> }
>>
>> static inline void *memblock_alloc_from(phys_addr_t size,
>
Powered by blists - more mailing lists