[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0101017479c7063b-aec9b77c-4f0f-41b6-84e6-f91a5ed1f3f6-000000@us-west-2.amazonses.com>
Date: Thu, 10 Sep 2020 20:48:40 +0000
From: sudaraja@...eaurora.org
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: Steven Price <steven.price@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Logan Gunthorpe <logang@...tatee.com>,
David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
pratikp@...eaurora.org
Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually
contiguous memory
On 2020-09-10 03:50, Anshuman Khandual wrote:
> On 09/10/2020 01:57 PM, Steven Price wrote:
>> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically
>>> continuous memory of size PMD_SZIE using vmemmap_alloc_block_buf().
>>> Section
>>> mappings are good to reduce TLB pressure. But when system is highly
>>> fragmented
>>> and memory blocks are being hot-added at runtime, its possible that
>>> such
>>> physically continuous memory allocations can fail. Rather than
>>> failing the
>>> memory hot-add procedure, add a fallback option to allocate vmemmap
>>> pages from
>>> discontinuous pages using vmemmap_populate_basepages().
>>>
>>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@...eaurora.org>
>>> Cc: Catalin Marinas <catalin.marinas@....com>
>>> Cc: Will Deacon <will@...nel.org>
>>> Cc: Anshuman Khandual <anshuman.khandual@....com>
>>> Cc: Mark Rutland <mark.rutland@....com>
>>> Cc: Logan Gunthorpe <logang@...tatee.com>
>>> Cc: David Hildenbrand <david@...hat.com>
>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>>> Cc: Steven Price <steven.price@....com>
>>> ---
>>> arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 75df62f..a46c7d4 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long
>>> start, unsigned long end, int node,
>>> p4d_t *p4dp;
>>> pud_t *pudp;
>>> pmd_t *pmdp;
>>> + int ret = 0;
>>> do {
>>> next = pmd_addr_end(addr, end);
>>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long
>>> start, unsigned long end, int node,
>>> void *p = NULL;
>>> p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>>> - if (!p)
>>> - return -ENOMEM;
>>> + if (!p) {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> + vmemmap_free(start, end, altmap);
>>> +#endif
>>> + ret = -ENOMEM;
>>> + break;
>>> + }
>>> pmd_set_huge(pmdp, __pa(p),
>>> __pgprot(PROT_SECT_NORMAL));
>>> } else
>>> vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>> } while (addr = next, addr != end);
>>> - return 0;
>>> + if (ret)
>>> + return vmemmap_populate_basepages(start, end, node, altmap);
>>> + else
>>> + return ret;
>>
>> Style comment: I find this usage of 'ret' confusing. When we assign
>> -ENOMEM above that is never actually the return value of the function
>> (in that case vmemmap_populate_basepages() provides the actual return
>> value).
>
> Right.
>
>>
>> Also the "return ret" is misleading since we know by that point that
>> ret==0 (and the 'else' is redundant).
>
> Right.
>
>>
>> Can you not just move the call to vmemmap_populate_basepages() up to
>> just after the (possible) vmemmap_free() call and remove the 'ret'
>> variable?
>>
Yes the usage of "return ret" is quite confusing and misleading here -
will clean this.
>> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the
>> function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel
>> you
>
> Right, CONFIG_MEMORY_HOTPLUG is not required.
Not quite exactly - the vmemmap_free() declaration in include/linux/mm.h
header file is wrapped around CONFIG_MEMORY_HOTPLUG as well. And since
the function definition is below the place where this is called, it will
throw an implicit declaration compile error when CONFIG_MEMORY_HOTPLUG
is not enabled. We can move the function definition above so that we
don't have to place this #ifdef. But we can go with 1st approach that
Anshuman mentions below.
>
> need at least a comment to explain Anshuman's point that it looks like
> you're freeing an unmapped area. Although if I'm reading the code
> correctly it seems like the unmapped area will just be skipped.
> Proposed vmemmap_free() attempts to free the entire requested vmemmap
> range
> [start, end] when an intermediate PMD entry can not be allocated. Hence
> even
> if vmemap_free() could skip an unmapped area (will double check on
> that), it
> unnecessarily goes through large sections of unmapped range, which
> could not
> have been mapped.
>
> So, basically there could be two different methods for doing this
> fallback.
>
> 1. Call vmemmap_populate_basepages() for sections when PMD_SIZE
> allocation fails
>
> - vmemmap_free() need not be called
>
> 2. Abort at the first instance of PMD_SIZE allocation failure
>
> - Call vmemmap_free() to unmap all sections mapped till that point
> - Call vmemmap_populate_basepages() to map the entire request section
>
> The proposed patch tried to mix both approaches. Regardless, the first
> approach
> here seems better and is the case in vmemmap_populate_hugepages()
> implementation
> on x86 as well.
The 1st approach looks more cleaner compared to bailing out in first
failure, unmapping all previously mapped sections and map entire request
with vmemmap_populate_basepages. Thanks for the review and suggestion -
will send over a cleaner patch soon.
Sudarshan
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
Powered by blists - more mailing lists