Open Source and information security mailing list archives
 
Message-ID: <0101017479c7063b-aec9b77c-4f0f-41b6-84e6-f91a5ed1f3f6-000000@us-west-2.amazonses.com>
Date:   Thu, 10 Sep 2020 20:48:40 +0000
From:   sudaraja@...eaurora.org
To:     Anshuman Khandual <anshuman.khandual@....com>
Cc:     Steven Price <steven.price@....com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Logan Gunthorpe <logang@...tatee.com>,
        David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        pratikp@...eaurora.org
Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually
 contiguous memory

On 2020-09-10 03:50, Anshuman Khandual wrote:
> On 09/10/2020 01:57 PM, Steven Price wrote:
>> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically contiguous memory of size PMD_SIZE using
>>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>>> pressure. But when the system is highly fragmented and memory blocks
>>> are being hot-added at runtime, it's possible that such physically
>>> contiguous memory allocations can fail. Rather than failing the
>>> memory hot-add procedure, add a fallback option to allocate vmemmap
>>> pages from discontiguous pages using vmemmap_populate_basepages().
>>> 
>>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@...eaurora.org>
>>> Cc: Catalin Marinas <catalin.marinas@....com>
>>> Cc: Will Deacon <will@...nel.org>
>>> Cc: Anshuman Khandual <anshuman.khandual@....com>
>>> Cc: Mark Rutland <mark.rutland@....com>
>>> Cc: Logan Gunthorpe <logang@...tatee.com>
>>> Cc: David Hildenbrand <david@...hat.com>
>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>>> Cc: Steven Price <steven.price@....com>
>>> ---
>>>   arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 75df62f..a46c7d4 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long 
>>> start, unsigned long end, int node,
>>>       p4d_t *p4dp;
>>>       pud_t *pudp;
>>>       pmd_t *pmdp;
>>> +    int ret = 0;
>>>         do {
>>>           next = pmd_addr_end(addr, end);
>>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long 
>>> start, unsigned long end, int node,
>>>               void *p = NULL;
>>>                 p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>>> -            if (!p)
>>> -                return -ENOMEM;
>>> +            if (!p) {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> +                vmemmap_free(start, end, altmap);
>>> +#endif
>>> +                ret = -ENOMEM;
>>> +                break;
>>> +            }
>>>               pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>>           } else
>>>               vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>>       } while (addr = next, addr != end);
>>>   -    return 0;
>>> +    if (ret)
>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>> +    else
>>> +        return ret;
>> 
>> Style comment: I find this usage of 'ret' confusing. When we assign 
>> -ENOMEM above that is never actually the return value of the function 
>> (in that case vmemmap_populate_basepages() provides the actual return 
>> value).
> 
> Right.
> 
>> 
>> Also the "return ret" is misleading since we know by that point that 
>> ret==0 (and the 'else' is redundant).
> 
> Right.
> 
>> 
>> Can you not just move the call to vmemmap_populate_basepages() up to 
>> just after the (possible) vmemmap_free() call and remove the 'ret' 
>> variable?
>> 

Yes, the usage of "return ret" here is quite confusing and misleading - 
I will clean this up.

>> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the 
>> function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel 
>> you
> 
> Right, CONFIG_MEMORY_HOTPLUG is not required.

Not exactly - the vmemmap_free() declaration in the include/linux/mm.h 
header is also wrapped in #ifdef CONFIG_MEMORY_HOTPLUG. And since the 
function definition sits below the call site, the call would trigger an 
implicit-declaration compile error when CONFIG_MEMORY_HOTPLUG is not 
enabled. We could move the function definition above the call site so 
that the #ifdef isn't needed, but we can instead go with the 1st 
approach that Anshuman mentions below.
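For reference, the guard in question looks roughly like this (a 
simplified sketch of the include/linux/mm.h arrangement described 
above; check the actual tree for the exact prototype):

```c
/* include/linux/mm.h (simplified sketch) */
#ifdef CONFIG_MEMORY_HOTPLUG
void vmemmap_free(unsigned long start, unsigned long end,
		  struct vmem_altmap *altmap);
#endif
```

So with CONFIG_MEMORY_HOTPLUG=n the declaration is simply not visible, 
and an unguarded call from arch code fails to compile.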

> 
>> need at least a comment to explain Anshuman's point that it looks like
>> you're freeing an unmapped area. Although if I'm reading the code
>> correctly it seems like the unmapped area will just be skipped.
> Proposed vmemmap_free() attempts to free the entire requested vmemmap 
> range
> [start, end] when an intermediate PMD entry can not be allocated. Hence 
> even
> if vmemmap_free() could skip an unmapped area (will double check on 
> that), it
> unnecessarily goes through large sections of unmapped range, which 
> could not
> have been mapped.
> 
> So, basically there could be two different methods for doing this 
> fallback.
> 
> 1. Call vmemmap_populate_basepages() for sections when PMD_SIZE 
> allocation fails
> 
> 	- vmemmap_free() need not be called
> 
> 2. Abort at the first instance of PMD_SIZE allocation failure
> 
> 	- Call vmemmap_free() to unmap all sections mapped till that point
> 	- Call vmemmap_populate_basepages() to map the entire request section
> 
> The proposed patch tried to mix both approaches. Regardless, the first 
> approach
> here seems better and is the case in vmemmap_populate_hugepages() 
> implementation
> on x86 as well.

The 1st approach looks cleaner compared to bailing out on the first 
failure, unmapping all previously mapped sections, and then mapping the 
entire request with vmemmap_populate_basepages(). Thanks for the review 
and suggestions - I will send over a cleaner patch soon.

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project
