Message-ID: <a03fcbb3-5b77-8671-6376-13c360f5ae25@redhat.com>
Date: Tue, 16 Mar 2021 18:45:17 +0100
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Vlastimil Babka <vbabka@...e.cz>,
Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added
memory range
On 16.03.21 17:46, David Hildenbrand wrote:
> On 15.03.21 11:22, Oscar Salvador wrote:
>> On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
>>> This looks essentially good to me, except some parts in
>>> mhp_supports_memmap_on_memory()
>>>
>>>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>>>> +{
>>>> +	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
>>>> +	unsigned long remaining_mem = size - PMD_SIZE;
>>
>> Hi David, thanks for the review!
>>
>>> This looks weird. I think what you want to test is that
>>>
>>>
>>> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
>>> won't map too much via the altmap when populating a PMD in the vmemmap)
>>>
>>> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
>>> complete pageblocks.
>>
>> We do not know the nr_vmemmap_pages at this point in time, although it is
>> easy to pre-compute.
>>
>> For every section we populate, we use PMD_SIZE. So, PMD_SIZE/PAGE_SIZE gives
>> the nr_vmemmap_pages that are used for populating a single section.
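>>
>> (On x86-64, assuming 4 KiB base pages, that is 2 MiB / 4 KiB = 512
>> vmemmap pages consumed per populated section.)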
>
> I find that cross reference to vmemmap code a little hard to digest.
> I would have assumed that we don't have to care about PMDs in this
> code here at all. The vmemmap population code should handle that.
>
> I think I already mentioned this somewhere; it should be like this:
>
> a) vmemmap code should *never* populate more memory than requested for
> a single memory section when we are populating from the altmap.
> If that cannot be guaranteed for PMDs, then we have to fall back
> to populating base pages. Populating PMDs from an altmap with
> sizeof(struct page) != 64 is highly dangerous.
>
> Assume we have sizeof(struct page) == 56. A 128 MiB section
> spans 32768 pages - we need 32768 * sizeof(struct page)
> space for the vmemmap.
>
> With sizeof(struct page) == 64 we *can* use exactly one PMD. With
> sizeof(struct page) == 56 we need 448 individual (full!) pages for
> the vmemmap.
>
> IOW, we don't care how vmemmap code will do the mapping.
> vmemmap code has to get it right. IMHO, asserting it in
> this code is wrong.
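>
> A quick userspace sketch of the arithmetic above (not kernel code;
> 128 MiB sections and 4 KiB base pages are assumed, as on x86-64):
>
> #include <stdio.h>
>
> int main(void)
> {
>         /* base pages per 128 MiB memory section (4 KiB pages) */
>         unsigned long pages = (128UL << 20) / 4096;
>
>         /* sizeof(struct page) == 64: exactly one 2 MiB PMD */
>         printf("64: %lu bytes\n", pages * 64);
>         /* sizeof(struct page) == 56: 448 individual base pages */
>         printf("56: %lu bytes (%lu pages)\n",
>                pages * 56, (pages * 56) / 4096);
>         return 0;
> }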
>
>
> b) In this code, we really should only care about what
> memory onlining/offlining code can or can't do.
> We really only care that
>
> 1) size == memory_block_size_bytes()
> 2) remaining_size (the vmemmap must not consume the whole block)
> 3) IS_ALIGNED(remaining_size, pageblock_size)
>
>
> I think a) is a logical consequence of b) for x86-64,
> whereby the pageblock size corresponds to PMD, so we
> might not have to care about a) right now.
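>
> (On x86-64 with CONFIG_HUGETLB_PAGE, pageblock_order is
> HUGETLB_PAGE_ORDER = 9, so a pageblock is 512 * 4 KiB = 2 MiB,
> i.e., exactly PMD_SIZE.)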
>
> See below for my take.
>
>
>>
>> But let me explain the reasoning I use in the current code:
>>
>> I will enumerate the assumptions that must hold true in order to support the
>> feature, together with their checks:
>>
>> - We span a single memory block
>>
>> size == memory_block_size_bytes()
>>
>> - The vmemmap pages span a complete PMD and no more than a PMD.
>>
>> !(PMD_SIZE % sizeof(struct page))
>>
>> - The vmemmap pages and the pages exposed to the buddy have to cover full
>> pageblocks
>>
>> remaining_mem = size - PMD_SIZE;
>> IS_ALIGNED(remaining_mem, pageblock_size)
>>
>> Although this check only covers the range without the vmemmap pages, one could
>> argue that since we use only PMD_SIZE at a time, and PMD_SIZE is pageblock
>> aligned, the vmemmap range is pageblock aligned as well.
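>>
>> As a quick numeric check of the PMD assumption above (x86-64 values):
>> with sizeof(struct page) == 64 we have 2097152 % 64 == 0, while with
>> e.g. 56 we would get 2097152 % 56 == 8, so such a layout is rejected.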
>>
>> Now, I see how this might be confusing and rather incomplete.
>> So I guess a better and clearer way to write it would be:
>>
>> bool mhp_supports_memmap_on_memory(unsigned long size)
>> {
>>         unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
>>         unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
>>         unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>>         unsigned long remaining_size = size - vmemmap_size;
>>
>>         return memmap_on_memory &&
>>                IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
>>                size == memory_block_size_bytes() &&
>>                !(PMD_SIZE % vmemmap_size) &&
>>                IS_ALIGNED(vmemmap_size, pageblock_size) &&
>>                remaining_size &&
>>                IS_ALIGNED(remaining_size, pageblock_size);
>> }
>>
>> Note that the above check is only for a single section, but if the assumptions
>> hold true for a single section, they hold for many as well.
>
> Okay, please document the statement about single sections, that's
> important to understand what's happening.
>
> My take would be
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
>         /*
>          * Note: We calculate for a single memory section. The calculation
>          * implicitly covers memory blocks that span multiple sections.
>          */
>         unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
>         unsigned long nr_vmemmap_pages = SECTION_SIZE / PAGE_SIZE;
>         unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>         unsigned long remaining_size = size - vmemmap_size;
>
>         /*
>          * Note: vmemmap code has to make sure to not populate more memory
>          * via the altmap than absolutely necessary for a single section.
>          * This always holds when we allocate pageblock-sized chunks from
>          * the altmap, as we require pageblock alignment here.
>          *
>          * TODO, other doc
>          */
>         return memmap_on_memory &&
>                IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
>                size == memory_block_size_bytes() &&
>                remaining_size &&
>                IS_ALIGNED(remaining_size, pageblock_size);
> }
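>
> Plugging in x86-64 numbers (128 MiB sections, 4 KiB base pages,
> sizeof(struct page) == 64 assumed): nr_vmemmap_pages = 32768,
> vmemmap_size = 2 MiB and remaining_size = 126 MiB, which is aligned
> to the 2 MiB pageblock, so the check passes.
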
Oh, I forgot, we can also drop the check for "remaining_size" in that
code. If we ever have PAGE_SIZE == sizeof(struct page) we'll be in
bigger trouble :D
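
(Indeed: for a single-section memory block, remaining_size == 0 would
require vmemmap_size == size, i.e. sizeof(struct page) == PAGE_SIZE.)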
--
Thanks,
David / dhildenb