Message-ID: <5be95091-b4ac-8e05-4694-ac5c65f790a4@redhat.com>
Date: Fri, 26 Mar 2021 09:52:58 +0100
From: David Hildenbrand <david@...hat.com>
To: Michal Hocko <mhocko@...e.com>, Oscar Salvador <osalvador@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Vlastimil Babka <vbabka@...e.cz>,
Pavel Tatashin <pasha.tatashin@...een.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range

On 26.03.21 09:35, Michal Hocko wrote:
> On Thu 25-03-21 23:06:50, Oscar Salvador wrote:
>> On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote:
>>> On Thu 25-03-21 17:36:22, Michal Hocko wrote:
>>>> If all it takes is to make pfn_to_online_page work (and my
>>>> previous attempt is incorrect because it should consult block rather
>>>> than section pfn range)
>>>
>>> This should work.
>>
>> Sorry, but while this solves some of the issues with that approach, I really
>> think that overcomplicates things and buys us not so much in return.
>> To me it seems that we are just patching things to make it work that
>> way.
>
> I do agree that special casing vmemmap areas is something I do not
> really like, but we are in that Schrödinger situation where this memory
> is not onlineable unless it shares a memory section with onlineable
> memory. There are two ways around that: either special-case it in
> pfn_to_online_page or mark the vmemmap section online even though it is
> not really.
>
>> To be honest, I dislike this, and I guess we can only agree to disagree
>> here.
>
> No problem there. I will not insist on my approach unless I can convince
> you that it is a better solution. It seems I have failed and I can live
> with that.
>
>> I find the following much easier, cleaner, and less risky to encounter
>> pitfalls in the future:
>>
>> (!!!It is untested and incomplete, and I would be surprised if it even
>> compiles, but it is enough as a PoC !!!)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 5ea2b3fbce02..799d14fc2f9b 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v)
>> return blocking_notifier_call_chain(&memory_chain, val, v);
>> }
>>
>> +static int memory_block_online(unsigned long start_pfn, unsigned long nr_pages,
>> + unsigned long nr_vmemmap_pages, int online_type,
>> + int nid)
>> +{
>> + int ret;
>> + /*
>> + * Despite vmemmap pages having a different lifecycle than the pages
>> + * they describe, initializing and accounting vmemmap pages at the
>> + * online/offline stage eases things a lot.
>
> This requires quite some explaining.
>
>> + * We do it out of {online,offline}_pages, so those routines only have
>> + * to deal with pages that are actual usable memory.
>> + */
>> + if (nr_vmemmap_pages) {
>> + struct zone *z;
>> +
>> + z = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
>> + move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL,
>> + MIGRATE_UNMOVABLE);
>> + /*
>> + * The below could go to a helper to make it less bulky here,
>> + * so {online,offline}_pages could also use it.
>> + */
>> + z->present_pages += nr_pages;
>> + pgdat_resize_lock(z->zone_pgdat, &flags);
>> + z->zone_pgdat->node_present_pages += nr_pages;
>> + pgdat_resize_unlock(z->zone_pgdat, &flags);

We might have to mark fully spanned sections online (when the vmemmap
is >= SECTION_SIZE).

>> + }
>> +
>> + ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages,
>> + online_type);
>> +
>> + /*
>> + * In case online_pages() failed for some reason, we should cleanup vmemmap
>> + * accounting as well.
>> + */
>> + return ret;
>> +}
>
> Yes, this is much better! Just a minor suggestion would be to pass the
> memory_block all the way to memory_block_online (it onlines a memory
> block). I would also slightly prefer to provide 2 helpers that would make
> it clear that this is to reserve/cleanup the vmemmap space (defined in
> the memory_hotplug proper).
>
> Thanks!
>
Something else to note:

We'll not call the memory notifier (e.g., MEM_ONLINE) for the vmemmap.
The result is that

1. We won't allocate extended struct pages for the range. I don't think
   this is really problematic (pages are never allocated/freed, so I guess
   we don't care - like ZONE_DEVICE code).

2. We won't allocate kasan shadow memory. We most probably have to do it
   explicitly via kasan_add_zero_shadow()/kasan_remove_zero_shadow(), see
   mm/memremap.c:pagemap_range()

Further, a locking rework might be necessary. We hold the device hotplug
lock, but not the memory hotplug lock. E.g., for get_online_mems().
Might have to move that out of online_pages().
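
For illustration only, here is a minimal user-space sketch of the
reserve/cleanup helper split suggested above, including the rollback that
the PoC comment asks for when online_pages() fails. It is not kernel code:
every model_* name is hypothetical, the zone struct is a toy stand-in, and
the "fail" parameter only exists to exercise the error path. Note one
assumption that differs from the PoC: the reserve step accounts
nr_vmemmap_pages rather than nr_pages.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the zone/pgdat counters; NOT the kernel's struct zone. */
struct model_zone {
	unsigned long present_pages;
	unsigned long node_present_pages;
};

/* Hypothetical helper: account the vmemmap pages at the start of the block. */
static void model_vmemmap_reserve(struct model_zone *z,
				  unsigned long nr_vmemmap_pages)
{
	z->present_pages += nr_vmemmap_pages;
	z->node_present_pages += nr_vmemmap_pages;
}

/* Hypothetical helper: undo model_vmemmap_reserve(). */
static void model_vmemmap_cleanup(struct model_zone *z,
				  unsigned long nr_vmemmap_pages)
{
	assert(z->present_pages >= nr_vmemmap_pages);
	assert(z->node_present_pages >= nr_vmemmap_pages);
	z->present_pages -= nr_vmemmap_pages;
	z->node_present_pages -= nr_vmemmap_pages;
}

/* Stand-in for online_pages(); "fail" forces the error path. */
static int model_online_pages(struct model_zone *z, unsigned long start_pfn,
			      unsigned long nr_pages, int fail)
{
	(void)start_pfn;	/* unused in this model */
	if (fail)
		return -EBUSY;
	z->present_pages += nr_pages;
	z->node_present_pages += nr_pages;
	return 0;
}

/* Online one block: reserve vmemmap, online the rest, roll back on error. */
static int model_memory_block_online(struct model_zone *z,
				     unsigned long start_pfn,
				     unsigned long nr_pages,
				     unsigned long nr_vmemmap_pages, int fail)
{
	int ret;

	if (nr_vmemmap_pages)
		model_vmemmap_reserve(z, nr_vmemmap_pages);

	/* Only the pages behind the vmemmap are real, usable memory. */
	ret = model_online_pages(z, start_pfn + nr_vmemmap_pages,
				 nr_pages - nr_vmemmap_pages, fail);

	if (ret && nr_vmemmap_pages)
		model_vmemmap_cleanup(z, nr_vmemmap_pages);

	return ret;
}
```

With this split, a failed online leaves the counters exactly as they were
before the call, and a successful one accounts the whole block once.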
--
Thanks,
David / dhildenb