Message-ID: <5d9583e1-3374-437d-8eea-6ab1e1400a30@redhat.com>
Date: Mon, 10 Jun 2024 10:56:02 +0200
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-hyperv@...r.kernel.org, virtualization@...ts.linux.dev,
xen-devel@...ts.xenproject.org, kasan-dev@...glegroups.com,
Andrew Morton <akpm@...ux-foundation.org>, Mike Rapoport <rppt@...nel.org>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, "Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Eugenio Pérez <eperezma@...hat.com>,
Juergen Gross <jgross@...e.com>, Stefano Stabellini
<sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Alexander Potapenko <glider@...gle.com>, Marco Elver <elver@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of
!ZONE_DEVICE with PageOffline() instead of PageReserved()
On 10.06.24 06:23, Oscar Salvador wrote:
> On Fri, Jun 07, 2024 at 11:09:37AM +0200, David Hildenbrand wrote:
>> We currently initialize the memmap such that PG_reserved is set and the
>> refcount of the page is 1. In virtio-mem code, we have to manually clear
>> that PG_reserved flag to make memory offlining with partially hotplugged
>> memory blocks possible: has_unmovable_pages() would otherwise bail out on
>> such pages.
>>
>> We want to avoid PG_reserved where possible and move to typed pages
>> instead. We also want to further enlighten memory offlining code about
>> PG_offline: offline pages in an online memory section. One example is
>> handling managed page count adjustments in a cleaner way during memory
>> offlining.
>>
>> So let's initialize the pages with PG_offline instead of PG_reserved.
>> generic_online_page()->__free_pages_core() will now clear that flag before
>> handing that memory to the buddy.
>>
>> Note that the page refcount is still 1 and would forbid offlining of such
>> memory except when special care is taken during GOING_OFFLINE as
>> currently only implemented by virtio-mem.
>>
>> With this change, we can now get non-PageReserved() pages in the XEN
>> balloon list. From what I can tell, that can already happen via
>> decrease_reservation(), so that should be fine.
>>
>> HV-balloon should not really observe a change: partial online memory
>> blocks still cannot get surprise-offlined, because the refcount of these
>> PageOffline() pages is 1.
>>
>> Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
>> hotplugged pages are now PageOffline() instead of PageReserved() before
>> they are handed over to the buddy.
>>
>> We'll leave the ZONE_DEVICE case alone for now.
>>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 27e3be75edcf7..0254059efcbe1 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
>>  /*
>>   * Associate the pfn range with the given zone, initializing the memmaps
>>   * and resizing the pgdat/zone data to span the added pages. After this
>> - * call, all affected pages are PG_reserved.
>> + * call, all affected pages are PageOffline().
>>   *
>>   * All aligned pageblocks are initialized to the specified migratetype
>>   * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
>> @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
>>
>>  	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
>>
>> -	for (i = 0; i < nr_pages; i++)
>> -		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
>> +	for (i = 0; i < nr_pages; i++) {
>> +		struct page *page = pfn_to_page(pfn + i);
>> +
>> +		__ClearPageOffline(page);
>> +		SetPageVmemmapSelfHosted(page);
>
> So, refresh my memory here please.
> AFAIR, those VmemmapSelfHosted pages were marked Reserved before, but now,
> memmap_init_range() will not mark them reserved anymore.
Correct.
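
To spell that out, here is the memmap state I expect after this patch.
Rough illustrative sketch only (my own words, not code from the series):

#include <linux/mm.h>
#include <linux/page-flags.h>

/* Illustrative only, not part of the series. */
static bool expected_hotplug_memmap_state(const struct page *page)
{
	/* Hotplugged but not yet onlined: PageOffline() with a refcount
	 * of 1, and no PG_reserved anymore. */
	if (PageOffline(page))
		return page_count(page) == 1 && !PageReserved(page);

	/* memmap_on_memory: the pages backing the vmemmap itself get
	 * PG_offline cleared again and PG_vmemmap_self_hosted set
	 * instead (see the hunk above). */
	if (PageVmemmapSelfHosted(page))
		return !PageReserved(page);

	/* Everything else was already handed to the buddy via
	 * generic_online_page() -> __free_pages_core(), which clears
	 * PG_offline first. */
	return true;
}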
> I do not think that is ok? I am worried about walkers getting this wrong.
>
> We usually skip PageReserved pages in walkers because they are pages we
> cannot deal with for those purposes, but with this change, we will leak
> PageVmemmapSelfHosted, and I am not sure whether we are ready for that.
There are fortunately not that many left.
I'd even say marking them (vmemmap) reserved is more wrong than right:
note that ordinary vmemmap pages after memory hotplug are not reserved!
Only bootmem should be reserved.
Let's take a look at the relevant core-mm ones (arch stuff is mostly just
for MMIO remapping):
fs/proc/task_mmu.c: if (PageReserved(page))
fs/proc/task_mmu.c: if (PageReserved(page))
-> If we find vmemmap pages mapped into user space we already messed up
seriously
kernel/power/snapshot.c: if (PageReserved(page) ||
kernel/power/snapshot.c: if (PageReserved(page)
-> There should be no change (saveable_page() would still allow saving
them, highmem does not apply)
mm/hugetlb_vmemmap.c: if (!PageReserved(head))
mm/hugetlb_vmemmap.c: if (PageReserved(page))
-> Wants to identify bootmem, but we already properly exclude these via
PageVmemmapSelfHosted() on the splitting path (see the sketch after this
list)
mm/page_alloc.c: VM_WARN_ON_ONCE(PageReserved(p));
mm/page_alloc.c: if (PageReserved(page))
-> pfn_range_valid_contig() would scan them, just like for ordinary
vmemmap pages during hotplug. We'll simply fail isolating/migrating
them similarly (like any unmovable allocations) later
mm/page_ext.c: BUG_ON(PageReserved(page));
-> free_page_ext handling, does not apply
mm/page_isolation.c: if (PageReserved(page))
-> has_unmovable_pages() should still detect them as unmovable (e.g.,
neither movable nor LRU).
mm/page_owner.c: if (PageReserved(page))
mm/page_owner.c: if (PageReserved(page))
-> page_ext_get() will simply return NULL instead and we'll similarly
skip them
mm/sparse.c: if (!PageReserved(virt_to_page(ms->usage))) {
-> Detecting boot memory for ms->usage allocation, does not apply to
vmemmap.
virt/kvm/kvm_main.c: if (!PageReserved(page))
virt/kvm/kvm_main.c: return !PageReserved(page);
-> For MMIO remapping purposes, does not apply to vmemmap
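
For the hugetlb_vmemmap case, the distinction boils down to something
like this; a paraphrased sketch (helper name made up), not the upstream
code:

#include <linux/page-flags.h>

/* Paraphrased sketch, not the upstream code. */
static bool vmemmap_page_is_bootmem(const struct page *page)
{
	/* Self-hosted vmemmap (hotplugged, memmap_on_memory) is tagged
	 * explicitly, so it is excluded regardless of PG_reserved. */
	if (PageVmemmapSelfHosted(page))
		return false;

	/* Bootmem-allocated memmap keeps PG_reserved set. */
	return PageReserved(page);
}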
> Moreover, boot memmap pages are marked as PageReserved, which would now
> be inconsistent with those added during hotplug operations.
Just like vmemmap pages allocated dynamically during memory hotplug.
Now, really only the bootmem ones are PageReserved.
> All in all, I feel uneasy about this change.
I really don't want to mark these pages PageReserved here just for the
sake of it.
Is there any PageReserved user that I am missing, or any reason why we
should handle these vmemmap pages differently from the ones allocated
during ordinary memory hotplug?
In the future, we might want to consider using a dedicated page type for
them, so we can stop using a bit that doesn't allow us to reliably
identify them. (We should mark all vmemmap pages with that type then.)
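
Purely to illustrate that idea (everything below is hypothetical; none
of these helpers or bits exist today):

/* Hypothetical sketch only: a dedicated page type for vmemmap pages, in
 * the spirit of PG_offline/PG_table. Names are made up for illustration. */
bool PageVmemmap(const struct page *page);	/* hypothetical type test */
void __SetPageVmemmap(struct page *page);	/* hypothetical, set wherever
						 * the vmemmap is allocated */

/* Walkers could then identify vmemmap pages directly, bootmem and
 * hotplugged alike, instead of inferring it from PG_reserved. */
static inline bool skip_vmemmap_page(const struct page *page)
{
	return PageVmemmap(page);
}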
Thanks!
--
Cheers,
David / dhildenb