[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<SN6PR02MB4157AE0C00C9EE805A78D8D7D4112@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 7 Jan 2025 00:39:07 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Jacob Pan <jacob.pan@...ux.microsoft.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "K. Y. Srinivasan" <kys@...rosoft.com>,
Dexuan Cui <decui@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei
Liu <wei.liu@...nel.org>
CC: Allen Pais <apais@...ux.microsoft.com>, Vikram Sethi <vsethi@...dia.com>,
Michael Frohlich <mfrohlich@...rosoft.com>
Subject: RE: [PATCH] hv_balloon: Fallback to generic_online_page() for non-HV
hot added mem
From: Jacob Pan <jacob.pan@...ux.microsoft.com> Sent: Thursday, January 2, 2025 1:40 PM
>
> When device memory blocks are hot-added via add_memory_driver_managed(),
> the HV balloon driver intercepts them but fails to online these memory
> blocks. This could result in device driver initialization failures.
>
> To address this, fall back to the generic online callback for memory
> blocks not added by the HV balloon driver. Similar cases are handled the
> same way in virtio-mem driver.
I had a little bit of trouble figuring out what problem this patch is solving.
Is this a correct description?
The Hyper-V balloon driver installs a custom callback for handling page
onlining operations performed by the memory hotplug subsystem. This
custom callback is global, and overrides the default callback
(generic_online_page) that Linux otherwise uses. The custom callback
properly handles memory that is hot-added by the balloon driver as part
of a Hyper-V hot-add region.
But memory can also be hot-added directly by a device driver for a vPCI
device, particularly GPUs. In such a case, the custom callback installed by
the balloon driver runs, but won't find the page in its hot-add region list
and doesn't online it, which could cause driver initialization failures.
Fix this by having the balloon custom callback run generic_online_page()
when the page isn't part of a Hyper-V hot-add region, thereby doing the
default Linux behavior. This allows device driver hot-adds to work
properly. Similar cases are handled the same way in the virtio-mem driver.
>
> Suggested-by: Vikram Sethi <vsethi@...dia.com>
> Tested-by: Michael Frohlich <mfrohlich@...rosoft.com>
> Signed-off-by: Jacob Pan <jacob.pan@...ux.microsoft.com>
> ---
> drivers/hv/hv_balloon.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a99112e6f0b8..c999daf34108 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg, unsigned int
> order)
> struct hv_hotadd_state *has;
> unsigned long pfn = page_to_pfn(pg);
>
> - guard(spinlock_irqsave)(&dm_device.ha_lock);
> - list_for_each_entry(has, &dm_device.ha_region_list, list) {
> - /* The page belongs to a different HAS. */
> - if (pfn < has->start_pfn ||
> - (pfn + (1UL << order) > has->end_pfn))
> - continue;
> + scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> + list_for_each_entry(has, &dm_device.ha_region_list, list) {
> + /* The page belongs to a different HAS. */
> + if (pfn < has->start_pfn ||
> + (pfn + (1UL << order) > has->end_pfn))
> + continue;
>
> - hv_bring_pgs_online(has, pfn, 1UL << order);
> - break;
> + hv_bring_pgs_online(has, pfn, 1UL << order);
> + return;
> + }
> }
> + generic_online_page(pg, order);
> }
>
> static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
> --
> 2.34.1
>
The code LGTM. I'd suggest updating the commit message along the
lines of what I wrote. My descriptions can be a bit wordy, but
hopefully they add clarity on the overall picture.
In any case,
Reviewed-by: Michael Kelley <mhklinux@...look.com>
Powered by blists - more mailing lists