[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250107092750.255db61d@DESKTOP-0403QTC.>
Date: Tue, 7 Jan 2025 09:27:50 -0800
From: Jacob Pan <jacob.pan@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>, "K. Y.
Srinivasan" <kys@...rosoft.com>, Dexuan Cui <decui@...rosoft.com>, Haiyang
Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Allen Pais
<apais@...ux.microsoft.com>, Vikram Sethi <vsethi@...dia.com>, Michael
Frohlich <mfrohlich@...rosoft.com>, jacob.pan@...ux.microsoft.com
Subject: Re: [PATCH] hv_balloon: Fallback to generic_online_page() for
non-HV hot added mem
Hi Michael,
On Tue, 7 Jan 2025 00:39:07 +0000
Michael Kelley <mhklinux@...look.com> wrote:
> From: Jacob Pan <jacob.pan@...ux.microsoft.com> Sent: Thursday,
> January 2, 2025 1:40 PM
> >
> > When device memory blocks are hot-added via
> > add_memory_driver_managed(), the HV balloon driver intercepts them
> > but fails to online these memory blocks. This could result in
> > device driver initialization failures.
> >
> > To address this, fall back to the generic online callback for memory
> > blocks not added by the HV balloon driver. Similar cases are
> > handled the same way in virtio-mem driver.
>
> I had a little bit of trouble figuring out what problem this patch is
> solving. Is this a correct description?
>
> The Hyper-V balloon driver installs a custom callback for handling
> page onlining operations performed by the memory hotplug subsystem.
> This custom callback is global, and overrides the default callback
> (generic_online_page) that Linux otherwise uses. The custom
> callback properly handles memory that is hot-added by the balloon
> driver as part of a Hyper-V hot-add region.
>
> But memory can also be hot-added directly by a device driver for a
> vPCI device, particularly GPUs. In such a case, the custom callback
> installed by the balloon driver runs, but won't find the page in its
> hot-add region list and doesn't online it, which could cause driver
> initialization failures.
>
> Fix this by having the balloon custom callback run
> generic_online_page() when the page isn't part of a Hyper-V hot-add
> region, thereby doing the default Linux behavior. This allows device
> driver hot-adds to work properly. Similar cases are handled the same
> way in the virtio-mem driver.
>
Yes, your description has the complete story. I was also thinking the
generic code could handle this if the custom callback is not a void
return type. Then we don't have to duplicate the code in here and
virtio-mem, perhaps a separate cleanup effort.
> >
> > Suggested-by: Vikram Sethi <vsethi@...dia.com>
> > Tested-by: Michael Frohlich <mfrohlich@...rosoft.com>
> > Signed-off-by: Jacob Pan <jacob.pan@...ux.microsoft.com>
> > ---
> > drivers/hv/hv_balloon.c | 18 ++++++++++--------
> > 1 file changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index a99112e6f0b8..c999daf34108 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg,
> > unsigned int order)
> > struct hv_hotadd_state *has;
> > unsigned long pfn = page_to_pfn(pg);
> >
> > - guard(spinlock_irqsave)(&dm_device.ha_lock);
> > - list_for_each_entry(has, &dm_device.ha_region_list, list) {
> > - /* The page belongs to a different HAS. */
> > - if (pfn < has->start_pfn ||
> > - (pfn + (1UL << order) > has->end_pfn))
> > - continue;
> > + scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> > + list_for_each_entry(has,
> > &dm_device.ha_region_list, list) {
> > + /* The page belongs to a different HAS. */
> > + if (pfn < has->start_pfn ||
> > + (pfn + (1UL << order) >
> > has->end_pfn))
> > + continue;
> >
> > - hv_bring_pgs_online(has, pfn, 1UL << order);
> > - break;
> > + hv_bring_pgs_online(has, pfn, 1UL <<
> > order);
> > + return;
> > + }
> > }
> > + generic_online_page(pg, order);
> > }
> >
> > static int pfn_covered(unsigned long start_pfn, unsigned long
> > pfn_cnt) --
> > 2.34.1
> >
>
> The code LGTM. I'd suggest updating the commit message along the
> lines of what I wrote. My descriptions can be a bit wordy, but
> hopefully they add clarity on the overall picture.
>
> In any case,
>
> Reviewed-by: Michael Kelley <mhklinux@...look.com>
>
Will update the commit message in the next version, thanks a lot.
Thanks,
Jacob
Powered by blists - more mailing lists