lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB4157AE0C00C9EE805A78D8D7D4112@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 7 Jan 2025 00:39:07 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Jacob Pan <jacob.pan@...ux.microsoft.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "K. Y. Srinivasan" <kys@...rosoft.com>,
	Dexuan Cui <decui@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei
 Liu <wei.liu@...nel.org>
CC: Allen Pais <apais@...ux.microsoft.com>, Vikram Sethi <vsethi@...dia.com>,
	Michael Frohlich <mfrohlich@...rosoft.com>
Subject: RE: [PATCH] hv_balloon: Fallback to generic_online_page() for non-HV
 hot added mem

From: Jacob Pan <jacob.pan@...ux.microsoft.com> Sent: Thursday, January 2, 2025 1:40 PM
> 
> When device memory blocks are hot-added via add_memory_driver_managed(),
> the HV balloon driver intercepts them but fails to online these memory
> blocks. This could result in device driver initialization failures.
> 
> To address this, fall back to the generic online callback for memory
> blocks not added by the HV balloon driver. Similar cases are handled the
> same way in virtio-mem driver.

I had a little bit of trouble figuring out what problem this patch is solving.
Is this a correct description?

   The Hyper-V balloon driver installs a custom callback for handling page
   onlining operations performed by the memory hotplug subsystem. This
   custom callback is global, and overrides the default callback
   (generic_online_page) that Linux otherwise uses. The custom callback
   properly handles memory that is hot-added by the balloon driver as part
   of a Hyper-V hot-add region.

   But memory can also be hot-added directly by a device driver for a vPCI
   device, particularly GPUs. In such a case, the custom callback installed by
   the balloon driver runs, but won't find the page in its hot-add region list
   and doesn't online it, which could cause driver initialization failures.

   Fix this by having the balloon custom callback run generic_online_page()
   when the page isn't part of a Hyper-V hot-add region, thereby doing the
   default Linux behavior. This allows device driver hot-adds to work
   properly. Similar cases are handled the same way in the virtio-mem driver.

> 
> Suggested-by: Vikram Sethi <vsethi@...dia.com>
> Tested-by: Michael Frohlich <mfrohlich@...rosoft.com>
> Signed-off-by: Jacob Pan <jacob.pan@...ux.microsoft.com>
> ---
>  drivers/hv/hv_balloon.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a99112e6f0b8..c999daf34108 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg, unsigned int
> order)
>  	struct hv_hotadd_state *has;
>  	unsigned long pfn = page_to_pfn(pg);
> 
> -	guard(spinlock_irqsave)(&dm_device.ha_lock);
> -	list_for_each_entry(has, &dm_device.ha_region_list, list) {
> -		/* The page belongs to a different HAS. */
> -		if (pfn < has->start_pfn ||
> -		    (pfn + (1UL << order) > has->end_pfn))
> -			continue;
> +	scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> +		list_for_each_entry(has, &dm_device.ha_region_list, list) {
> +			/* The page belongs to a different HAS. */
> +			if (pfn < has->start_pfn ||
> +				(pfn + (1UL << order) > has->end_pfn))
> +				continue;
> 
> -		hv_bring_pgs_online(has, pfn, 1UL << order);
> -		break;
> +			hv_bring_pgs_online(has, pfn, 1UL << order);
> +			return;
> +		}
>  	}
> +	generic_online_page(pg, order);
>  }
> 
>  static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
> --
> 2.34.1
> 

The code LGTM. I'd suggest updating the commit message along the
lines of what I wrote. My descriptions can be a bit wordy, but
hopefully they add clarity on the overall picture.

In any case,

Reviewed-by: Michael Kelley <mhklinux@...look.com>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ