[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <05974e77-ae3c-4e62-a2c2-c764ab4a6d48@suse.com>
Date: Wed, 2 Apr 2025 16:23:02 +0200
From: Jürgen Groß <jgross@...e.com>
To: Roger Pau Monne <roger.pau@...rix.com>, xen-devel@...ts.xenproject.org,
 linux-kernel@...r.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
 Stefano Stabellini <sstabellini@...nel.org>,
 Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
Subject: Re: [PATCH] x86/xen: fix balloon target initialization for PVH dom0
On 02.04.25 13:36, Roger Pau Monne wrote:
> PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to
> dom0 are re-used as scratch memory to map foreign and grant pages.  Such
> logic relies on reporting those unpopulated ranges as RAM to Linux, and
> mark them as reserved.  This way Linux creates the underlying page
> structures required for metadata management.
> 
> Such approach works fine on PV because the initial balloon target is
> calculated using specific Xen data, that doesn't take into account the
> memory type changes described above.  However on HVM and PVH the initial
> balloon target is calculated using get_num_physpages(), and that function
> does take into account the unpopulated RAM regions used as scratch space
> for remote domain mappings.
> 
> This leads to PVH dom0 having an incorrect initial balloon target, which
> causes malfunction (excessive memory freeing) of the balloon driver if the
> dom0 memory target is later adjusted from the toolstack.
> 
> Fix this by using xen_released_pages to account for any pages that are part
> of the memory map, but are already unpopulated when the balloon driver is
> initialized.  This accounts for any regions used for scratch remote
> mappings.
> 
> Take the opportunity to unify PV with PVH/HVM guests regarding the usage of
> get_num_physpages(), as that avoids having to add different logic for PV vs
> PVH in both balloon_add_regions() and arch_xen_unpopulated_init().
> 
> Much like a6aa4eb994ee, the code in this changeset should have been part of
> 38620fc4e893.
> 
> Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
> Signed-off-by: Roger Pau Monné <roger.pau@...rix.com>
> ---
> I think it's easier to unify the PV and PVH/HVM paths here regarding the
> usage of get_num_physpages(), as otherwise the fix needs to add further PV
> vs HVM divergences in both balloon_add_regions() and
> arch_xen_unpopulated_init(), but it also has a higher risk of breaking PV
> in subtle ways.
> ---
>   arch/x86/xen/enlighten.c |  7 +++++++
>   drivers/xen/balloon.c    | 19 +++++++++++--------
>   2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 43dcd8c7badc..651bb206434c 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -466,6 +466,13 @@ int __init arch_xen_unpopulated_init(struct resource **res)
>   			xen_free_unpopulated_pages(1, &pg);
>   		}
>   
> +		/*
> +		 * Account for the region being in the physmap but unpopulated.
> +		 * The value in xen_released_pages is used by the balloon
> +		 * driver to know how much of the physmap is unpopulated and
> +		 * set an accurate initial memory target.
> +		 */
> +		xen_released_pages += xen_extra_mem[i].n_pfns;
>   		/* Zero so region is not also added to the balloon driver. */
>   		xen_extra_mem[i].n_pfns = 0;
>   	}
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 163f7f1d70f1..085d418ee6da 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -698,7 +698,15 @@ static void __init balloon_add_regions(void)
>   		for (pfn = start_pfn; pfn < extra_pfn_end; pfn++)
>   			balloon_append(pfn_to_page(pfn));
>   
> -		balloon_stats.total_pages += extra_pfn_end - start_pfn;
> +		/*
> +		 * Extra regions are accounted for in the physmap, but need
> +		 * decreasing from current_pages to balloon down the initial
> +		 * allocation, because they are already accounted for in
> +		 * total_pages.
> +		 */
> +		BUG_ON(extra_pfn_end - start_pfn >=
> +		       balloon_stats.current_pages);
Maybe instead of crashing the system disable ballooning and print some
diagnostics why this happened?
> +		balloon_stats.current_pages -= extra_pfn_end - start_pfn;
>   	}
>   }
>   
> @@ -711,13 +719,8 @@ static int __init balloon_init(void)
>   
>   	pr_info("Initialising balloon driver\n");
>   
> -#ifdef CONFIG_XEN_PV
> -	balloon_stats.current_pages = xen_pv_domain()
> -		? min(xen_start_info->nr_pages - xen_released_pages, max_pfn)
> -		: get_num_physpages();
> -#else
> -	balloon_stats.current_pages = get_num_physpages();
> -#endif
> +	BUG_ON(xen_released_pages >= get_num_physpages());
Again, I'd rather just disable ballooning instead of crashing the system.
> +	balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
>   	balloon_stats.target_pages  = balloon_stats.current_pages;
>   	balloon_stats.balloon_low   = 0;
>   	balloon_stats.balloon_high  = 0;
Other than that I think your approach is fine.
Juergen
Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3684 bytes)
Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (496 bytes)
Powered by blists - more mailing lists
 
