[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <05974e77-ae3c-4e62-a2c2-c764ab4a6d48@suse.com>
Date: Wed, 2 Apr 2025 16:23:02 +0200
From: Jürgen Groß <jgross@...e.com>
To: Roger Pau Monne <roger.pau@...rix.com>, xen-devel@...ts.xenproject.org,
linux-kernel@...r.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
Subject: Re: [PATCH] x86/xen: fix balloon target initialization for PVH dom0
On 02.04.25 13:36, Roger Pau Monne wrote:
> PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to
> dom0 are re-used as scratch memory to map foreign and grant pages. Such
> logic relies on reporting those unpopulated ranges as RAM to Linux, and
> mark them as reserved. This way Linux creates the underlying page
> structures required for metadata management.
>
> Such approach works fine on PV because the initial balloon target is
> calculated using specific Xen data, that doesn't take into account the
> memory type changes described above. However on HVM and PVH the initial
> balloon target is calculated using get_num_physpages(), and that function
> does take into account the unpopulated RAM regions used as scratch space
> for remote domain mappings.
>
> This leads to PVH dom0 having an incorrect initial balloon target, which
> causes malfunction (excessive memory freeing) of the balloon driver if the
> dom0 memory target is later adjusted from the toolstack.
>
> Fix this by using xen_released_pages to account for any pages that are part
> of the memory map, but are already unpopulated when the balloon driver is
> initialized. This accounts for any regions used for scratch remote
> mappings.
>
> Take the opportunity to unify PV with PVH/HVM guests regarding the usage of
> get_num_physpages(), as that avoids having to add different logic for PV vs
> PVH in both balloon_add_regions() and arch_xen_unpopulated_init().
>
> Much like a6aa4eb994ee, the code in this changeset should have been part of
> 38620fc4e893.
>
> Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
> Signed-off-by: Roger Pau Monné <roger.pau@...rix.com>
> ---
> I think it's easier to unify the PV and PVH/HVM paths here regarding the
> usage of get_num_physpages(), as otherwise the fix needs to add further PV
> vs HVM divergences in both balloon_add_regions() and
> arch_xen_unpopulated_init(), but it also has a higher risk of breaking PV
> in subtle ways.
> ---
> arch/x86/xen/enlighten.c | 7 +++++++
> drivers/xen/balloon.c | 19 +++++++++++--------
> 2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 43dcd8c7badc..651bb206434c 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -466,6 +466,13 @@ int __init arch_xen_unpopulated_init(struct resource **res)
> xen_free_unpopulated_pages(1, &pg);
> }
>
> + /*
> + * Account for the region being in the physmap but unpopulated.
> + * The value in xen_released_pages is used by the balloon
> + * driver to know how much of the physmap is unpopulated and
> + * set an accurate initial memory target.
> + */
> + xen_released_pages += xen_extra_mem[i].n_pfns;
> /* Zero so region is not also added to the balloon driver. */
> xen_extra_mem[i].n_pfns = 0;
> }
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 163f7f1d70f1..085d418ee6da 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -698,7 +698,15 @@ static void __init balloon_add_regions(void)
> for (pfn = start_pfn; pfn < extra_pfn_end; pfn++)
> balloon_append(pfn_to_page(pfn));
>
> - balloon_stats.total_pages += extra_pfn_end - start_pfn;
> + /*
> + * Extra regions are accounted for in the physmap, but need
> + * decreasing from current_pages to balloon down the initial
> + * allocation, because they are already accounted for in
> + * total_pages.
> + */
> + BUG_ON(extra_pfn_end - start_pfn >=
> + balloon_stats.current_pages);
Maybe instead of crashing the system disable ballooning and print some
diagnostics why this happened?
> + balloon_stats.current_pages -= extra_pfn_end - start_pfn;
> }
> }
>
> @@ -711,13 +719,8 @@ static int __init balloon_init(void)
>
> pr_info("Initialising balloon driver\n");
>
> -#ifdef CONFIG_XEN_PV
> - balloon_stats.current_pages = xen_pv_domain()
> - ? min(xen_start_info->nr_pages - xen_released_pages, max_pfn)
> - : get_num_physpages();
> -#else
> - balloon_stats.current_pages = get_num_physpages();
> -#endif
> + BUG_ON(xen_released_pages >= get_num_physpages());
Again, I'd rather just disable ballooning instead of crashing the system.
> + balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
> balloon_stats.target_pages = balloon_stats.current_pages;
> balloon_stats.balloon_low = 0;
> balloon_stats.balloon_high = 0;
Other than that I think your approach is fine.
Juergen
Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3684 bytes)
Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (496 bytes)
Powered by blists - more mailing lists