linux-kernel - Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0628e2af-f7e7-056a-82ec-68860f9c4f29@oracle.com>
Date:   Wed, 22 Mar 2017 22:13:15 -0400
From:   Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:     Dan Streetman <dan.streetman@...onical.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Cc:     David Vrabel <david.vrabel@...rix.com>,
        Juergen Gross <jgross@...e.com>,
        xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org
Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's
 initial state to number of existing RAM pages"



On 03/22/2017 05:16 PM, Dan Streetman wrote:
> I have a question about a problem introduced by this commit:
> c275a57f5ec3056f732843b11659d892235faff7
> "xen/balloon: Set balloon's initial state to number of existing RAM pages"
>
> It changed the xen balloon current_pages calculation to start with the
> number of physical pages in the system, instead of max_pfn.  Since
> get_num_physpages() does not include holes, it's always less than the
> e820 map's max_pfn.
>
> However, the problem that commit introduced is, if the hypervisor sets
> the balloon target to equal to the e820 map's max_pfn, then the
> balloon target will *always* be higher than the initial current pages.
> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
> the OS adds any holes, the balloon target will be higher than the
> current pages.  This is the situation, for example, for Amazon AWS
> instances.  The result is, the xen balloon will always immediately
> hotplug some memory at boot, but then make only (max_pfn -
> get_num_physpages()) available to the system.
>
> This balloon-hotplugged memory can cause problems, if the hypervisor
> wasn't expecting it; specifically, the system's physical page
> addresses now will exceed the e820 map's max_pfn, due to the
> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
> DMA to/from those physical pages above the e820 max_pfn, it causes
> problems.  For example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>
> The additional small amount of balloon memory can cause other problems
> as well, for example:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>
> Anyway, I'd like to ask, was the original commit added because
> hypervisors are supposed to set their balloon target to the guest
> system's number of phys pages (max_pfn - holes)?  The mailing list
> discussion and commit description seem to indicate that.


IIRC the problem that this was trying to fix was that since max_pfn 
includes holes, upon booting we'd immediately balloon down by the 
(typically, MMIO) hole size.

If you boot a guest with ~4+GB memory you should see this.


> However I'm
> not sure how that is possible, because the kernel reserves its own
> holes, regardless of any predefined holes in the e820 map; for
> example, the kernel reserves 64k (by default) at phys addr 0 (the
> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
> the hypervisor really has no way to know what the "right" target to
> specify is; unless it knows the exact guest OS and kernel version, and
> kernel config values, it will never be able to correctly specify its
> target to be exactly (e820 max_pfn - all holes).
>
> Should this commit be reverted?  Should the xen balloon target be
> adjusted based on kernel-added e820 holes?

I think the second one but shouldn't current_pages be updated, and not 
the target? The latter is set by Xen (toolstack, via xenstore usually).

Also, the bugs above (at least one of them) talk about NVMe and I wonder 
whether the memory that they add is of RAM type --- I believe it has its 
own type and so perhaps that introduces additional inconsistencies. AWS 
may have added their own support for that, which we don't have upstream yet.

-boris


> Should something else be
> done?
>
> For context, Amazon Linux has simply disabled Xen ballooning
> completely.  Likewise, we're planning to disable Xen ballooning in the
> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS
> Ubuntu kernels).  However, if reverting this patch makes sense in a
> bigger context (i.e. Xen users besides AWS), that would allow more
> Ubuntu kernels to work correctly in AWS instances.
>