linux-kernel - Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"

Message-ID: <1bf56d75-4ffb-ba41-4c96-76c120c7800c@suse.com>
Date:   Thu, 23 Mar 2017 08:56:20 +0100
From:   Juergen Gross <jgross@...e.com>
To:     Dan Streetman <dan.streetman@...onical.com>
Cc:     Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org
Subject: Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's
 initial state to number of existing RAM pages"

On 23/03/17 03:13, Boris Ostrovsky wrote:
> 
> 
> On 03/22/2017 05:16 PM, Dan Streetman wrote:
>> I have a question about a problem introduced by this commit:
>> c275a57f5ec3056f732843b11659d892235faff7
>> "xen/balloon: Set balloon's initial state to number of existing RAM
>> pages"
>>
>> It changed the xen balloon current_pages calculation to start with the
>> number of physical pages in the system, instead of max_pfn.  Since
>> get_num_physpages() does not include holes, it's always less than the
>> e820 map's max_pfn.
>>
>> However, the problem that commit introduced is, if the hypervisor sets
>> the balloon target to equal to the e820 map's max_pfn, then the
>> balloon target will *always* be higher than the initial current pages.
>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if
>> the OS adds any holes, the balloon target will be higher than the
>> current pages.  This is the situation, for example, for Amazon AWS
>> instances.  The result is, the xen balloon will always immediately
>> hotplug some memory at boot, but then make only (max_pfn -
>> get_num_physpages()) available to the system.
>>
>> This balloon-hotplugged memory can cause problems, if the hypervisor
>> wasn't expecting it; specifically, the system's physical page
>> addresses now will exceed the e820 map's max_pfn, due to the
>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device
>> DMA to/from those physical pages above the e820 max_pfn, it causes
>> problems.  For example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129
>>
>> The additional small amount of balloon memory can cause other problems
>> as well, for example:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
>>
>> Anyway, I'd like to ask, was the original commit added because
>> hypervisors are supposed to set their balloon target to the guest
>> system's number of phys pages (max_pfn - holes)?  The mailing list
>> discussion and commit description seem to indicate that.
> 
> 
> IIRC the problem that this was trying to fix was that since max_pfn
> includes holes, upon booting we'd immediately balloon down by the
> (typically, MMIO) hole size.
> 
> If you boot a guest with ~4+GB memory you should see this.
> 
> 
>> However I'm
>> not sure how that is possible, because the kernel reserves its own
>> holes, regardless of any predefined holes in the e820 map; for
>> example, the kernel reserves 64k (by default) at phys addr 0 (the
>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW).  So
>> the hypervisor really has no way to know what the "right" target to
>> specify is; unless it knows the exact guest OS and kernel version, and
>> kernel config values, it will never be able to correctly specify its
>> target to be exactly (e820 max_pfn - all holes).
>>
>> Should this commit be reverted?  Should the xen balloon target be
>> adjusted based on kernel-added e820 holes?
> 
> I think the second one but shouldn't current_pages be updated, and not
> the target? The latter is set by Xen (toolstack, via xenstore usually).

Right.

Looking at an HVM domU, I can't see any problem related to
CONFIG_X86_RESERVE_LOW, which is set to 64 on my system. The domU is
configured with 2048 MB of RAM, 8 MB of which is video RAM. In
/sys/devices/system/xen_memory/xen_memory0 the current size and the
target size match: both are 2088960 kB (2048 MB - 8 MB).

Ballooning down and up to 2048 MB again doesn't change the picture.
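
For anyone wanting to repeat the comparison, here is a minimal sketch in
Python. It assumes the balloon driver exposes the target as "target_kb"
and the current size as "info/current_kb" below the sysfs directory named
above; those two file names and the helper names are assumptions of this
sketch, so adjust them if your kernel lays the directory out differently.

```python
from pathlib import Path

# Directory named in the mail; the file names below it are assumptions.
SYSFS_DIR = Path("/sys/devices/system/xen_memory/xen_memory0")


def expected_kb(configured_mb, video_mb):
    """Guest RAM in kB once the video RAM is subtracted."""
    return (configured_mb - video_mb) * 1024


def read_kb(path):
    """Read a single integer kB value from a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def balloon_delta_kb(base=SYSFS_DIR):
    """Return target minus current size in kB.

    A positive value means the target is above the current size,
    so the balloon driver would try to add memory.
    """
    base = Path(base)
    target = read_kb(base / "target_kb")
    current = read_kb(base / "info" / "current_kb")
    return target - current


if __name__ == "__main__":
    print("expected size:", expected_kb(2048, 8), "kB")
    try:
        print("target - current:", balloon_delta_kb(), "kB")
    except OSError as exc:
        # Not a Xen guest, or the sysfs layout differs from the assumption.
        print("balloon sysfs files not readable:", exc)
```

A delta of 0 corresponds to the matching values reported above; a
persistently positive delta right after boot would match the behaviour
Dan describes.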

So which additional holes does the kernel add on AWS, and which
functions add them?


Juergen