Date:	Wed, 11 May 2016 10:08:58 +0200
From:	Stefan Bader <stefan.bader@...onical.com>
To:	xen-devel <xen-devel@...ts.xenproject.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc:	Juergen Gross <jgross@...e.com>,
	David Vrabel <david.vrabel@...rix.com>,
	Mel Gorman <mgorman@...e.de>, Nathan Zimmer <nzimmer@....com>
Subject: Re: [Xen-devel] bad page flags booting 32bit dom0 on 64bit hypervisor
 using dom0_mem (kernel >=4.2)

On 02.05.2016 16:24, Stefan Bader wrote:
> On 02.05.2016 13:41, Juergen Gross wrote:
>> On 02/05/16 12:47, Stefan Bader wrote:
>>> I recently tried to boot a 32bit dom0 on a 64bit Xen host which I configured to
>>> run with a limited, fixed amount of memory for dom0. It seems that somewhere
>>> between kernel versions 3.19 and 4.2 (sorry, that is still a wide range) the
>>> Linux kernel would report bad page flags for a range of pages (which seem to be
>>> around the end of the guest pfn range). For a 4.2 kernel that was easily missed
>>> as the boot finished ok and dom0 was accessible. However, starting with 4.4
>>> (tested 4.5 and a 4.6-rc) the serial console output freezes after some of those
>>> bad page flag messages and then (unfortunately without any further helpful
>>> output) the host reboots (I assume there is a panic that triggers a reset).
>>>
>>> I suspect the problem is more on the kernel side. It is just possible to
>>> influence things by varying dom0_mem=#,max:#. 512M seems ok; 1024M, 2048M,
>>> and 3072M cause bad page flags starting around kernel 4.2 and reboots around
>>> 4.4. Then 4096M and not clamping dom0 memory seem to be ok again (though not
>>> limiting dom0 memory seems to cause trouble on a 32bit dom0 later when a domU
>>> tries to balloon memory, but I think that is a different problem).
>>>
>>> I have not seen this on a 64bit dom0. Below is an example of those bad page
>>> errors. It looks like a page marked as reserved. Initially I wondered whether
>>> this could be a problem of not clearing page flags when moving mappings to
>>> match the e820. But I never looked into i386 memory setup in that much detail.
>>> So I am posting this, hoping that someone may have an idea from the details
>>> about where to look next. PAE is enabled there. Usually it's the bpf init that
>>> gets hit, but that is likely just because it does the first vmallocs.
>>
>> Could you please post the kernel config, Xen and dom0 boot parameters?
>> I'm quite sure this is not a common problem, as there are standard tests
>> run for each kernel version, including a 32 bit dom0 with limited
>> memory size.
> 
> Hi Jürgen,
> 
> sure. Though by doing that I realized where I actually messed the whole thing
> up. I got the max limit syntax completely wrong. :( Instead of the correct
> "dom0_mem=1024M,max:1024M" I was using "dom0_mem=1024M:max=1024M", which I
> guess is like not having max set at all. Not sure whether that is a valid use
> case.
> 
> When I actually get the dom0_mem argument right, there are no bad page flag
> errors, even in 4.4 with a 1024M limit. I was at least consistent in my
> mis-configuration, so doing the same stupid thing on 64bit seems to be handled
> more gracefully.
> 
> Likely a false alarm. But at least cut&pasting the config into the mail made
> me spot the problem...
> 

Ok, thinking that "dom0_mem=x" (without a max or min) is still a valid case, I
went ahead and bisected when the bad page flag issue started. I ended up at:

  92923ca "mm: meminit: only set page reserved in the memblock region"

And with a few more printks in the new functions I finally realized why this
goes wrong. The new reserve_bootmem_region() uses unsigned long for its start
and end addresses, which just does not work well on 32bit: with PAE, physical
addresses are 64 bits wide, so anything at or above 4GB gets truncated.
For Xen dom0 the problem is just easier to trigger. When dom0 memory is
limited to a small size but allowed to balloon for more, the additional system
memory is put into reserved regions.
In my case, on a host with 8G of memory and, say, 1G of initial dom0 memory,
this created (apart from others) one reserved region which started at 4GB and
covered the remaining 4G of host memory. Due to the unsigned long truncation,
reserve_bootmem_region() saw that region as 0-4G, which basically marked *all*
memory below 4G as reserved.
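
To illustrate the truncation (a minimal userspace sketch, not the kernel
code; the exact region bounds are made up to roughly match my setup): built
with gcc -m32, unsigned long is 32 bits wide while a PAE physical address
needs 64, so the high bits are silently dropped at the call site:

#include <stdio.h>
#include <stdint.h>

/* Stand-in for the kernel's phys_addr_t, which is 64 bits wide on
 * 32bit x86 with PAE enabled. */
typedef uint64_t phys_addr_t;

/* Same argument types as the unfixed reserve_bootmem_region(): on a
 * 32bit build the 64-bit physical addresses get truncated to 32 bits
 * when the caller's values are converted to unsigned long. */
static void reserve_broken(unsigned long start, unsigned long end)
{
	printf("broken: 0x%08lx - 0x%08lx\n", start, end);
}

static void reserve_fixed(phys_addr_t start, phys_addr_t end)
{
	printf("fixed:  0x%09llx - 0x%09llx\n",
	       (unsigned long long)start, (unsigned long long)end);
}

int main(void)
{
	/* Reserved region starting at 4GB, covering the rest of an 8G
	 * host (illustrative bounds). */
	phys_addr_t start = 0x100000000ULL;	/* 4G */
	phys_addr_t end   = 0x1ffffffffULL;	/* just below 8G */

	reserve_broken(start, end);	/* prints 0x00000000 - 0xffffffff */
	reserve_fixed(start, end);
	return 0;
}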
The fix is relatively simple: just use phys_addr_t for start and end. I tested
this on 4.2 and 4.4 kernels. Both now boot without errors, and the 4.4 kernel
no longer crashes either. Maybe still not 100% safe when running on very large
memory systems, since the page frame numbers inside the function remain
unsigned long (32bit PFNs times the 4K page size give 16T, if I did not get
the math wrong), but at least it is some improvement...
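
In essence the change just widens the argument types (sketch only; the
attached patch is what I actually tested):

-void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
+void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)

plus the matching prototype change in the header. The PFN conversions inside
the function then see the full 64bit physical addresses before reducing them
to page frame numbers.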

-Stefan



View attachment "0001-mm-Use-phys_addr_t-for-reserve_bootmem_region-argume.patch" of type "text/x-diff" (2358 bytes)

