Date:   Fri, 28 Jun 2019 19:38:13 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-mm@...ck.org,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: fix regression with deferred struct page init

On 28.06.19 17:17, Michal Hocko wrote:
> On Thu 20-06-19 18:08:21, Juergen Gross wrote:
>> Commit 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time
>> instead of doing larger sections") is causing a regression on some
>> systems when the kernel is booted as Xen dom0.
>>
>> The system will just hang in early boot.
>>
>> Reason is an endless loop in get_page_from_freelist() in case the first
>> zone looked at has no free memory. deferred_grow_zone() is always
> 
> Could you explain how we ended up with the zone having no memory? Is
> xen "stealing" memblock memory without adding it to memory.reserved?
> In other words, how do we end up with an empty zone that has non zero
> end_pfn?

Why do you think Xen is stealing the memory in an odd way?

Doesn't deferred_init_mem_pfn_range_in_zone() return false when no free
memory is found? That is exactly what happens if the memory was added to
memory.reserved.

I guess the difference from a bare metal boot is that a Xen dom0 will
probably need more memory in the early boot phase, so the issue is more
likely to occur.

In my case the system had two zones, where the 2nd zone had some free
memory. The search never made it to the 2nd zone because it got stuck in
an endless loop on the 1st zone.
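
To illustrate the failure mode, here is a minimal userspace sketch of the
zone walk, not the kernel code. All names (toy_zone, toy_grow_zone,
toy_alloc) and the retry budget are hypothetical and exist only for this
model. It assumes the allocator retries the same zone whenever the grow
step reports progress, and that one way to break the loop is for the grow
step to report no progress for an empty zone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a memory zone. */
struct toy_zone {
	unsigned long free_pages;     /* pages available on the free list */
	unsigned long deferred_pages; /* pages not yet initialized */
};

/*
 * Models the grow step: initialize any deferred pages in the zone.
 * If nothing is left to initialize, the return value depends on
 * empty_counts_as_progress: true mirrors the quoted snippet, false
 * models a grow step that admits it made no progress.
 */
static bool toy_grow_zone(struct toy_zone *z, bool empty_counts_as_progress)
{
	if (z->deferred_pages == 0)
		return empty_counts_as_progress;

	z->free_pages += z->deferred_pages;
	z->deferred_pages = 0;
	return true;
}

/*
 * Models the zone walk. Returns the index of the zone that satisfied
 * the allocation, -1 if no zone could, or -2 if the retry budget ran
 * out (the stand-in for the boot hang; a real endless loop would
 * never return).
 */
static int toy_alloc(struct toy_zone *zones, size_t nr_zones,
		     bool empty_counts_as_progress, unsigned int max_retries)
{
	unsigned int retries = 0;
	size_t i = 0;

	while (i < nr_zones) {
		if (zones[i].free_pages > 0) {
			zones[i].free_pages--;
			return (int)i;
		}

		/* Zone is empty: try to grow it, retry the same zone on success. */
		if (toy_grow_zone(&zones[i], empty_counts_as_progress)) {
			if (++retries > max_retries)
				return -2;
			continue;
		}

		i++; /* no progress possible here, move on to the next zone */
	}

	return -1;
}
```

With two zones where only the 2nd has free pages, passing true for
empty_counts_as_progress exhausts the retry budget on the 1st zone, while
passing false lets the walk reach the 2nd zone.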

> 
>> returning true due to the following code snippet:
>>
>>    /* If the zone is empty somebody else may have cleared out the zone */
>>    if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
>>                                             first_deferred_pfn)) {
>>            pgdat->first_deferred_pfn = ULONG_MAX;
>>            pgdat_resize_unlock(pgdat, &flags);
>>            return true;
>>    }
>>
>> This in turn results in the loop as get_page_from_freelist() is
>> assuming forward progress can be made by doing some more struct page
>> initialization.
> 
> The patch looks correct. The code is subtle but the comment helps.
> 
>> Cc: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
>> Fixes: 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
>> Suggested-by: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
>> Signed-off-by: Juergen Gross <jgross@...e.com>
> 
> Acked-by: Michal Hocko <mhocko@...e.com>

Thanks,

Juergen
