linux-kernel - Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1c4c405b-52e0-cf6b-1f82-91a0a1e3dd53@suse.cz>
Date:   Wed, 25 Nov 2020 13:08:54 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Andrea Arcangeli <aarcange@...hat.com>,
        David Hildenbrand <david@...hat.com>
Cc:     Mel Gorman <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        Qian Cai <cai@....pw>, Michal Hocko <mhocko@...nel.org>,
        linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...ux.ibm.com>,
        Baoquan He <bhe@...hat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set
 pageblock_skip on reserved pages

On 11/25/20 6:34 AM, Andrea Arcangeli wrote:
> Hello,
> 
> On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote:
>> On 11/21/20 8:45 PM, Andrea Arcangeli wrote:
>> > A corollary issue was fixed in
>> > 39639000-39814fff : Unknown E820 type
>> > 
>> > pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:
>> > 
>> > 7a17b000-7a216fff : Unknown E820 type
>> 
>> It would be nice to also provide a /proc/zoneinfo and how exactly the 
>> "zone_spans_pfn" was violated. I assume we end up below zone's 
>> start_pfn, but is it true?
> 
> Agreed, I was about to grab that info along with all page struct
> around the pfn 0x7a200 and phys address 0x7a216fff.
> 
> # grep -A1 E820 /proc/iomem
> 7a17b000-7a216fff : Unknown E820 type
> 7a217000-7bffffff : System RAM
> 
> DMA      zone_start_pfn 1            zone_end_pfn() 4096         contiguous 1
> DMA32    zone_start_pfn 4096         zone_end_pfn() 1048576      contiguous 0
> Normal   zone_start_pfn 1048576      zone_end_pfn() 4715392      contiguous 1
> Movable  zone_start_pfn 0            zone_end_pfn() 0            contiguous 0

So the above means that around the "Unknown E820 type" we have:

pfn 499712 - start of pageblock in ZONE_DMA32
pfn 500091 - start of the "Unknown E820 type" range
pfn 500224 - start of another pageblock
pfn 500246 - end of "Unknown E820 type"

So this is indeed not a zone boundary issue, but basically a hole not 
aligned to pageblock boundary and really unexpected.
We have CONFIG_HOLES_IN_ZONE (that x86 doesn't set) for architectures 
that do this, and even that config only affects pfn_valid_within(). But 
here pfn_valid() is true, but the zone/node linkage is unexpected.

> However the real bug seems that reserved pages have a zero zone_id in
> the page->flags when it should have the real zone id/nid. The patch I
> sent earlier to validate highest would only be needed to deal with
> pfn_valid.
> 
> Something must have changed more recently than v5.1 that caused the
> zoneid of reserved pages to be wrong, a possible candidate for the
> real would be this change below:
> 
> +               __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
> 
> Even if it may not be it, at the light of how the reserved page
> zoneid/nid initialized went wrong, the above line like it's too flakey
> to stay.
> 
> It'd be preferable if the pfn_valid fails and the
> pfn_to_section_nr(pfn) returns an invalid section for the intermediate
> step. Even better memset 0xff over the whole page struct until the
> second stage comes around.
> 
> Whenever pfn_valid is true, it's better that the zoneid/nid is correct
> all times, otherwise if the second stage fails we end up in a bug with
> weird side effects.

Yeah I guess it would be simpler if zoneid/nid was correct for 
pfn_valid() pfns within a zone's range, even if they are reserved due 
not not being really usable memory.

I don't think we want to introduce CONFIG_HOLES_IN_ZONE to x86. If the 
chosen solution is to make this to a real hole, the hole should be 
extended to MAX_ORDER_NR_PAGES aligned boundaries.

In any case, compaction code can't fix this with better range checks.

> Maybe it's not the above that left a zero zoneid though, I haven't
> tried to bisect it yet to look how the page->flags looked like on a
> older kernel that didn't seem to reproduce this crash, I'm just
> guessing.
> 
> Thanks,
> Andrea
>