lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 26 Nov 2020 15:30:01 -0500
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Mike Rapoport <rppt@...ux.ibm.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mel Gorman <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        Qian Cai <cai@....pw>, Michal Hocko <mhocko@...nel.org>,
        linux-kernel@...r.kernel.org, Baoquan He <bhe@...hat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set
 pageblock_skip on reserved pages

On Thu, Nov 26, 2020 at 09:44:26PM +0200, Mike Rapoport wrote:
> TBH, the whole interaction between e820 and memblock keeps me puzzled
> and I can only make educated guesses why some ranges here are
> memblock_reserve()'d and some memblock_add()ed.

The mixed usage in that interaction between memblock.reserve and
memblock.memory where sometime it's used to reserve overlapping
memblock.memory ranges (clearly ok), and sometimes is used on ranges
with no overlap (not clear even why it's being called), is what makes
the current code messy.

We're basically passing down the exact same information (inverted),
through two different APIs, if there is no overlap.

> I think what should be there is that e820 entries that are essentially
> RAM, used by BIOS or not, should be listed in memblock.memory. Then
> using memblock_reserve() for parts that BIOS claimed for itself would
> have the same semantics as for memory allocated by kernel.
>
> I.e. if there is a DIMM from 0 to, say 512M, memblock.memory will have a
> range [0, 512M]. And areas such as 0x000-0xfff, 0x9d000-0x9ffff will be
> in memblock.reserved.
> 
> Than in page_alloc.c we'll know that we have a physical memory bank from
> 0 to 512M but there are some ranges that we cannot use.
>
> I suggested it back then when the issue with compaction was reported at
> the first time, but Baoquan mentioned that there are systems that cannot
> even tolerate having BIOS reserved areas in the page tables and I didn't
> continue to pursue this.

That explains why we can't add the e820 non-RAM regions to
memblock_add, what's not clear is why it should be required to call
memblock_reserve on a region that has no overlap with any memblock_add.

Instead of the patch that adds a walk to the memblock.reserve and that
requires adding even more memblock_reserve to e820__memblock_setup for
type 20, we can add a walk for the memblock.memory holes and then we
can remove the memblock_reserve for E820_TYPE_SOFT_RESERVED too.

Thanks,
Andrea

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ