linux-kernel - Re: Compaction & folios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YV4ujjM33QdLC8Xk@casper.infradead.org>
Date:   Thu, 7 Oct 2021 00:17:34 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Kent Overstreet <kent.overstreet@...il.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        hannes@...xchg.org, rientjes@...gle.com
Subject: Re: Compaction & folios

On Wed, Oct 06, 2021 at 06:53:41PM -0400, Kent Overstreet wrote:
> It may turn out that allocating hugepages still doesn't work as reliably as we'd
> like - but folios are still a big help even when we can't allocate a 2MB page,
> because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
> something we can't do now. And, since multiple CPU vendors now support
> coalescing contiguous PTE entries in the TLB, this will still get us most of the
> performance benefits of using hugepages.

I'd like to add two things:

1. A lot of people talk about the performance improvements from using
2MB pages, and there are the obvious hardware ones -- one fewer level
to dereference in the page table walk when there's a TLB miss; using a
single TLB entry to cache an entire 2MB page.

But there are the software ones, which I believe Google have measured
(perhaps it was the ChromeOS team?)  Allocating order-2/3/4 pages reduces
the length of the LRU list by a factor of 4/8/16.  That means we get 4-16x
memory reclaimed per unit of time, which reduces the LRU lock contention.
Not to mention the advantage of being able to use a pagevec to describe
960KB of memory rather than 60KB.

2. We can only measure what CPUs do today.  If our behaviour changes,
CPU vendors will adapt.  I talked to someone who dabbles in hardware
design who said that it really isn't that hard to design a TLB that
can support mapping 64KB entries at arbitrary 4KB offsets.  There's no
particular incentive for CPU manufacturers to do that today, but if we
start allocating 64KB pages to cache files, that will change.