linux-kernel - Re: [GIT PULL] Memory folios for v5.15


Message-ID: <YSVMAS2pQVq+xma7@casper.infradead.org>
Date:   Tue, 24 Aug 2021 20:44:01 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] Memory folios for v5.15

On Tue, Aug 24, 2021 at 02:32:56PM -0400, Johannes Weiner wrote:
> The folio doc says "It is at least as large as %PAGE_SIZE";
> folio_order() says "A folio is composed of 2^order pages";
> page_folio(), folio_pfn(), folio_nr_pages() all encode an N:1
> relationship. And yes, the name implies it too.
> 
> This is in direct conflict with what I'm talking about, where base
> page granularity could become coarser than file cache granularity.

That doesn't make any sense.  A page is the fundamental unit of the
mm.  Why would we want to increase the granularity of page allocation
and not increase the granularity of the file cache?
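
To make the quoted "2^order pages" and N:1 wording concrete, here is a
minimal userspace sketch. It is not the kernel's implementation: struct
toy_folio, the toy_* helpers, and the 4 KiB TOY_PAGE_SIZE are all
hypothetical names and values chosen for illustration, and the sketch
assumes each folio is naturally aligned to its own size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed base page size for this sketch (4 KiB); real kernels set this per architecture. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((size_t)1 << TOY_PAGE_SHIFT)

/* Hypothetical stand-in for a folio: a naturally aligned run of 2^order base pages. */
struct toy_folio {
    unsigned long first_pfn;  /* page frame number of the first base page */
    unsigned int order;       /* the folio spans 2^order base pages */
};

/* Number of base pages in the folio: 2^order. */
static unsigned long toy_folio_nr_pages(const struct toy_folio *f)
{
    assert(f->order < 8 * sizeof(unsigned long));
    return 1UL << f->order;
}

/* Size in bytes; never smaller than TOY_PAGE_SIZE because order cannot be negative. */
static size_t toy_folio_size(const struct toy_folio *f)
{
    return TOY_PAGE_SIZE << f->order;
}

/* Map any pfn inside a folio of the given order back to the folio's first pfn.
 * This is the N:1 direction: many base pages resolve to one folio. */
static unsigned long toy_pfn_to_folio_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~((1UL << order) - 1);
}
```

With order 3, toy_folio_nr_pages() gives 8 and toy_folio_size() gives
32768 bytes, and pfns 0x1000 through 0x1007 all map back to 0x1000.
Nothing in this model can describe a cache unit smaller than one base
page, which is the property being argued about above.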

> Are we going to bump struct page to 2M soon? I don't know. Here is
> what I do know about 4k pages, though:
> 
> - It's a lot of transactional overhead to manage tens of gigs of
>   memory in 4k pages. We're reclaiming, paging and swapping more than
>   ever before in our DCs, because flash provides in abundance the
>   low-latency IOPS required for that, and parking cold/warm workload
>   memory on cheap flash saves expensive RAM. But we're continuously
>   scanning thousands of pages per second to do this. There was also
>   the RWF_UNCACHED thread around reclaim CPU overhead at the higher
>   end of buffered IO rates. There is the fact that we have a pending
>   proposal from Google to replace rmap because it's too CPU-intense
>   when paging into compressed memory pools.

This seems like an argument for folios, not against them.  If user
memory (both anon and file) is being allocated in larger chunks, there
are fewer pages to scan, less book-keeping to do, and all you're paying
for that is I/O bandwidth.
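
The "fewer pages to scan" point is plain arithmetic, sketched below.
The helper name toy_units_to_track() and the sizes used in the example
(64 GiB of memory; 4 KiB, 64 KiB and 2 MiB units) are assumptions for
illustration, not figures from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Number of units that must be tracked (and potentially scanned) when
 * mem_bytes of memory is managed in chunks of unit_bytes. Rounds up so a
 * partial trailing chunk still counts as one unit. */
static uint64_t toy_units_to_track(uint64_t mem_bytes, uint64_t unit_bytes)
{
    assert(unit_bytes != 0);
    return (mem_bytes + unit_bytes - 1) / unit_bytes;
}
```

For 64 GiB, this gives 16777216 units at 4 KiB, 1048576 at 64 KiB, and
32768 at 2 MiB. The cost on the other side is that each reclaim or
writeback decision moves a larger chunk, hence the I/O bandwidth.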

> - It's a lot of internal fragmentation. Compaction is becoming the
>   default method for allocating the majority of memory in our
>   servers. This is a latency concern during page faults, and a
>   predictability concern when we defer it to khugepaged collapsing.

Again, the more memory that we allocate in higher-order chunks, the
better this situation becomes.

> - struct page is statically eating gigs of expensive memory on every
>   single machine, when only some of our workloads would require this
>   level of granularity for some of their memory. And that's *after*
>   we're fighting over every bit in that structure.

That, folios do not help with.  I have post-folio ideas about how
to address that, but I can't realistically start working on them
until folios are upstream.
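
For a sense of scale on the quoted "gigs" of static struct page
overhead, here is a rough calculation. It assumes a 64-byte descriptor
per 4 KiB page, which is typical of 64-bit configurations but is not
stated in this thread; toy_memmap_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes consumed by per-page descriptors when mem_bytes of memory is
 * described at page_bytes granularity, with desc_bytes per descriptor.
 * All three values are inputs; nothing here is read from a real kernel. */
static uint64_t toy_memmap_bytes(uint64_t mem_bytes, uint64_t page_bytes,
                                 uint64_t desc_bytes)
{
    assert(page_bytes != 0);
    return (mem_bytes / page_bytes) * desc_bytes;
}
```

Under those assumptions, a machine with 256 GiB of RAM spends 4 GiB on
descriptors and a 1 TiB machine spends 16 GiB, about 1.6% of memory,
regardless of how much of that memory needs 4 KiB granularity.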