Message-ID: <YSQSkSOWtJCE4g8p@cmpxchg.org>
Date:   Mon, 23 Aug 2021 17:26:41 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] Memory folios for v5.15

On Mon, Aug 23, 2021 at 08:01:44PM +0100, Matthew Wilcox wrote:
> Hi Linus,
> 
> I'm sending this pull request a few days before the merge window
> opens so you have time to think about it.  I don't intend to make any
> further changes to the branch, so I've created the tag and signed it.
> It's been in Stephen's next tree for a few weeks with only minor problems
> (now addressed).
> 
> The point of all this churn is to allow filesystems and the page cache
> to manage memory in larger chunks than PAGE_SIZE.  The original plan was
> to use compound pages like THP does, but I ran into problems because
> some functions that take a struct page expect only a head page, while
> others expect the precise page containing a particular byte.
> 
> This pull request converts just parts of the core MM and the page cache.
> For 5.16, we intend to convert various filesystems (XFS and AFS are ready;
> other filesystems may make it) and also convert more of the MM and page
> cache to folios.  For 5.17, multi-page folios should be ready.
> 
> The multi-page folios offer some improvement to some workloads.  The 80%
> win is real, but appears to be an artificial benchmark (postgres startup,
> which isn't a serious workload).  Real workloads (eg building the kernel,
> running postgres in a steady state, etc) seem to benefit between 0-10%.
> I haven't heard of any performance losses as a result of this series.
> Nobody has done any serious performance tuning; I imagine that tweaking
> the readahead algorithm could provide some more interesting wins.
> There are also other places where we could choose to create large folios
> and currently do not, such as writes that are larger than PAGE_SIZE.
> 
> I'd like to thank all my reviewers who've offered review/ack tags:
> 
> Christoph Hellwig <hch@....de>
> David Howells <dhowells@...hat.com>
> Jan Kara <jack@...e.cz>
> Jeff Layton <jlayton@...nel.org>
> Johannes Weiner <hannes@...xchg.org>

Just to clarify, I'm only on this list because I acked 3 smaller,
independent memcg cleanup patches in this series. I have repeatedly
expressed strong reservations over folios themselves.

The arguments for a better data interface between mm and filesystem in
light of variable page sizes are plentiful and convincing. But from an
MM point of view, it's anything but clear where the delineation between the
page and folio is, and what the endgame is supposed to look like.

On one hand, the ambition appears to substitute folio for everything
that could be a base page or a compound page even inside core MM
code. Since there are very few places in the MM code that expressly
deal with tail pages in the first place, this amounts to a conversion
of most MM code - including the LRU management, reclaim, rmap,
migrate, swap, page fault code etc. - away from "the page".

However, this far exceeds the goal of a better mm-fs interface. And
the value proposition of a full MM-internal conversion, including
e.g. the less exposed anon page handling, is much more nebulous. It's
been proposed to leave anon pages out, but IMO to keep that direction
maintainable, the folio would have to be translated to a page quite
early when entering MM code, rather than propagating it inward, in
order to avoid huge, massively overlapping page and folio APIs.

It's also not clear to me that using the same abstraction for compound
pages and the file cache object is future proof. It's evident from
scalability issues in the allocator, reclaim, compaction, etc. that
with current memory sizes and IO devices, we're hitting the limits of
efficiently managing memory in 4k base pages by default. It's also
clear that we'll continue to have a need for 4k cache granularity for
quite a few workloads that work with large numbers of small files. I'm
not sure how this could be resolved other than divorcing the idea of a
(larger) base page from the idea of cache entries that can correspond,
if necessary, to memory chunks smaller than a default page.

A longer thread on that can be found here:
https://lore.kernel.org/linux-fsdevel/YFja%2FLRC1NI6quL6@cmpxchg.org/

As an MM stakeholder, I don't think folios are the answer for MM code.
