linux-kernel - Re: [GIT PULL] Memory folios for v5.15


Message-ID: <2476941.1630061342@warthog.procyon.org.uk>
Date:   Fri, 27 Aug 2021 11:49:02 +0100
From:   David Howells <dhowells@...hat.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     dhowells@...hat.com, Matthew Wilcox <willy@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] Memory folios for v5.15

Johannes Weiner <hannes@...xchg.org> wrote:

> 
> On Thu, Aug 26, 2021 at 09:58:06AM +0100, David Howells wrote:
> > One thing I like about Willy's folio concept is that, as long as everyone uses
> > the proper accessor functions and macros, we can mostly ignore the fact that
> > they're 2^N sized/aligned and they're composed of exact multiples of pages.
> > What really matters are the correspondences between folio size/alignment and
> > medium/IO size/alignment, so you could look on the folio as being a tool to
> > disconnect the filesystem from the concept of pages.
> >
> > We could, in the future, in theory, allow the internal implementation of a
> > folio to shift from being a page array to being a kmalloc'd page list or
> > allow higher order units to be mixed in.  The main thing we have to stop
> > people from doing is directly accessing the members of the struct.
> 
> In the current state of the folio patches, I agree with you. But
> conceptually, folios are not disconnecting from the page beyond
> PAGE_SIZE -> PAGE_SIZE * (1 << folio_order()). This is why I asked
> what the intended endgame is. And I wonder if there is a bit of an
> alignment issue between FS and MM people about the exact nature and
> identity of this data structure.

Possibly.  I would guess there are a couple of reasons that on the MM side
particularly it's dealt with as a strict array of pages: efficiency and
mmap-related faults.

It's most efficient to treat it as an array of contiguous pages as that
removes the need for indirection.  From the pov of mmap, faults happen
along the lines of h/w page divisions.
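
As a rough illustration of the indirection point, here is a minimal userspace
sketch in C.  Every name in it (sketch_page, sketch_folio_array,
sketch_folio_list, SKETCH_PAGE_SIZE) is hypothetical and chosen for this
example; none of it is the kernel's struct page or struct folio, and a 4 KiB
page size is assumed.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/* Hypothetical stand-in for one page of data; not the kernel's struct page. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Layout A: a contiguous run of (1 << order) pages. */
struct sketch_folio_array {
	unsigned int order;
	struct sketch_page *first;	/* pages are first[0 .. (1 << order) - 1] */
};

/* Layout B: a separately allocated list of pointers to scattered pages. */
struct sketch_folio_list {
	size_t nr_pages;
	struct sketch_page **pages;
};

/* Contiguous case: the byte address is computed from the base alone. */
static unsigned char *array_byte(struct sketch_folio_array *f, size_t off)
{
	return &f->first[off / SKETCH_PAGE_SIZE].data[off % SKETCH_PAGE_SIZE];
}

/* List case: one extra memory load is needed to find the right page. */
static unsigned char *list_byte(struct sketch_folio_list *f, size_t off)
{
	struct sketch_page *p = f->pages[off / SKETCH_PAGE_SIZE];

	return &p->data[off % SKETCH_PAGE_SIZE];
}
```

With the array layout, an offset maps to an address by arithmetic on the base
pointer; with the list layout, each lookup first has to fetch a page pointer.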

From an FS point of view, at minimum, I just need to know the state of the
folio.  If a page fault dirties several folios, that's fine.  If I can find
out that a folio was partially dirtied, that's useful, but not critical.  I am
a bit concerned about higher-order folios causing huge writes - but I do
realise that we might want to improve TLB/PT efficiency by using larger
entries and that that comes with consequences for mmapped writes.
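
To put numbers on the huge-write concern, a small sketch of the arithmetic in
C.  It assumes 4 KiB pages and, as a simplifying assumption, that dirty state
is tracked only per folio, so any dirtied byte causes the whole folio to be
written back.  The helper names are hypothetical and exist only for this
example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumed 4 KiB pages */

/* Size in bytes of a folio of the given order: PAGE_SIZE * (1 << order). */
static size_t sketch_folio_bytes(unsigned int order)
{
	return (size_t)1 << (SKETCH_PAGE_SHIFT + order);
}

/*
 * Bytes written back for one folio when dirtiness is tracked per folio
 * (an assumption of this sketch): all of it if anything was dirtied,
 * nothing otherwise.
 */
static size_t sketch_writeback_bytes(unsigned int order, size_t dirty_bytes)
{
	return dirty_bytes ? sketch_folio_bytes(order) : 0;
}
```

Under these assumptions, an order-0 folio is 4096 bytes and an order-9 folio is
2097152 bytes (2 MiB), so a one-byte mmapped write into an order-9 folio would
imply a 2 MiB write.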

> At the current stage of conversion, folio is a more clearly delineated
> API of what can be safely used from the FS for the interaction with
> the page cache and memory management. And it looks still flexible to
> make all sorts of changes, including how it's backed by
> memory. Compared with the page, where parts of the API are for the FS,
> but there are tons of members, functions, constants, and restrictions
> due to the page's role inside MM core code. Things you shouldn't be
> using, things you shouldn't be assuming from the fs side, but it's
> hard to tell which is which, because struct page is a lot of things.

I definitely like the API cleanup that folios offer.  However, I do think
Willy needs to better document the differences between some of the functions,
or at least when/where they should be used - folio_mapping() and
folio_file_mapping() being examples of this.

> However, the MM narrative for folios is that they're an abstraction
> for regular vs compound pages. This is rather generic. Conceptually,
> it applies very broadly and deeply to MM core code: anonymous memory
> handling, reclaim, swapping, even the slab allocator uses them. If we
> follow through on this concept from the MM side - and that seems to be
> the plan - it's inevitable that the folio API will grow more
> MM-internal members, methods, as well as restrictions again in the
> process. Except for the tail page bits, I don't see too much in struct
> page that would not conceptually fit into this version of the folio.
> 
> The cache_entry idea is really just to codify and retain that
> domain-specific minimalism and clarity from the filesystem side. As
> well as the flexibility around how backing memory is implemented,
> which I think could come in handy soon, but isn't the sole reason.

I can see why you might want the clarification.  However, at this point, can
you live with this set of folio patches?  Can you live with the name?  Could
you live with it if "folio" was changed to something else?

I would really like to see this patchset get in.  It's hanging over changes I
and others want to make that will conflict with Willy's changes.  If we can
get the basic API of folios in now, that means I can make my changes on top
of them.

Thanks,
David