linux-kernel - Re: Folio discussion recap

Message-ID: <YUpC3oV4II+u+lzQ@casper.infradead.org>
Date:   Tue, 21 Sep 2021 21:38:54 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Kent Overstreet <kent.overstreet@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        David Howells <dhowells@...hat.com>
Subject: Re: Folio discussion recap

On Tue, Sep 21, 2021 at 03:47:29PM -0400, Johannes Weiner wrote:
> This discussion is now about whether folios are suitable for anon pages
> as well. I'd like to reiterate that regardless of the outcome of this
> discussion I think we should probably move ahead with the page cache
> bits, since people are specifically blocked on those and there is no
> dependency on the anon stuff, as the conversion is incremental.

So you withdraw your NAK for the 5.15 pull request which is now four
weeks old and has utterly missed the merge window?

> and so the justification for replacing page with folio *below* those
> entry points to address tailpage confusion becomes nil: there is no
> confusion. Move the anon bits to anon_page and leave the shared bits
> in page. That's 912 lines of swap_state.c we could mostly leave alone.

Your argument seems to be based on "minimising churn".  That is certainly
a goal that one could have, but I think in this case it is actually
harmful.  There are hundreds, maybe thousands, of functions throughout
the kernel (certainly throughout filesystems) which assume that a struct
page is PAGE_SIZE bytes.  Yes, every single one of them is buggy to
assume that, but tracking them all down is a never-ending task as new
ones will be added as fast as they can be removed.
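
To make the PAGE_SIZE assumption concrete, here is a minimal userspace
sketch, not kernel code.  The names sized_folio, sized_folio_bytes,
zero_assuming_page_size, zero_whole_folio and MODEL_PAGE_SIZE are all
hypothetical, and the 4096-byte base page is an assumption.  It only
illustrates how a helper that hardcodes one base page silently handles a
fraction of a larger object, while one that asks the object for its size
does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed base page size for this sketch */

/* Hypothetical stand-in for a folio: 2^order contiguous base pages. */
struct sized_folio {
	unsigned int order;
	unsigned char *data;
};

/* Bytes covered by the object: base page size shifted by the order. */
static size_t sized_folio_bytes(const struct sized_folio *folio)
{
	return MODEL_PAGE_SIZE << folio->order;
}

/* The buggy pattern: assumes the object always covers one base page. */
static size_t zero_assuming_page_size(struct sized_folio *folio)
{
	assert(folio->data != NULL);
	memset(folio->data, 0, MODEL_PAGE_SIZE);
	return MODEL_PAGE_SIZE;
}

/* The size-aware version: asks the object how many bytes it covers. */
static size_t zero_whole_folio(struct sized_folio *folio)
{
	size_t len = sized_folio_bytes(folio);

	assert(folio->data != NULL);
	memset(folio->data, 0, len);
	return len;
}
```

With order 2 the first helper clears only a quarter of the buffer and
nothing in its signature warns the caller, which is why this kind of bug
is so easy to add.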

> The same is true for the LRU code in swap.c. Conceptually, already no
> tailpages *should* make it onto the LRU. Once the high-level page
> instantiation functions - add_to_page_cache_lru, do_anonymous_page -
> have type safety, you really do not need to worry about tail pages
> deep in the LRU code. 1155 more lines of swap.c.

It's actually impossible in practice as well as conceptually.  The lru
list_head is in a union with compound_head in struct page, so you cannot
put a tail page onto the LRU.  Yet we call compound_head() on every one
of them multiple times because our current type system does not allow us
to express "this is not a tail page".
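
A rough userspace model of that layout may help.  It is a sketch under
stated assumptions, not the kernel's definitions: page_model, folio_model
and every model_* function are hypothetical names, and struct page is
reduced to the two union members under discussion.  A tail page stores
its head's address with bit 0 set in the same storage a non-tail page
uses for LRU linkage, and a separate type lets a function signature say
"not a tail page" so the lookup happens once at the boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical, heavily simplified model of struct page. */
struct page_model {
	union {
		struct list_head lru;     /* non-tail page: LRU linkage */
		uintptr_t compound_head;  /* tail page: head address | 1 */
	};
};

/* Hypothetical wrapper: holding one means "this is not a tail page". */
struct folio_model {
	struct page_model page;
};

/* Mark @tail as a tail page of @head by storing the tagged address. */
static void model_set_compound_head(struct page_model *tail,
				    struct page_model *head)
{
	assert(head != NULL);
	tail->compound_head = (uintptr_t)head | 1;
}

/* Resolve any page to its head: the check page-based helpers repeat. */
static struct page_model *model_compound_head(struct page_model *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct page_model *)(head - 1);
	return page;
}

/* Do the head lookup once, at the boundary between the two types. */
static struct folio_model *model_page_folio(struct page_model *page)
{
	return (struct folio_model *)model_compound_head(page);
}

/* Takes a folio, so no tail-page check is needed before using lru. */
static void model_lru_add(struct folio_model *folio, struct list_head *list)
{
	struct list_head *entry = &folio->page.lru;

	entry->next = list->next;
	entry->prev = list;
	list->next->prev = entry;
	list->next = entry;
}
```

Because list pointers are aligned, bit 0 of lru.next is always clear on a
page that is on a list, which is what lets the two union members share
storage.  In this model, only model_page_folio() ever needs to look at
that bit.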

> The anon_page->page relationship may look familiar too. It's a natural
> type hierarchy between superclass and subclasses that is common in
> object oriented languages: page has attributes and methods that are
> generic and shared; anon_page and file_page encode where their
> implementation differs.
> 
> A type system like that would set us up for a lot of clarification and
> generalization of the MM code. For example it would immediately
> highlight when "generic" code is trying to access type-specific stuff
> that maybe it shouldn't, and thus help/force us to refactor - something
> that a shared, flat folio type would not.

If you want to try your hand at splitting out anon_folio from folio
later, be my guest.  I've just finished splitting out 'slab' from page,
and I'll post it later.  I don't think that splitting anon_folio from
folio is worth doing, but will not stand in your way.  I do think that
splitting tail pages from non-tail pages is worthwhile, and that's what
this patchset does.