Message-ID: <c18923a1-8144-785e-5fb3-5cbce4be1310@redhat.com>
Date:   Fri, 22 Oct 2021 16:40:24 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Kent Overstreet <kent.overstreet@...il.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        David Howells <dhowells@...hat.com>,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: Folios for 5.15 request - Was: re: Folio discussion recap -

On 22.10.21 15:01, Matthew Wilcox wrote:
> On Fri, Oct 22, 2021 at 09:59:05AM +0200, David Hildenbrand wrote:
>> something like this would roughly express what I've been mumbling about:
>>
>> anon_mem    file_mem
>>    |            |
>>    ------|------
>>       lru_mem       slab
>>          |           |
>>          -------------
>>                |
>>              page
>>
>> I wouldn't include folios in this picture, because IMHO folios as of now
>> are actually what we want to be "lru_mem", just with a much clearer
>> name+description (again, IMHO).
> 
> I think folios are a superset of lru_mem.  To enhance your drawing:
> 

So, having read your link and your graph below: in that picture we want
"folio" to be the abstraction for "mappable into user space", correct?
Like calling it "user_mem" instead.

Because any of these types would imply that we're looking at the head
page (if it's a compound page). And we could have (or maybe already
have?) other types that cannot be mapped to user space but are actually
compound pages.

> page
>    folio
>       lru_mem
>          anon_mem
>          ksm
>          file_mem
>       netpool
>       devmem
>       zonedev
>    slab
>    pgtable
>    buddy
>    zsmalloc
>    vmalloc
> 
> I have a little list of memory types here:
> https://kernelnewbies.org/MemoryTypes
> 
> Let me know if anything is missing.

hugetlbfs pages might deserve a dedicated type, right?


> 
>> Going from file_mem -> page is easy, just casting pointers.
>> Going from page -> file_mem requires going to the head page if it's a
>> compound page.
>>
>> But we expect most interfaces to pass around a proper type (e.g.,
>> lru_mem) instead of a page, which avoids having to look up the compound
>> head page. And each function can express which type it actually wants to
>> consume. The filemap API wants to consume file_mem, so it should use that.
>>
>> And IMHO, with something above in mind and not having a clue which
>> additional layers we'll really need, or which additional leaves we want
>> to have, we would start with the leaves (e.g., file_mem, anon_mem, slab)
>> and work our way towards the root. Just like we already started with slab.
> 
> That assumes that the "root" layers already handle compound pages
> properly.  For example, nothing in mm/page-writeback.c does; it assumes
> everything is an order-0 page.  So working in the opposite direction
> makes sense because it tells us what has already been converted and is
> thus safe to call.

Right, as long as the lower layers receive a "struct page", they have to
assume it's "anything" -- IOW a random base page.

We need some temporary logic when transitioning from "typed" code into
"struct page" code that doesn't handle compound pages yet, I agree. And I
think the different types used actually would tell us what has been
converted and what not. Whenever we have to go from type -> "struct
page" we have to be very careful.
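
To make the two directions concrete, here is a rough userspace sketch.
Everything in it is a mock: "struct page" below is not the kernel layout,
and lru_mem_page(), page_lru_mem() and lru_mem_nr_pages() are made-up
names for illustration only. The point is just that typed -> page is a
cast, while page -> typed has to resolve the head page first.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-in for the kernel's struct page; NOT the real layout. */
struct page {
	struct page *head;	/* NULL for a base or head page, else the head */
	unsigned int order;	/* only meaningful on a base or head page */
};

/* Hypothetical typed wrapper: by convention never refers to a tail page. */
struct lru_mem {
	struct page page;
};

/* Typed -> page: a plain member access, no lookup needed. */
static inline struct page *lru_mem_page(struct lru_mem *mem)
{
	return &mem->page;
}

/* Page -> typed: has to resolve the head page first. */
static inline struct lru_mem *page_lru_mem(struct page *page)
{
	struct page *head = page->head ? page->head : page;

	assert(head->head == NULL);	/* a head page is never a tail page */
	return (struct lru_mem *)head;
}

/*
 * What a caller would need when handing the memory to unconverted
 * "struct page" code that only understands order-0 pages.
 */
static inline unsigned long lru_mem_nr_pages(struct lru_mem *mem)
{
	return 1UL << mem->page.order;
}
```

Unconverted code that receives the result of lru_mem_page() still sees
only a single base page, so the caller would have to iterate over
lru_mem_nr_pages() entries itself. That is the careful part.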

> 
> And starting with file_mem makes the supposition that it's worth splitting
> file_mem from anon_mem.  I believe that's one or two steps further than
> it's worth, but I can be convinced otherwise.  For example, do we have
> examples of file pages being passed to routines that expect anon pages?

That would be a BUG, so I hope we don't have it ;)

> Most routines that I've looked at expect to see both file & anon pages,

Right, many of them do. Which tells me that they share a common type in
many places.

Let's consider the LRU code:

static inline int folio_is_file_lru(struct folio *folio)
{
	return !folio_swapbacked(folio);
}

I would say we don't really want to pass folios here. We actually want
to pass something reasonable, like "lru_mem". But yes, it's just doing
what "struct page" used to do via page_is_file_lru().
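
As a sketch only, the same helper on a narrower type could look like
the following. "struct lru_mem", MOCK_PG_SWAPBACKED and both function
names are assumptions made up for this example and do not exist in the
tree; the mock "struct page" only carries a flags word.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag bit and page; illustrative only, not kernel definitions. */
#define MOCK_PG_SWAPBACKED (1UL << 0)

struct page {
	unsigned long flags;
};

/* Hypothetical type: memory that can sit on an LRU list. */
struct lru_mem {
	struct page page;
};

static inline bool lru_mem_swapbacked(const struct lru_mem *mem)
{
	return mem->page.flags & MOCK_PG_SWAPBACKED;
}

/* Same logic as folio_is_file_lru(), but the signature says what it wants. */
static inline int lru_mem_is_file_lru(const struct lru_mem *mem)
{
	return !lru_mem_swapbacked(mem);
}
```

The body is unchanged. What changes is that a caller can no longer hand
in, say, a slab or page table page without an explicit conversion.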


Let's consider folio_wait_writeback(struct folio *folio):

Do we actually want to pass in a folio here? Would we actually want to
pass in lru_mem here or even something else?

> and treat them either identically or do slightly different things.
> But those are just the functions I've looked at; your experience may be
> quite different.

I assume when it comes to LRU, writeback, ... the behavior is very
similar or at least the current functions just decide internally what to
do based on e.g., ..._is_file_lru().

I don't know if it's best to keep hiding that functionality within an
abstracted type or just provide two separate functions for anon and
file. Folios mostly mimic what the old struct page used to do,
introducing similar functions. Maybe the reason we branch off within
these functions is that it just made sense when passing around
"struct page" and not having something clearer at hand that would let
the caller do the branch. For the LRU cases I looked at, it somewhat
makes sense to just do it internally.

Looking at some core MM code, like mm/huge_memory.c, and seeing all the
PageAnon() specializations, having a dedicated anon_mem type might be
valuable. But at this point it's hard to tell if splitting up these
functions would actually be desirable.
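
For illustration, the two options could be sketched like this in plain
userspace C. All names here (anon_mem, file_mem, the *_split() helpers,
MOCK_SWAPBACKED) are hypothetical and only stand in for the kind of
PageAnon() specialization mentioned above; nothing here is existing
kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Mock flag and base type; illustrative only. */
#define MOCK_SWAPBACKED (1UL << 0)

struct lru_mem {
	unsigned long flags;
};

/* Hypothetical leaf types: same memory, narrower promise to the callee. */
struct anon_mem {
	struct lru_mem lru;
};

struct file_mem {
	struct lru_mem lru;
};

enum split_path { SPLIT_ANON, SPLIT_FILE };

/* Checked downcasts: this is where the anon/file branch happens. */
static inline struct anon_mem *lru_mem_to_anon(struct lru_mem *mem)
{
	return (mem->flags & MOCK_SWAPBACKED) ? (struct anon_mem *)mem : NULL;
}

static inline struct file_mem *lru_mem_to_file(struct lru_mem *mem)
{
	return (mem->flags & MOCK_SWAPBACKED) ? NULL : (struct file_mem *)mem;
}

/* Option 2: separate functions, the caller picks by type. */
static inline enum split_path anon_mem_split(struct anon_mem *mem)
{
	(void)mem;
	return SPLIT_ANON;
}

static inline enum split_path file_mem_split(struct file_mem *mem)
{
	(void)mem;
	return SPLIT_FILE;
}

/* Option 1: one entry point that keeps the branch internal. */
static inline enum split_path lru_mem_split(struct lru_mem *mem)
{
	struct anon_mem *anon = lru_mem_to_anon(mem);

	if (anon)
		return anon_mem_split(anon);
	return file_mem_split(lru_mem_to_file(mem));
}
```

With option 2, passing a file page to an anon-only routine becomes a
compile-time type error instead of a runtime BUG. Whether that is worth
the extra functions is exactly the open question.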

We're knee-deep in the type discussion now and I appreciate it. I can
understand that folios are currently really just a "not a tail page"
concept and mimic a lot of what we already inherited from the old
"struct page" world.

-- 
Thanks,

David / dhildenb