linux-kernel - Splitting struct page into multiple types

Message-ID: <YW7uN2p8CihCDsln@moria.home.lan>
Date:   Tue, 19 Oct 2021 12:11:35 -0400
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        David Howells <dhowells@...hat.com>
Subject: Splitting struct page into multiple types - Was: re: Folio
 discussion recap -

On Mon, Oct 18, 2021 at 04:45:59PM -0400, Johannes Weiner wrote:
> On Mon, Oct 18, 2021 at 02:12:32PM -0400, Kent Overstreet wrote:
> > On Mon, Oct 18, 2021 at 12:47:37PM -0400, Johannes Weiner wrote:
> > > I find this line of argument highly disingenuous.
> > > 
> > > No new type is necessary to remove these calls inside MM code. Migrate
> > > them into the callsites and remove the 99.9% very obviously bogus
> > > ones. The process is the same whether you switch to a new type or not.
> > 
> > Conversely, I don't see "leave all LRU code as struct page, and ignore anonymous
> > pages" to be a serious counterargument. I got that you really don't want
> > anonymous pages to be folios from the call Friday, but I haven't been getting
> > anything that looks like a serious counterproposal from you.
> > 
> > Think about what our goal is: we want to get to a world where our types describe
> > unambiguously how our data is used. That means working towards
> >  - getting rid of type punning
> >  - struct fields that are only used for a single purpose
> 
> How is a common type inheritance model with a generic page type and
> subclasses not a counter proposal?
> 
> And one which actually accomplishes those two things you're saying, as
> opposed to a shared folio where even 'struct address_space *mapping'
> is a total lie type-wise?
> 
> Plus, really, what's the *alternative* to doing that anyway? How are
> we going to implement code that operates on folios and other subtypes
> of the page alike? And deal with attributes and properties that are
> shared among them all? Willy's original answer to that was that folio
> is just *going* to be all these things - file, anon, slab, network,
> rando driver stuff. But since that wasn't very popular, would not get
> rid of type punning and overloaded members, would get rid of
> efficiently allocating descriptor memory etc.- what *is* the
> alternative now to common properties between split out subtypes?
> 
> I'm not *against* what you and Willy are saying. I have *genuinely
> zero idea what* you are saying.

So we were starting to talk more concretely last night about the splitting of
struct page into multiple types, and what that means for page->lru.

The basic process I've had in mind for splitting struct page up into multiple
types is: create a new type for each struct in the union-of-structs, change code
to refer to that type instead of struct page, then - importantly - delete those
members from the union-of-structs in struct page.

E.g. for struct slab, after Willy's struct slab patches, we want to delete that
stuff from struct page - otherwise we've introduced new type punning where code
can refer to the same members via struct page and struct slab, and it's also
completely necessary in order to separately allocate these new structs and slim
down struct page.
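
As a rough userspace illustration of that process (not the kernel's actual
definitions: struct page_before, struct page_after and the field selection here
are simplified, hypothetical stand-ins), the before and after states might look
like this:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

/* Before: one type whose union-of-structs overlays unrelated users. */
struct page_before {
	unsigned long flags;
	union {
		struct {	/* page cache and anonymous pages */
			struct list_head lru;
			void *mapping;
			unsigned long index;
		};
		struct {	/* slab */
			struct list_head slab_list;
			void *freelist;
			unsigned int inuse;
		};
	};
};

/* Step 1: give the slab user its own type and convert its code to it. */
struct slab {
	unsigned long flags;
	struct list_head slab_list;
	void *freelist;
	unsigned int inuse;
};

/* Step 2: delete the slab members from struct page. */
struct page_after {
	unsigned long flags;
	struct list_head lru;
	void *mapping;
	unsigned long index;
};

/*
 * Shows the punning that exists while both views remain in one type:
 * slab_list and lru start at the same offset, so they share storage.
 */
static int slab_members_alias_lru(void)
{
	return offsetof(struct page_before, slab_list) ==
	       offsetof(struct page_before, lru);
}
```

Until step 2 is done, nothing stops code from reading slab state through the
struct page view, which is the new type punning mentioned above.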

Roughly what I've been envisioning for folios is that the struct in the
union-of-structs with lru, mapping & index - that's what turns into folios.

Note that we have a bunch of code using page->lru, page->mapping, and
page->index that really shouldn't be. The buddy allocator uses page->lru for
freelists, and it shouldn't be, but there's a straightforward solution for that:
we need to create a new struct in the union-of-structs for free pages, and
confine the buddy allocator to that (it'll be a nice cleanup, right now it's
overloading both page->lru and page->private which makes no sense, and it'll
give us a nice place to stick some other things).
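
A minimal sketch of what a dedicated free-page struct could look like, assuming
made-up names (struct free_page, struct free_area, free_list_add,
free_list_pop) and a singly linked list for brevity; the real buddy allocator
uses doubly linked lists and is considerably more involved:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 4	/* arbitrary value chosen for this sketch */

/* Hypothetical view of a page while it sits on a buddy freelist. */
struct free_page {
	struct free_page *next;	/* freelist link, instead of page->lru */
	unsigned int order;	/* block order, instead of page->private */
};

/* One freelist per order. */
struct free_area {
	struct free_page *head;
	unsigned long nr_free;
};

static struct free_area free_areas[MAX_ORDER];

/* Put a free block of the given order on its freelist. */
static void free_list_add(struct free_page *fp, unsigned int order)
{
	assert(order < MAX_ORDER);
	fp->order = order;
	fp->next = free_areas[order].head;
	free_areas[order].head = fp;
	free_areas[order].nr_free++;
}

/* Take a free block of the given order, or return NULL if there is none. */
static struct free_page *free_list_pop(unsigned int order)
{
	struct free_page *fp;

	assert(order < MAX_ORDER);
	fp = free_areas[order].head;
	if (!fp)
		return NULL;
	free_areas[order].head = fp->next;
	fp->next = NULL;
	free_areas[order].nr_free--;
	return fp;
}
```

With a type like this, the allocator only ever touches fields whose names say
what they are for, and neither lru nor private needs to be overloaded.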

Other things that need to be fixed:

 - page->lru is used by the old .readpages interface for the list of pages we're
   doing reads to; Matthew converted most filesystems to his new and improved
   .readahead which thankfully no longer uses page->lru, but there are still a few
   filesystems that need to be converted - it looks like cifs and erofs, not
   sure what's going on with fs/cachefiles/. We need help from the maintainers
   of those filesystems to get that conversion done; this is holding up future
   cleanups.

 - page->mapping and page->index are used for entirely random purposes by some
   driver code - drivers/net/ethernet/sun/niu.c looks to be using page->mapping
   for a singly linked list (!).

 - unrelated, but worth noting: there's a fair amount of filesystem code that
   uses page->mapping and page->index and doesn't need to because it has it from
   context - it's both a performance improvement and a cleanup to change that
   code to not get it from the page.
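
To make the driver item concrete, here is a hypothetical userspace sketch of
the kind of punning described for page->mapping, next to a driver-owned
descriptor that avoids it. This is not the niu.c code; struct rx_buf and all
function names are invented for illustration, and struct page is a toy
stand-in:

```c
#include <assert.h>
#include <stddef.h>

struct address_space;	/* opaque: only ever used through pointers here */

/* Toy stand-in for struct page with just the two fields in question. */
struct page {
	struct address_space *mapping;
	unsigned long index;
};

/* The punned pattern: page->mapping smuggles a "next page" pointer. */
static void punned_push(struct page **head, struct page *p)
{
	p->mapping = (struct address_space *)*head;
	*head = p;
}

static struct page *punned_pop(struct page **head)
{
	struct page *p = *head;

	if (p) {
		*head = (struct page *)p->mapping;
		p->mapping = NULL;
	}
	return p;
}

/* A cleaner alternative: the driver keeps its own link in its own type. */
struct rx_buf {
	struct rx_buf *next;
	struct page *page;
};

static void rx_buf_push(struct rx_buf **head, struct rx_buf *buf)
{
	buf->next = *head;
	*head = buf;
}
```

In the second form page->mapping is never written by the driver, so the field
can keep one meaning.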

Basically, we need to get to a point where each field in struct page is used for
one and just one thing, but that's going to take some time.

You've been noting that page->mapping is used for different things depending on
whether it's a file page or an anonymous page, and I agree that that's not ideal -
but it's one that I'm much less concerned about because a field being used for
two different things that are both core and related concepts in the kernel is
less bad than fields that are used as dumping grounds for whatever is
convenient - file & anon overloading page->mapping is just not the most pressing
issue to me.

Also, let's look at what file & anonymous pages share:
 - they're both mapped to userspace - they both need page->mapcount
 - they both share the lru code - they both need page->lru

page->lru is the real decider for me, because getting rid of non-lru uses of
that field looks very achievable to me, and once it's done it's one of the
fields we want to delete from struct page and move to struct folio.

If we leave the lru code using struct page, it creates a real problem for this
approach - it means we won't be able to delete the folio struct from the
union-of-structs in struct page. I'm not sure what our path forward would be.

That's my resistance to trying to separate file & anon at this point. I'm
definitely not saying we shouldn't separate file & anon in the future - I don't
have an opinion on whether or not it should be done, and if we do want to do
that I'd want to think about doing it by embedding a "struct lru_object" into
both file_folio and anon_folio and having the lru code refer to that instead of
struct page - embedding an object is generally preferable to inheritance.
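
A rough sketch of the embedding idea, assuming hypothetical layouts for
lru_object, file_folio and anon_folio (none of these exist in this form) and a
local container_of modelled on the kernel macro:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The only thing the LRU code would need to know about. */
struct lru_object {
	struct list_head lru;
};

/* Hypothetical split types, each embedding an lru_object. */
struct file_folio {
	void *mapping;
	unsigned long index;
	struct lru_object lru_obj;
};

struct anon_folio {
	void *anon_vma;
	struct lru_object lru_obj;
};

static void lru_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* LRU code operates on lru_object and never sees file vs. anon. */
static void lru_add(struct list_head *head, struct lru_object *obj)
{
	obj->lru.next = head->next;
	obj->lru.prev = head;
	head->next->prev = &obj->lru;
	head->next = &obj->lru;
}

static unsigned long lru_count(const struct list_head *head)
{
	const struct list_head *pos;
	unsigned long n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Code that knows which kind of folio it is holding can get back to it with
container_of(obj, struct file_folio, lru_obj); the LRU code itself would need
some other way to tell the two kinds apart, which this sketch leaves out.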

I want to say - and I don't think I've been clear enough about this - my
objection to trying to split up file & anonymous pages into separate types isn't
so much based on any deep philosophical reasons (I have some ideas for making
anonymous pages more like file pages that I would like to attempt, but I also
heard you when you said you'd tried to do that in the past and it hadn't worked
out) - my objection is because I think it would very much get in the way of
shorter term cleanups that are much more pressing. This is what I've been
referring to when I've been talking about following the union-of-structs in
splitting up struct page - I'm just trying to be practical.

Another relevant thing we've been talking about is consolidating the types of
pages that can be mapped into userspace. Right now we've got driver code mapping
all sorts of rando pages into userspace, and this isn't good - pages in theory
have this abstract interface that they implement, and pages mapped into
userspace have a bigger and more complicated interface - e.g.
a_ops.set_page_dirty; any page mapped into userspace can have this called on it
via the O_DIRECT read path, and possibly other things. Right now we have drivers
allocating vmalloc() memory and then mapping it into userspace, which is just
bizarre - what chunk of code really owns that page, and is implementing that
interface? vmalloc, or the driver?

What I'd like to see happen is for those to get switched to some sort of
internal device or inode, something that the driver owns and has an a_ops struct
- at this point they'd just be normal file pages. The reason drivers are mapping
vmalloc() memory into userspace is so they can get it into a contiguous kernel
side memory mapping, but they could also be doing that by calling vmap() on
existing pages - I think that would be much cleaner.

I have no idea if this approach works for network pool pages or how those would
be used, I haven't gotten that far - if someone can chime in about those that
would be great. But, the end goal I'm envisioning is a world where _only_ bog
standard file & anonymous pages are mapped to userspace - then _mapcount can be
deleted from struct page and only needs to live in struct folio.

Anyways, that's another thing to consider when thinking about whether file &
anonymous pages should be the same type.