linux-kernel - Re: Splitting struct page into multiple types

Message-ID: <20211019175419.GA22532@hsiangkao-HP-ZHAN-66-Pro-G1>
Date:   Wed, 20 Oct 2021 01:54:20 +0800
From:   Gao Xiang <xiang@...nel.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Kent Overstreet <kent.overstreet@...il.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        David Howells <dhowells@...hat.com>
Subject: Re: Splitting struct page into multiple types - Was: re: Folio
 discussion recap -

Hi Matthew,

On Tue, Oct 19, 2021 at 06:34:19PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 20, 2021 at 01:06:04AM +0800, Gao Xiang wrote:
> > On Tue, Oct 19, 2021 at 12:11:35PM -0400, Kent Overstreet wrote:
> > > Other things that need to be fixed:
> > > 
> > >  - page->lru is used by the old .readpages interface for the list of pages we're
> > >    doing reads to; Matthew converted most filesystems to his new and improved
> > >    .readahead which thankfully no longer uses page->lru, but there's still a few
> > >    filesystems that need to be converted - it looks like cifs and erofs, not
> > >    sure what's going on with fs/cachefiles/. We need help from the maintainers
> > >    of those filesystems to get that conversion done, this is holding up future
> > >    cleanups.
> > 
> > We use page->lru for non-LRU pages simply because the struct page is
> > already there, and it's an effective way to organize a variable number
> > of temporary pages with no memory overhead beyond the page structure
> > itself. Another benefit is that such non-LRU pages can be picked from
> > the list and added to the page cache immediately without any pain (at
> > which point ->lru is reused for real LRU purposes).
> > 
> > This approach was chosen to maximize performance (pages can be shared
> > flexibly within the same read request instead of being allocated from
> > and freed back to the buddy allocator each time) and to minimise the
> > extra memory footprint. I'm fine with switching to another approach if
> > some similar field can be used the same way.
> > 
> > If no such field remains, I'm also happy to write a patch to get rid of
> > this usage, but I'd like it to be merged _only_ together with the real
> > final transformation. Otherwise the old page structure still takes the
> > extra memory while overall performance for end users is sacrificed
> > (..so there would be no benefit at all.)
> 
> I haven't dived in to clean up erofs because I don't have a way to test
> it, and I haven't taken the time to understand exactly what it's doing.

Actually, I don't think it would be a real cleanup, given the current page
structure design.

> 
> The old ->readpages interface gave you pages linked together on ->lru
> and this code seems to have been written in that era, when you would
> add pages to the page cache yourself.
> 
> In the new scheme, the pages get added to the page cache for you, and
> then you take care of filling them (and marking them uptodate if the
> read succeeds).  There's now readahead_expand() which you can call to add
> extra pages to the cache if the readahead request isn't compressed-block
> aligned.  Of course, it may not succeed if we're out of memory or there
> were already pages in the cache.

Hmmm, these temporary pages in the list may be (re)used later for the page
cache, or just used as temporary pages holding compressed data for some
I/O, or as an lz4 decompression buffer (technically called the lz77
sliding window) that temporarily holds decompressed data within the same
read request (because some pages are already mapped and we cannot expose
the decompression process to userspace, among other reasons). All of
these uses recycle the same pages.

These temporary pages may eventually go into some file's page cache, or be
recycled for several temporary uses many times before finally being freed
to the buddy allocator.

> 
> It looks like this will be quite a large change to how erofs handles
> compressed blocks, but if you're open to taking this on, I'd be very happy.

For ->lru, the change itself is quite small, but it sacrifices performance.
Still, I'm very glad to do it once a decision about the ->lru field has been
made.

Thanks,
Gao Xiang