Message-ID: <Z37pxbkHPbLYnDKn@casper.infradead.org>
Date: Wed, 8 Jan 2025 21:10:29 +0000
From: Matthew Wilcox <willy@...radead.org>
To: linux-mm@...ck.org
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: State Of The Page (January 2025)

As the calendar turns, I'd like to lay out some goals for the coming year.
I think we can accomplish a big goal this year: transitioning from the
Ottawa interpretation of struct folio to the New York interpretation.

The tension between the two interpretations of folio goes back to the
initial discussions.  The Ottawa interpretation is simply "A folio is
a non-tail page".  The New York interpretation is "A folio is its own
data structure which refers to one or more pages".  We agreed to start
with the Ottawa interpretation and later transition to the New York
interpretation once we were sufficiently far through the conversion
process, and I think that time has come.

We've made some mistakes (and deliberately done some things which won't
work any more).  That's OK; it's just software and we can change it.
The biggest change that I think will affect users is that it will no
longer be the case that all pages are part of a folio.  Some pages will
belong to a slab or a ptdesc or some other memdesc.  Some pages will be
"bare" and not belong to any memdesc.

For example, page_folio() is going to return NULL if the page does
not belong to a folio.  Some APIs will redirect silently; for example,
calling put_page() on a page belonging to a folio will decrement the
folio's refcount, not the page's refcount.  Calling compound_head()
will only work on "bare" pages; calling it on a page which belongs to
a folio/slab/... will be a BUG().

I think it's a reasonable goal to shrink struct page to (approximately):

struct page {
    unsigned long flags;
    union {
        struct list_head buddy_list;
        struct list_head pcp_list;
        struct {
            unsigned long memdesc;
            int _refcount;
        };
    };
    union {
        unsigned long private;
        struct {
            int _folio_mapcount;
        };
    };
};

This is just 32 bytes [1], which halves the size of memmap.  In order to
get to this point, we have some projects that need to be finished.

1. Remove accesses to page->index.  This project is almost complete.

2. Remove accesses to page->lru. This is really a set of many small
projects.  Some tactics can be shared between different projects, but
this really requires looking into each usage and figuring out how to
replace it.  Page migration looks particularly tricky.

3. Remove accesses to page->mapping.  In filesystems, this means
converting to folios.  I have a plan for movable_ops (but not yet a plan
for its use of ->lru, so more thought is needed).

4. Remove use of &folio->page.  Sometimes this means using
folio_page(folio, 0) (eg bio_add_folio()); sometimes it means pushing
folios into the called function (eg block_commit_write()).

5. Remove bh->b_page, as it's also effectively a cast between page and
folio.  I have a git branch which finishes this project; I still need to
send out the patches.

6. Split the pagepool bump allocator out of struct page, as has been
done for, eg, slab and ptdesc.

7. Fix memcg_data so that slabs, folios and bare pages are each accounted
appropriately.  I've started working on this but need to reassess.


Once all these projects are completed, we can allocate folio, bump, ptdesc,
slab, zsdesc, etc. descriptors separately and point to them from struct
page.  Then we can shrink struct page to the above definition.

[1] On 64-bit systems.  Just 16 bytes on 32-bit systems!  Except we
probably need to include 'virtual' (to continue to support kmap()) and
_last_cpupid, so maybe it ends up being 24 bytes on 32-bit, and then we
just round it up to 32 so it's a power of two ...