[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa3a6258-5304-40b2-bb58-e6081ed845d1@redhat.com>
Date: Tue, 1 Jul 2025 20:44:09 +0200
From: David Hildenbrand <david@...hat.com>
To: Luiz Capitulino <luizcap@...hat.com>, willy@...radead.org
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, lcapitulino@...il.com, shivankg@....com
Subject: Re: [PATCH 3/3] fs: stable_page_flags(): use snapshot_page()
On 26.06.25 20:16, Luiz Capitulino wrote:
> A race condition is possible in stable_page_flags() where user-space is
> reading /proc/kpageflags concurrently to a folio split. This may lead to
> oopses or BUG_ON()s being triggered.
>
> To fix this, this commit uses snapshot_page() in stable_page_flags() so
> that stable_page_flags() works with a stable page and folio snapshots
> instead.
>
> Note that stable_page_flags() makes use of some functions that require
> the original page or folio pointer to work properly (eg.
> is_free_budy_page() and folio_test_idle()). Since those functions can't
> be used on the page snapshot, we replace their usage with flags that
> were set by snapshot_page() for this purpose.
>
> Signed-off-by: Luiz Capitulino <luizcap@...hat.com>
> ---
> fs/proc/page.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 936f8bbe5a6f..a2ee95f727f0 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -147,6 +147,7 @@ static inline u64 kpf_copy_bit(u64 kflags, int ubit, int kbit)
> u64 stable_page_flags(const struct page *page)
> {
> const struct folio *folio;
> + struct page_snapshot ps;
> unsigned long k;
> unsigned long mapping;
> bool is_anon;
> @@ -158,7 +159,9 @@ u64 stable_page_flags(const struct page *page)
> */
> if (!page)
> return 1 << KPF_NOPAGE;
> - folio = page_folio(page);
> +
> + snapshot_page(&ps, page);
> + folio = &ps.folio_snapshot;
>
> k = folio->flags;
> mapping = (unsigned long)folio->mapping;
> @@ -167,7 +170,7 @@ u64 stable_page_flags(const struct page *page)
> /*
> * pseudo flags for the well known (anonymous) memory mapped pages
> */
> - if (page_mapped(page))
> + if (folio_mapped(folio))
> u |= 1 << KPF_MMAP;
> if (is_anon) {
> u |= 1 << KPF_ANON;
> @@ -179,7 +182,7 @@ u64 stable_page_flags(const struct page *page)
> * compound pages: export both head/tail info
> * they together define a compound page's start/end pos and order
> */
> - if (page == &folio->page)
> + if (ps.idx == 0)
> u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
> else
> u |= 1 << KPF_COMPOUND_TAIL;
> @@ -189,10 +192,10 @@ u64 stable_page_flags(const struct page *page)
> folio_test_large_rmappable(folio)) {
> /* Note: we indicate any THPs here, not just PMD-sized ones */
> u |= 1 << KPF_THP;
> - } else if (is_huge_zero_folio(folio)) {
> + } else if (ps.flags & PAGE_SNAPSHOT_PG_HUGE_ZERO) {
For that, we could use
is_huge_zero_pfn(ps.pfn)
from
https://lkml.kernel.org/r/20250617154345.2494405-10-david@redhat.com
You should be able to cherry pick that commit (only minor conflict in
vm_normal_page_pmd()) and include it in this series.
> u |= 1 << KPF_ZERO_PAGE;
> u |= 1 << KPF_THP;
> - } else if (is_zero_folio(folio)) {
> + } else if (is_zero_pfn(ps.pfn)) {
> u |= 1 << KPF_ZERO_PAGE;
> }
>
> @@ -200,14 +203,14 @@ u64 stable_page_flags(const struct page *page)
> * Caveats on high order pages: PG_buddy and PG_slab will only be set
> * on the head page.
> */
> - if (PageBuddy(page))
> + if (PageBuddy(&ps.page_snapshot))
> u |= 1 << KPF_BUDDY;
> - else if (page_count(page) == 0 && is_free_buddy_page(page))
> + else if (ps.flags & PAGE_SNAPSHOT_PG_FREE)
Yeah, that is nasty, and inherently racy. So detecting it an snapshot
time might be best.
Which makes me wonder if this whole block should simply be
if (ps.flags & PAGE_SNAPSHOT_PG_BUDDY)
u |= 1 << KPF_BUDDY;
and you move all buddy detection into the snapshotting function. That
is, PAGE_SNAPSHOT_PG_BUDDY gets set for head and tail pages of buddy pages.
Looks less special that way ;)
> u |= 1 << KPF_BUDDY;
>
> - if (PageOffline(page))
> + if (folio_test_offline(folio))
> u |= 1 << KPF_OFFLINE;
> - if (PageTable(page))> + if (folio_test_pgtable(folio))
> u |= 1 << KPF_PGTABLE;
I assume, long-term none of these will actually be folios. But we can
change that once we get to it.
(likely, going back to pages ... just like for the slab case below)
> if (folio_test_slab(folio))
> u |= 1 << KPF_SLAB;
> @@ -215,7 +218,7 @@ u64 stable_page_flags(const struct page *page)
> #if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
> u |= kpf_copy_bit(k, KPF_IDLE, PG_idle);
> #else
> - if (folio_test_idle(folio))
> + if (ps.flags & PAGE_SNAPSHOT_PG_IDLE)
> u |= 1 << KPF_IDLE;
Another nasty 32bit case. At least once we decouple pages from folios,
the while test-idle in page-ext will vanish and this will get cleaned up.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists