linux-kernel - Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGsJ_4wUQdQyB_3y0Buf3uG34hvgpMAP3qHHwJM3=R01RJOuvw@mail.gmail.com>
Date: Sat, 29 Nov 2025 15:55:19 +0800
From: Barry Song <21cnbao@...il.com>
To: zhongjinji <zhongjinji@...or.com>
Cc: zhanghongru06@...il.com, Liam.Howlett@...cle.com, 
	akpm@...ux-foundation.org, axelrasmussen@...gle.com, david@...nel.org, 
	hannes@...xchg.org, jackmanb@...gle.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, lorenzo.stoakes@...cle.com, mhocko@...e.com, 
	rppt@...nel.org, surenb@...gle.com, vbabka@...e.cz, weixugc@...gle.com, 
	yuanchu@...gle.com, zhanghongru@...omi.com, ziy@...dia.com
Subject: Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from
 per-migragetype count

On Sat, Nov 29, 2025 at 8:00 AM Barry Song <21cnbao@...il.com> wrote:
>
> > >       if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> > >               __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index bb09c032eecf..9334bbbe1e16 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> > >                                       zone->name,
> > >                                       migratetype_names[mtype]);
> > >               for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> > > -                     unsigned long freecount = 0;
> > > -                     struct free_area *area;
> > > -                     struct list_head *curr;
> > > +                     unsigned long freecount;
> > >                       bool overflow = false;
> > >
> > > -                     area = &(zone->free_area[order]);
> > > -
> > > -                     list_for_each(curr, &area->free_list[mtype]) {
> > > -                             /*
> > > -                              * Cap the free_list iteration because it might
> > > -                              * be really large and we are under a spinlock
> > > -                              * so a long time spent here could trigger a
> > > -                              * hard lockup detector. Anyway this is a
> > > -                              * debugging tool so knowing there is a handful
> > > -                              * of pages of this order should be more than
> > > -                              * sufficient.
> > > -                              */
> > > -                             if (++freecount >= 100000) {
> > > -                                     overflow = true;
> > > -                                     break;
> > > -                             }
> > > +                     /* Keep the same output format for user-space tools compatibility */
> > > +                     freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> >
> > I think it might be better for using an array of size NR_PAGE_ORDERS to store
> > the free count for each order. Like the code below.
>
> Right. If we want the freecount to accurately reflect the current system
> state, we still need to take the zone lock.
>
> Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> correctness. They may ensure single-copy atomicity per access, but not for the
> overall result.

On second thought, the original code releases and re-acquires the spinlock
for each order, so cross-variable consistency may not be a real issue.
Adding data_race() to silence KCSAN warnings should be sufficient？
I mean something like the following.

@@ -843,8 +842,8 @@ static inline void move_to_free_list(struct page
*page, struct zone *zone,
                     get_pageblock_migratetype(page), old_mt, nr_pages);

        list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
-       WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
-       WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
+       area->mt_nr_free[old_mt]--;
+       area->mt_nr_free[new_mt]++;

        account_freepages(zone, -nr_pages, old_mt);
        account_freepages(zone, nr_pages, new_mt);
@@ -875,8 +874,7 @@ static inline void
__del_page_from_free_list(struct page *page, struct zone *zon
        __ClearPageBuddy(page);
        set_page_private(page, 0);
        area->nr_free--;
-       WRITE_ONCE(area->mt_nr_free[migratetype],
-               area->mt_nr_free[migratetype] - 1);
+       area->mt_nr_free[migratetype]--;

        if (order >= pageblock_order && !is_migrate_isolate(migratetype))
                __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7e1e931eb209..d74004eb8c4d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1599,7 +1599,7 @@ static void pagetypeinfo_showfree_print(struct
seq_file *m,
                        bool overflow = false;

                        /* Keep the same output format for user-space
tools compatibility */
-                       freecount =
READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+                       freecount =
data_race(zone->free_area[order].mt_nr_free[mtype]);
                        if (freecount >= 100000) {
                                overflow = true;
                                freecount = 100000;

Thanks
Barry