Message-ID: <CAGsJ_4wUQdQyB_3y0Buf3uG34hvgpMAP3qHHwJM3=R01RJOuvw@mail.gmail.com>
Date: Sat, 29 Nov 2025 15:55:19 +0800
From: Barry Song <21cnbao@...il.com>
To: zhongjinji <zhongjinji@...or.com>
Cc: zhanghongru06@...il.com, Liam.Howlett@...cle.com,
akpm@...ux-foundation.org, axelrasmussen@...gle.com, david@...nel.org,
hannes@...xchg.org, jackmanb@...gle.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, lorenzo.stoakes@...cle.com, mhocko@...e.com,
rppt@...nel.org, surenb@...gle.com, vbabka@...e.cz, weixugc@...gle.com,
yuanchu@...gle.com, zhanghongru@...omi.com, ziy@...dia.com
Subject: Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from
per-migragetype count
On Sat, Nov 29, 2025 at 8:00 AM Barry Song <21cnbao@...il.com> wrote:
>
> > > if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> > > __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index bb09c032eecf..9334bbbe1e16 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> > > zone->name,
> > > migratetype_names[mtype]);
> > > for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> > > - unsigned long freecount = 0;
> > > - struct free_area *area;
> > > - struct list_head *curr;
> > > + unsigned long freecount;
> > > bool overflow = false;
> > >
> > > - area = &(zone->free_area[order]);
> > > -
> > > - list_for_each(curr, &area->free_list[mtype]) {
> > > - /*
> > > - * Cap the free_list iteration because it might
> > > - * be really large and we are under a spinlock
> > > - * so a long time spent here could trigger a
> > > - * hard lockup detector. Anyway this is a
> > > - * debugging tool so knowing there is a handful
> > > - * of pages of this order should be more than
> > > - * sufficient.
> > > - */
> > > - if (++freecount >= 100000) {
> > > - overflow = true;
> > > - break;
> > > - }
> > > + /* Keep the same output format for user-space tools compatibility */
> > > + freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> >
> > I think it might be better for using an array of size NR_PAGE_ORDERS to store
> > the free count for each order. Like the code below.
>
> Right. If we want the freecount to accurately reflect the current system
> state, we still need to take the zone lock.
>
> Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> correctness. They may ensure single-copy atomicity per access, but not for the
> overall result.
On second thought, the original code releases and re-acquires the spinlock
for each order, so it never gave a consistent snapshot across orders either,
and cross-variable consistency may not be a real issue.
Would adding data_race() to silence the KCSAN warnings be sufficient?
I mean something like the following:
@@ -843,8 +842,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
get_pageblock_migratetype(page), old_mt, nr_pages);
list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
- WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
- WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
+ area->mt_nr_free[old_mt]--;
+ area->mt_nr_free[new_mt]++;
account_freepages(zone, -nr_pages, old_mt);
account_freepages(zone, nr_pages, new_mt);
@@ -875,8 +874,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
__ClearPageBuddy(page);
set_page_private(page, 0);
area->nr_free--;
- WRITE_ONCE(area->mt_nr_free[migratetype],
- area->mt_nr_free[migratetype] - 1);
+ area->mt_nr_free[migratetype]--;
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7e1e931eb209..d74004eb8c4d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1599,7 +1599,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
bool overflow = false;
/* Keep the same output format for user-space tools compatibility */
- freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+ freecount = data_race(zone->free_area[order].mt_nr_free[mtype]);
if (freecount >= 100000) {
overflow = true;
freecount = 100000;
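
For illustration only, here is a minimal userspace sketch of the read side
described above: each per-migratetype counter is sampled once without a lock,
then clamped at 100000 with an overflow flag. The names free_area_model,
racy_read(), capped_freecount() and show_row(), and the values chosen for
NR_PAGE_ORDERS and NR_MIGRATETYPES, are assumptions made for this sketch and
are not kernel code. racy_read() only approximates data_race() with a
volatile load; the real macro also tells KCSAN that the race is intentional.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these values are illustrative only. */
#define NR_PAGE_ORDERS   11
#define NR_MIGRATETYPES   3
#define FREECOUNT_CAP    100000UL

struct free_area_model {
	unsigned long mt_nr_free[NR_MIGRATETYPES];
};

/*
 * Userspace approximation of data_race(): a single plain load through a
 * volatile pointer. It gives no ordering and no cross-variable consistency.
 */
static unsigned long racy_read(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/* Sample one counter and clamp it, reporting whether the cap was hit. */
static unsigned long capped_freecount(const struct free_area_model *area,
				      int mtype, bool *overflow)
{
	unsigned long freecount = racy_read(&area->mt_nr_free[mtype]);

	*overflow = false;
	if (freecount >= FREECOUNT_CAP) {
		*overflow = true;
		freecount = FREECOUNT_CAP;
	}
	return freecount;
}

/*
 * Print one row. Every order is sampled independently, so the row is not
 * an atomic snapshot across orders.
 */
static void show_row(const struct free_area_model areas[NR_PAGE_ORDERS],
		     int mtype)
{
	for (int order = 0; order < NR_PAGE_ORDERS; ++order) {
		bool overflow;
		unsigned long n = capped_freecount(&areas[order], mtype,
						   &overflow);

		printf("%s%6lu ", overflow ? ">" : "", n);
	}
	putchar('\n');
}
```

Because each counter is read on its own, a concurrent move between two
migratetypes can be observed half-applied. For a debugging interface that
may be acceptable, which is the trade-off under discussion here.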
Thanks
Barry