Message-Id: <20251014080812.2985-1-21cnbao@gmail.com>
Date: Tue, 14 Oct 2025 16:08:12 +0800
From: Barry Song <21cnbao@...il.com>
To: mhocko@...e.com
Cc: 21cnbao@...il.com,
alexei.starovoitov@...il.com,
corbet@....net,
davem@...emloft.net,
david@...hat.com,
edumazet@...gle.com,
hannes@...xchg.org,
harry.yoo@...cle.com,
horms@...nel.org,
jackmanb@...gle.com,
kuba@...nel.org,
kuniyu@...gle.com,
linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
linyunsheng@...wei.com,
netdev@...r.kernel.org,
pabeni@...hat.com,
roman.gushchin@...ux.dev,
surenb@...gle.com,
v-songbaohua@...o.com,
vbabka@...e.cz,
willemb@...gle.com,
willy@...radead.org,
zhouhuacai@...o.com,
ziy@...dia.com,
baolin.wang@...ux.alibaba.com
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
On Tue, Oct 14, 2025 at 3:26 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 13-10-25 20:30:13, Vlastimil Babka wrote:
> > On 10/13/25 12:16, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@...o.com>
> [...]
> > I wonder if we should either:
> >
> > 1) sacrifice a new __GFP flag specifically for "!allow_spin" case to
> > determine it precisely.
>
> As said in other reply I do not think this is a good fit for this
> specific case as it is all or nothing approach. Soon enough we discover
> that "no effort to reclaim/compact" hurts other usecases. So I do not
> think we need a dedicated flag for this specific case. We need a way to
> tell kswapd/kcompactd how much to try instead.

+Baolin, who may have observed the same issue.

An issue with vmscan is that kcompactd is woken up very late, only after
kswapd has reclaimed a large number of order-0 pages to satisfy an order-3
allocation.

static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
{
	...
		balanced = pgdat_balanced(pgdat, sc.order, highest_zoneidx);
		if (!balanced && nr_boost_reclaim) {
			nr_boost_reclaim = 0;
			goto restart;
		}

		/*
		 * If boosting is not active then only reclaim if there are no
		 * eligible zones. Note that sc.reclaim_idx is not used as
		 * buffer_heads_over_limit may have adjusted it.
		 */
		if (!nr_boost_reclaim && balanced)
			goto out;
	...
		if (kswapd_shrink_node(pgdat, &sc))
			raise_priority = false;
	...
out:
	...
	/*
	 * As there is now likely space, wakeup kcompact to defragment
	 * pageblocks.
	 */
	wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx);
}

This is because pgdat_balanced() needs at least one free order-3 page to
return true:

bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
			 int highest_zoneidx, unsigned int alloc_flags,
			 long free_pages)
{
	...
	if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
		return false;

	/* If this is an order-0 request then the watermark is fine */
	if (!order)
		return true;

	/* For a high-order request, check at least one suitable page is free */
	for (o = order; o < NR_PAGE_ORDERS; o++) {
		struct free_area *area = &z->free_area[o];
		int mt;

		if (!area->nr_free)
			continue;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}

#ifdef CONFIG_CMA
		if ((alloc_flags & ALLOC_CMA) &&
		    !free_area_empty(area, MIGRATE_CMA)) {
			return true;
		}
#endif
		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
			return true;
		}
	}

This appears to be incorrect: it will always lead to over-reclaiming
order-0 pages in order to satisfy high-order allocations.
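
To make the two-stage check concrete, here is a minimal userspace sketch (not
kernel code). The names toy_zone, toy_free_pages() and toy_watermark_ok() are
made up for illustration, TOY_NR_ORDERS = 11 is an assumed value, and the model
deliberately ignores lowmem_reserve, migratetypes and alloc_flags:

```c
#include <stdbool.h>

/* Assumption for illustration: orders 0..10 are tracked. */
#define TOY_NR_ORDERS 11

/* Toy stand-in for struct zone: only the number of free blocks per order. */
struct toy_zone {
	unsigned long nr_free[TOY_NR_ORDERS];
};

/* Total free pages, counting one free order-o block as 2^o pages. */
static unsigned long toy_free_pages(const struct toy_zone *z)
{
	unsigned long pages = 0;

	for (int o = 0; o < TOY_NR_ORDERS; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/*
 * Simplified model of the check quoted above:
 *   1) enough free pages overall (above the 'min' watermark), and
 *   2) for order > 0, at least one free block of the requested order
 *      or larger.
 */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order,
			     unsigned long min)
{
	if (toy_free_pages(z) <= min)
		return false;

	/* An order-0 request only needs the page count. */
	if (!order)
		return true;

	for (unsigned int o = order; o < TOY_NR_ORDERS; o++) {
		if (z->nr_free[o])
			return true;
	}
	return false;
}
```

With 100000 free order-0 pages and nothing else, this model passes for order 0
but fails for order 3, no matter how far above the watermark the zone is. In
that state reclaiming more order-0 pages only helps if it happens to free
physically contiguous pages that can merge into a larger block.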
I wonder if we should "goto out" earlier to wake up kcompactd when there
is plenty of memory available, even if no order-3 pages exist.
Conceptually, what I mean is:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c80fcae7f2a1..d0e03066bbaa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7057,9 +7057,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		 * eligible zones. Note that sc.reclaim_idx is not used as
 		 * buffer_heads_over_limit may have adjusted it.
 		 */
-		if (!nr_boost_reclaim && balanced)
+		if (!nr_boost_reclaim && (balanced || we_have_plenty_memory_to_compact()))
 			goto out;
 		/* Limit the priority of boosting to avoid reclaim writeback */
 		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
 			raise_priority = false;
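
As a rough userspace sketch of what such a predicate might look like (purely
hypothetical, not a proposed implementation): sketch_zone, its fields and all
sketch_*() helpers below are invented for illustration, and the "gap" of twice
the request size on top of the min watermark is just an assumed threshold.

```c
#include <stdbool.h>

/* Assumption for illustration: orders 0..10 are tracked. */
#define SKETCH_NR_ORDERS 11

/* Hypothetical, heavily simplified stand-in for a zone. */
struct sketch_zone {
	unsigned long nr_free[SKETCH_NR_ORDERS]; /* free blocks per order */
	unsigned long min_wmark;                 /* min watermark, in pages */
};

/* Total free pages, counting one free order-o block as 2^o pages. */
static unsigned long sketch_free_pages(const struct sketch_zone *z)
{
	unsigned long pages = 0;

	for (int o = 0; o < SKETCH_NR_ORDERS; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/* True if a free block of at least the requested order exists. */
static bool sketch_has_free_block(const struct sketch_zone *z,
				  unsigned int order)
{
	for (unsigned int o = order; o < SKETCH_NR_ORDERS; o++) {
		if (z->nr_free[o])
			return true;
	}
	return false;
}

/*
 * Hypothetical stand-in for we_have_plenty_memory_to_compact():
 * enough free pages above the watermark that compaction has room
 * to migrate pages. The 2 << order gap is an assumed threshold.
 */
static bool sketch_plenty_memory_to_compact(const struct sketch_zone *z,
					    unsigned int order)
{
	unsigned long gap = 2UL << order;

	return sketch_free_pages(z) > z->min_wmark + gap;
}

/* Models the modified condition: when should reclaim "goto out"? */
static bool sketch_should_stop_reclaim(const struct sketch_zone *z,
				       unsigned int order,
				       unsigned long nr_boost_reclaim)
{
	bool balanced = sketch_free_pages(z) > z->min_wmark &&
			(order == 0 || sketch_has_free_block(z, order));

	return !nr_boost_reclaim &&
	       (balanced || sketch_plenty_memory_to_compact(z, order));
}
```

In this model a zone with 100000 free order-0 pages and no order-3 block would
stop reclaiming and hand over to compaction, while a zone below its watermark
would keep reclaiming as before.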
Thanks
Barry