Message-Id: <20251014080812.2985-1-21cnbao@gmail.com>
Date: Tue, 14 Oct 2025 16:08:12 +0800
From: Barry Song <21cnbao@...il.com>
To: mhocko@...e.com
Cc: 21cnbao@...il.com,
	alexei.starovoitov@...il.com,
	corbet@....net,
	davem@...emloft.net,
	david@...hat.com,
	edumazet@...gle.com,
	hannes@...xchg.org,
	harry.yoo@...cle.com,
	horms@...nel.org,
	jackmanb@...gle.com,
	kuba@...nel.org,
	kuniyu@...gle.com,
	linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	linyunsheng@...wei.com,
	netdev@...r.kernel.org,
	pabeni@...hat.com,
	roman.gushchin@...ux.dev,
	surenb@...gle.com,
	v-songbaohua@...o.com,
	vbabka@...e.cz,
	willemb@...gle.com,
	willy@...radead.org,
	zhouhuacai@...o.com,
	ziy@...dia.com,
	baolin.wang@...ux.alibaba.com
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation

On Tue, Oct 14, 2025 at 3:26 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 13-10-25 20:30:13, Vlastimil Babka wrote:
> > On 10/13/25 12:16, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@...o.com>
> [...]
> > I wonder if we should either:
> >
> > 1) sacrifice a new __GFP flag specifically for "!allow_spin" case to
> > determine it precisely.
>
> As said in other reply I do not think this is a good fit for this
> specific case as it is all or nothing approach. Soon enough we discover
> that "no effort to reclaim/compact" hurts other usecases. So I do not
> think we need a dedicated flag for this specific case. We need a way to
> tell kswapd/kcompactd how much to try instead.

+Baolin, who may have observed the same issue.

An issue with vmscan is that kcompactd is woken up very late, only after
a large number of order-0 pages have been reclaimed to satisfy an
order-3 allocation.

static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
{

...
                balanced = pgdat_balanced(pgdat, sc.order, highest_zoneidx);
                if (!balanced && nr_boost_reclaim) {
                        nr_boost_reclaim = 0;
                        goto restart;
                }

                /*
                 * If boosting is not active then only reclaim if there are no
                 * eligible zones. Note that sc.reclaim_idx is not used as
                 * buffer_heads_over_limit may have adjusted it.
                 */
                if (!nr_boost_reclaim && balanced)
                        goto out;
...
                if (kswapd_shrink_node(pgdat, &sc))
                        raise_priority = false;
...

out:

                ...
                /*
                 * As there is now likely space, wakeup kcompact to defragment
                 * pageblocks.
                 */
                wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx);
}

As pgdat_balanced() needs at least one free order-3 page to return true:

bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
                         int highest_zoneidx, unsigned int alloc_flags,
                         long free_pages)
{
        ...  
        if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
                return false;

        /* If this is an order-0 request then the watermark is fine */
        if (!order)
                return true;

        /* For a high-order request, check at least one suitable page is free */
        for (o = order; o < NR_PAGE_ORDERS; o++) {
                struct free_area *area = &z->free_area[o];
                int mt;

                if (!area->nr_free)
                        continue;

                for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
                        if (!free_area_empty(area, mt))
                                return true;
                }

#ifdef CONFIG_CMA
                if ((alloc_flags & ALLOC_CMA) &&
                    !free_area_empty(area, MIGRATE_CMA)) {
                        return true;
                }
#endif
                if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
                    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
                        return true;
                }
        }

        return false;
}

This appears to be suboptimal: kswapd will keep over-reclaiming order-0
pages while trying to satisfy a high-order allocation.

I wonder if we should "goto out" earlier to wake up kcompactd when
plenty of memory is available, even if no free order-3 page exists yet.

Conceptually, what I mean is:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c80fcae7f2a1..d0e03066bbaa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7057,9 +7057,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
                 * eligible zones. Note that sc.reclaim_idx is not used as
                 * buffer_heads_over_limit may have adjusted it.
                 */
-               if (!nr_boost_reclaim && balanced)
+               if (!nr_boost_reclaim && (balanced || we_have_plenty_memory_to_compact()))
                        goto out;

                /* Limit the priority of boosting to avoid reclaim writeback */
                if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
                        raise_priority = false;


Thanks
Barry
