Message-ID: <CANn89i+wikOQQrGFXu=L3nKPG62rsBmWer5WpLg5wmBN+RdMqA@mail.gmail.com>
Date: Mon, 13 Oct 2025 11:53:57 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Barry Song <21cnbao@...il.com>
Cc: netdev@...r.kernel.org, linux-mm@...ck.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
Jonathan Corbet <corbet@....net>, Kuniyuki Iwashima <kuniyu@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>, "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Brendan Jackman <jackmanb@...gle.com>, Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>,
Yunsheng Lin <linyunsheng@...wei.com>, Huacai Zhou <zhouhuacai@...o.com>
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
On Mon, Oct 13, 2025 at 3:16 AM Barry Song <21cnbao@...il.com> wrote:
>
> From: Barry Song <v-songbaohua@...o.com>
>
> On phones, we have observed significant phone heating when running apps
> with high network bandwidth. This is caused by the network stack frequently
> waking kswapd for order-3 allocations. As a result, memory reclamation becomes
> constantly active, even though plenty of memory is still available for network
> allocations which can fall back to order-0.
>
> Commit ce27ec60648d ("net: add high_order_alloc_disable sysctl/static key")
> introduced high_order_alloc_disable for the transmit (TX) path
> (skb_page_frag_refill()) to mitigate some memory reclamation issues,
> allowing the TX path to fall back to order-0 immediately, while leaving the
> receive (RX) path (__page_frag_cache_refill()) unaffected. Users are
> generally unaware of the sysctl and cannot easily adjust it for specific use
> cases. Enabling high_order_alloc_disable also completely disables the
> benefit of order-3 allocations. Additionally, the sysctl does not apply to the
> RX path.
>
> An alternative approach is to disable kswapd for these frequent
> allocations and provide best-effort order-3 service for both TX and RX paths,
> while removing the sysctl entirely.
>
>
...
> Signed-off-by: Barry Song <v-songbaohua@...o.com>
> ---
> Documentation/admin-guide/sysctl/net.rst | 12 ------------
> include/net/sock.h | 1 -
> mm/page_frag_cache.c | 2 +-
> net/core/sock.c | 8 ++------
> net/core/sysctl_net_core.c | 7 -------
> 5 files changed, 3 insertions(+), 27 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> index 2ef50828aff1..b903bbae239c 100644
> --- a/Documentation/admin-guide/sysctl/net.rst
> +++ b/Documentation/admin-guide/sysctl/net.rst
> @@ -415,18 +415,6 @@ GRO has decided not to coalesce, it is placed on a per-NAPI list. This
> list is then passed to the stack when the number of segments reaches the
> gro_normal_batch limit.
>
> -high_order_alloc_disable
> -------------------------
> -
> -By default the allocator for page frags tries to use high order pages (order-3
> -on x86). While the default behavior gives good results in most cases, some users
> -might have hit a contention in page allocations/freeing. This was especially
> -true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
> -lists. This allows to opt-in for order-0 allocation instead but is now mostly of
> -historical importance.
> -
The sysctl is quite useful for testing purposes, say on a freshly
booted host, with plenty of free memory.
Also, having order-3 pages if possible is quite important for IOMMU use cases.
Perhaps kswapd should have some kind of heuristic to not start if a
recent run has already happened.
I am guessing phones do not need to send 1.6 Tbit per second on
network devices (yet), so
an option could be to set high_order_alloc_disable in your boot scripts.
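
A minimal sketch of what such a boot-script fragment might look like. It assumes the sysctl from commit ce27ec60648d is exposed as net.core.high_order_alloc_disable (that is, /proc/sys/net/core/high_order_alloc_disable) and that writing 1 selects order-0 allocations. The function name and its optional path argument are hypothetical, added only so the fragment can be tried against a scratch file.

```shell
#!/bin/sh
# Hypothetical boot-script helper: make page-frag allocations fall back
# to order-0 by setting high_order_alloc_disable.
# Assumption: the sysctl lives at the procfs path below on this kernel.
DEFAULT_SYSCTL_PATH="/proc/sys/net/core/high_order_alloc_disable"

disable_high_order_alloc() {
    # $1 (optional): file to write to; defaults to the procfs entry.
    path="${1:-$DEFAULT_SYSCTL_PATH}"
    if [ -w "$path" ]; then
        echo 1 > "$path"
    else
        # Missing or read-only: the kernel may not provide this sysctl.
        echo "skip: $path is not writable" >&2
        return 1
    fi
}

# In a real boot script the helper would be called without arguments:
#   disable_high_order_alloc
```

As the patch description notes, this setting only affects the TX path (skb_page_frag_refill()), so it would not change RX-side refills.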