Message-ID: <CAGsJ_4x5v=M0=jYGOqy1rHL9aVg-76OgiE0qQMdEu70FhZcmUg@mail.gmail.com>
Date: Wed, 15 Oct 2025 04:17:44 +0800
From: Barry Song <21cnbao@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Matthew Wilcox <willy@...radead.org>, netdev@...r.kernel.org, linux-mm@...ck.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Barry Song <v-songbaohua@...o.com>, Jonathan Corbet <corbet@....net>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, Paolo Abeni <pabeni@...hat.com>, 
	Willem de Bruijn <willemb@...gle.com>, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Simon Horman <horms@...nel.org>, Vlastimil Babka <vbabka@...e.cz>, 
	Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
	Brendan Jackman <jackmanb@...gle.com>, Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>, 
	Yunsheng Lin <linyunsheng@...wei.com>, Huacai Zhou <zhouhuacai@...o.com>
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation

On Tue, Oct 14, 2025 at 6:39 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 14, 2025 at 3:19 AM Barry Song <21cnbao@...il.com> wrote:
> >
> > > >
> > > > >
> > > > > I think you are missing something to control how much memory  can be
> > > > > pushed on each TCP socket ?
> > > > >
> > > > > What is tcp_wmem on your phones ? What about tcp_mem ?
> > > > >
> > > > > Have you looked at /proc/sys/net/ipv4/tcp_notsent_lowat
> > > >
> > > > # cat /proc/sys/net/ipv4/tcp_wmem
> > > > 524288  1048576 6710886
> > >
> > > Ouch. That is insane tcp_wmem[0] .
> > >
> > > Please stick to 4096, or risk OOM of various sorts.
> > >
> > > >
> > > > # cat /proc/sys/net/ipv4/tcp_notsent_lowat
> > > > 4294967295
> > > >
> > > > Any thoughts on these settings?
> > >
> > > Please look at
> > > https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
> > >
> > > tcp_notsent_lowat - UNSIGNED INTEGER
> > > A TCP socket can control the amount of unsent bytes in its write queue,
> > > thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll()
> > > reports POLLOUT events if the amount of unsent bytes is below a per
> > > socket value, and if the write queue is not full. sendmsg() will
> > > also not add new buffers if the limit is hit.
> > >
> > > This global variable controls the amount of unsent data for
> > > sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change
> > > to the global variable has immediate effect.
> > >
> > >
> > > Setting this sysctl to 2MB can effectively reduce the amount of memory
> > > in TCP write queues by 66 %,
> > > or allow you to increase tcp_wmem[2] so that only flows needing big
> > > BDP can get it.
> >
> > We obtained these settings from our hardware vendors.
>
> Tell them they are wrong.

Well, we checked Qualcomm and MTK, and both appear to set these values
relatively high. The AOSP products we examined likewise use high values
for these settings; none of them uses tcp_wmem[0]=4096.

We’ll need some time to understand why these are configured this way in
AOSP hardware.
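
For comparing devices, the three tcp_wmem fields (minimum, default and
maximum send buffer size in bytes, per ip-sysctl) can be read
programmatically. The following is only a minimal sketch: the helper names
parse_tcp_wmem() and read_tcp_wmem() are hypothetical, not kernel or libc
APIs, and the only assumption about the file is the three-number layout
shown in the cat output quoted above.

```c
#include <stdio.h>

/* Hypothetical helper: parse the three whitespace-separated fields of a
 * tcp_wmem line into out[0] = min, out[1] = default, out[2] = max (bytes).
 * Returns 0 on success, -1 if fewer than three numbers were found. */
int parse_tcp_wmem(const char *line, long out[3])
{
    return sscanf(line, "%ld %ld %ld", &out[0], &out[1], &out[2]) == 3 ? 0 : -1;
}

/* Hypothetical helper: read the first line of a sysctl file such as
 * /proc/sys/net/ipv4/tcp_wmem and parse it with parse_tcp_wmem().
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_tcp_wmem(const char *path, long out[3])
{
    char buf[128];
    FILE *f = fopen(path, "r");
    int ret = -1;

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = parse_tcp_wmem(buf, out);
    fclose(f);
    return ret;
}
```

Calling read_tcp_wmem("/proc/sys/net/ipv4/tcp_wmem", v) on a device and
checking v[0] shows whether it uses the 4096 minimum suggested earlier in
the thread.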

>
> >
> > It might be worth exploring these settings further, but I can’t quite see
> > their connection to high-order allocations, since high-order allocations are
> > kernel macros.
> >
> > #define SKB_FRAG_PAGE_ORDER     get_order(32768)
> > #define PAGE_FRAG_CACHE_MAX_SIZE        __ALIGN_MASK(32768, ~PAGE_MASK)
> > #define PAGE_FRAG_CACHE_MAX_ORDER       get_order(PAGE_FRAG_CACHE_MAX_SIZE)
> >
> > Is there anything I’m missing?
>
> What is your question exactly ? You read these macros just fine. What
> is your point ?

My question is whether these settings influence how often high-order
allocations occur. In other words, would lowering these values make
high-order allocations less frequent? If so, why?
I’m not a networking expert, so apologies if the question sounds naive.
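
For reference, the order that the macros quoted above evaluate to can be
checked in userspace. This is a rough sketch, not the kernel's
implementation: order_for_size() is a hypothetical stand-in for
get_order(), and the page size is passed as a parameter because the real
macro depends on the kernel's PAGE_SHIFT. It only shows what the constants
mean; it says nothing about how often such allocations happen.

```c
#include <stddef.h>

/* Userspace approximation of the kernel's get_order(): returns the
 * smallest order such that (page_size << order) >= size.
 * page_size is an explicit parameter in this sketch; the kernel derives
 * it from PAGE_SHIFT at build time. */
unsigned int order_for_size(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With 4 KiB pages, order_for_size(32768, 4096) gives order 3, i.e. eight
contiguous pages. With 16 KiB pages the same 32768 bytes is order 1, and
with 64 KiB pages it fits in a single order-0 page.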

>
> We had in the past something dynamic that we removed
>
> commit d9b2938aabf757da2d40153489b251d4fc3fdd18
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Wed Aug 27 20:49:34 2014 -0700
>
>     net: attempt a single high order allocation

Thanks
Barry
