Message-ID: <4f1829d0-7d79-45bc-9006-65c4e3449a5e@proxmox.com>
Date: Fri, 19 Dec 2025 11:00:01 +0100
From: Christian Ebner <c.ebner@...xmox.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Simon Horman <horms@...nel.org>,
Kuniyuki Iwashima <kuniyu@...gle.com>, Willem de Bruijn
<willemb@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com,
lkolbe@...iuswillert.com
Subject: Re: [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks
On 12/19/25 9:45 AM, Eric Dumazet wrote:
> On Fri, Dec 19, 2025 at 9:23 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>
>> On Thu, Dec 18, 2025 at 3:58 PM Christian Ebner <c.ebner@...xmox.com> wrote:
>>>
>>> On 12/18/25 2:19 PM, Eric Dumazet wrote:
>>>> On Thu, Dec 18, 2025 at 1:28 PM Christian Ebner <c.ebner@...xmox.com> wrote:
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> thank you for your reply!
>>>>>
>>>>> On 12/18/25 11:10 AM, Eric Dumazet wrote:
>>>>>> Can you give us (on receive side) : cat /proc/sys/net/ipv4/tcp_rmem
>>>>>
>>>>> Affected users report that they have the respective kernel defaults set, so:
>>>>> - "4096 131072 6291456" for v6.17 builds
>>>>> - "4096 131072 33554432", with the bumped max value of 32M, for v6.18 builds
>>>>>
>>>>>> It seems your application is enforcing a small SO_RCVBUF ?
>>>>>
>>>>> No, we can exclude that, since the output of `ss -tim` shows the default
>>>>> buffer size after the connection is established, growing up to the max
>>>>> value during traffic (backups being performed).
>>>>>
>>>>
>>>> The trace you provided seems to show a very different picture ?
>>>>
>>>> [::ffff:10.xx.xx.aa]:8007
>>>> [::ffff:10.xx.xx.bb]:55554
>>>> skmem:(r0,rb7488,t0,tb332800,f0,w0,o0,bl0,d20) cubic
>>>> wscale:10,10 rto:201 rtt:0.085/0.015 ato:40 mss:8948 pmtu:9000
>>>> rcvmss:7168 advmss:8948 cwnd:10 bytes_sent:937478 bytes_acked:937478
>>>> bytes_received:1295747055 segs_out:301010 segs_in:162410
>>>> data_segs_out:1035 data_segs_in:161588 send 8.42Gbps lastsnd:3308
>>>> lastrcv:191 lastack:191 pacing_rate 16.7Gbps delivery_rate 2.74Gbps
>>>> delivered:1036 app_limited busy:437ms rcv_rtt:207.551 rcv_space:96242
>>>> rcv_ssthresh:903417 minrtt:0.049 rcv_ooopack:23 snd_wnd:142336 rcv_wnd:7168
>>>>
>>>> rb7488 would suggest the application has played with a very small SO_RCVBUF,
>>>> or some memory allocation constraint (memcg ?)
>>>
>>> Thanks for the hint where to look; however, we checked that the process is
>>> not memory constrained and the host is not under memory pressure.
>>>
>>> Also `strace -f -e socket,setsockopt -p $(pidof proxmox-backup-proxy)`
>>> shows no syscalls which would change the socket buffer size (though this
>>> still needs to be double checked by affected users for completeness).
>>>
>>> Further, the stalls most often happen mid transfer, starting out at the
>>> expected throughput, and the connection may even recover from the stall
>>> after some time and continue at regular speed again.
>>>
>>>
>>> Status update for v6.18
>>> -----------------------
>>>
>>> In the meantime, a user reported 2 stalled connections while running kernel
>>> 6.18+416dd649f3aa
>>>
>>> The tcpdump pattern looks slightly different, here we got repeating
>>> sequences of:
>>> ```
>>> 224 5.407981 10.xx.xx.bb 10.xx.xx.aa TCP 4162 40068 → 8007 [PSH, ACK]
>>> Seq=106497 Ack=1 Win=3121 Len=4096 TSval=3198115973 TSecr=3048094015
>>> 225 5.408064 10.xx.xx.aa 10.xx.xx.bb TCP 66 8007 → 40068 [ACK] Seq=1
>>> Ack=110593 Win=4 Len=0 TSval=3048094223 TSecr=3198115973
>>> ```
>>>
>>> The perf trace for `tcp:tcp_rcvbuf_grow` came back empty while in the
>>> stalled state. Tracing with:
>>> ```
>>> perf record -a -e tcp:tcp_rcv_space_adjust,tcp:tcp_rcvbuf_grow
>>> perf script
>>> ```
>>> produced some output as shown below, so it seems that tcp_rcvbuf_grow()
>>> is never called in that case, while tcp_rcv_space_adjust() is.
>>
>> Autotuning is not enabled for your case, somehow the application is
>> not behaving as expected,
Is there a way for us to check if autotuning is enabled for the TCP
connection at this point in time? Some tracepoint to identify it being
deactivated?
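For reference, this is roughly how we would try to probe it ourselves; just a
sketch, assuming kernel BTF is available and that tcp_rcv_space_adjust() still
takes the struct sock pointer as its first argument:
```
# global knob: receive buffer autotuning is skipped entirely when this is 0
cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf

# per socket: SOCK_RCVBUF_LOCK (0x2) set in sk_userlocks would mean SO_RCVBUF
# was set by the application and autotuning is disabled for that socket
bpftrace -e 'kprobe:tcp_rcv_space_adjust {
    $sk = (struct sock *)arg0;
    printf("port %d userlocks 0x%x rcvbuf %d\n",
           $sk->__sk_common.skc_num, $sk->sk_userlocks, $sk->sk_rcvbuf);
}'
```
Would checking sk_userlocks this way be a reliable indicator, or is there a
better place to look?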
>> so maybe you have to change tcp_rmem[2] if a driver is allocating
>> order-2 pages for the 9K frames.
Same here: is there a way for us to check this? Note, however, that we could
not identify a specific NIC/driver causing the behavior; it appears across
various vendor models.
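To check the page-order theory, we could compare skb->len against
skb->truesize on the receive path; a rough sketch only, assuming
tcp_data_queue() is not inlined and can be attached to on these kernels:
```
# a truesize much larger than len (e.g. ~16K truesize for a ~9K payload)
# would hint at order-2 page allocations in the receiving driver
bpftrace -e 'kfunc:tcp_data_queue {
    printf("len %u truesize %u\n", args->skb->len, args->skb->truesize);
}'
```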
>
> I meant to say : change tcp_rmem[1]
>
> echo "4096 262144 33554432" >/proc/sys/net/ipv4/tcp_rmem
Okay, thanks for the suggestion; let me get back to you with results once we
know whether this changes anything.
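For the affected users we would apply it as follows (the sysctl.d file name is
just our choice); note that only newly established connections pick up the new
default:
```
# runtime change, as suggested
sysctl -w net.ipv4.tcp_rmem="4096 262144 33554432"

# persist across reboots
echo 'net.ipv4.tcp_rmem = 4096 262144 33554432' > /etc/sysctl.d/90-tcp-rmem.conf
```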
>> You have not given what was on the sender side (linux or other stack ?)
Clients are all Linux hosts, running kernel versions 6.8, 6.14 or 6.17.
No other TCP stacks.
Best regards,
Christian Ebner