netdev - Re: Potential bug in linux TCP pacing implementation

Message-ID: <CANn89iLDFvTZP05Jhf5LDrmAsoDQ_w9qkjOmb5s0pr4-Xh+w3g@mail.gmail.com>
Date: Tue, 14 Nov 2023 21:57:49 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Anup Agarwal <anupa@...rew.cmu.edu>
Cc: netdev@...r.kernel.org
Subject: Re: Potential bug in linux TCP pacing implementation

On Tue, Nov 14, 2023 at 9:39 PM Anup Agarwal <anupa@...rew.cmu.edu> wrote:
>
> Thanks for your response.
>
> Yeah, I think for the currently deployed CCAs, the current pacing
> implementation works fine.
>
> I wanted to clarify: the issue is caused by a change in
> sk_pacing_rate over time and is independent of pkt sizes or network
> parameters (bandwidth, rtt, etc.). If sk_pacing_rate goes from r1=0.1
> pkts per ms (~1.2 Mbps for ~1500B MTU) to r2=10 pkts per ms (~120
> Mbps), then the opportunity to send 99 pkts (=(r2/r1)-1) is missed.
> This is because tcp_wstamp_ns was computed as =10ms using r1, even
> though sk_pacing_rate changed to r2 (immediately after the
> tcp_wstamp_ns computation) and a pkt could have been sent at 0.1ms.
>
> The ratio of the new and old rate matters, not the pkt sizes, or other
> network params. Typical CCAs perhaps only change rate by ~2 times so
> only 1 pkt (=r2/r1-1 = 2-1) worth of sending opportunity is lost. This
> is why I guess the issue has not been observed in practice.
>
> Yeah, I did see there is an option to specify "skb_mstamp_ns", which
> might allow CCAs to enforce rates better. I don't know how easy or
> difficult it would be for CCAs to set skb_mstamp_ns, because CCAs may
> not look at individual skbuffs and tcp_congestion_ops only has
> callbacks on ACK events, not on pkt send events. I guess BPF programs
> are to be used? (https://netdevconf.info//0x14/pub/papers/55/0x14-paper55-talk-paper.pdf)
>
> Also, to clarify, is the reason for the conscious choice that the fix
> would require more state in the TCP socket? Or are there more
> reasons, any pointers? I imagine that, for the fix, the state would
> increase by ~2-3 u64 values (e.g., credits in units of bytes, the time
> the credits were updated, and the time the credits expire). Is this
> too much? Or will the fix require more state than this?

It is too much, yes, and not needed currently.
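
For readers following the numbers in the quoted example, the arithmetic can be checked with a small sketch. This is not kernel code: the function name and the simplified model (one packet sent at t=0, the next-send deadline fixed from the old rate, and the rate changing immediately afterwards) are assumptions made only for illustration.

```python
NSEC_PER_MSEC = 1_000_000

def missed_send_opportunities(old_interval_ns: int, new_interval_ns: int) -> int:
    """Count sends the new rate would have allowed before the stale deadline.

    Hypothetical model, not the kernel implementation: a packet leaves at
    t=0 and the next-send deadline is fixed as t=old_interval_ns (standing
    in for tcp_wstamp_ns). The pacing rate then changes, so packets could
    have left every new_interval_ns instead.
    """
    stale_deadline_ns = old_interval_ns
    missed = 0
    t = new_interval_ns
    # Every slot strictly before the stale deadline is a lost opportunity;
    # the slot at the deadline itself is still used.
    while t < stale_deadline_ns:
        missed += 1
        t += new_interval_ns
    return missed

# r1 = 0.1 pkt/ms -> one packet every 10 ms
# r2 = 10 pkts/ms -> one packet every 0.1 ms
print(missed_send_opportunities(10 * NSEC_PER_MSEC, NSEC_PER_MSEC // 10))  # 99

# A 2x rate increase (10 ms -> 5 ms between packets) loses one slot
print(missed_send_opportunities(10 * NSEC_PER_MSEC, 5 * NSEC_PER_MSEC))  # 1
```

Under this model the count equals (r2/r1)-1, matching the 99 pkts and 1 pkt figures in the quoted text.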