Date:   Thu, 17 May 2018 08:40:30 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Neal Cardwell <ncardwell@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...e.dk>,
        Yuchung Cheng <ycheng@...gle.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH net-next 3/4] tcp: add SACK compression



On 05/17/2018 08:14 AM, Neal Cardwell wrote:
> On Thu, May 17, 2018 at 8:12 AM Eric Dumazet <edumazet@...gle.com> wrote:
> 
>> When TCP receives an out-of-order packet, it immediately sends
>> a SACK packet, generating network load but also forcing the
>> sender to send 1-MSS pathological packets, increasing its
>> RTX (retransmit) queue length/depth, and thus processing time.
> 
>> Wifi networks suffer from this aggressive behavior, but generally
>> speaking, all these SACK packets add fuel to the fire when networks
>> are under congestion.
> 
>> This patch adds a high-resolution timer and a tp->compressed_ack counter.
> 
>> Instead of sending a SACK, we program this timer with a small delay,
>> based on SRTT and capped at 2.5 ms: delay = min(5% of SRTT, 2.5 ms)
> ...
> 
> Very nice. Thanks for implementing this, Eric! I was wondering if the
> constants here might be worth some discussion/elaboration.
> 
>> @@ -5074,6 +5076,7 @@ static inline void tcp_data_snd_check(struct sock *sk)
>>   static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
>>   {
>>          struct tcp_sock *tp = tcp_sk(sk);
>> +       unsigned long delay;
> 
>>              /* More than one full frame received... */
>>          if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
>> @@ -5085,15 +5088,31 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
>>              (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
>>               __tcp_select_window(sk) >= tp->rcv_wnd)) ||
>>              /* We ACK each frame or... */
>> -           tcp_in_quickack_mode(sk) ||
>> -           /* We have out of order data. */
>> -           (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) {
>> -               /* Then ack it now */
>> +           tcp_in_quickack_mode(sk)) {
>> +send_now:
>>                  tcp_send_ack(sk);
>> -       } else {
>> -               /* Else, send delayed ack. */
>> +               return;
>> +       }
>> +
>> +       if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
>>                  tcp_send_delayed_ack(sk);
>> +               return;
>>          }
>> +
>> +       if (!tcp_is_sack(tp) || tp->compressed_ack >= 127)
>> +               goto send_now;
>> +       tp->compressed_ack++;
> 
> Is there a particular motivation for the cap of 127? IMHO 127 ACKs is quite
> a few to compress. Experience seems to show that it works well to have one
> GRO ACK for ~64KBytes that triggers a single TSO skb of ~64KBytes. It might
> be nice to try to match those dynamics in this SACK compression case, so it
> might be nice to cap the number of compressed ACKs at something like 44?
> (0xffff / 1448 - 1).  That way for high-speed paths we could try to keep
> the ACK clock going with ACKs for ~64KBytes that trigger a single TSO skb
> of ~64KBytes, no matter whether we are sending SACKs or cumulative ACKs.

127 was chosen because the field is u8, and since skb allocation for the ACK
can fail, we could have cases where the field goes above 127.
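
(To make the wraparound risk concrete, here is a tiny userspace demo, not
code from the patch: an unsigned 8-bit counter that keeps incrementing
because the ACK could not be sent silently wraps at 256, so the flush
threshold needs headroom below that, and the test has to be '>=' rather
than '=='.)

	#include <stdio.h>

	int main(void)
	{
		unsigned char compressed_ack = 0;	/* u8, like the new field */
		int i;

		/* Nothing resets the counter if sending the ACK fails, so it
		 * can climb past any threshold before we manage to flush.
		 */
		for (i = 0; i < 300; i++)
			compressed_ack++;

		printf("%u\n", compressed_ack);	/* prints 44: wrapped past 255 */
		return 0;
	}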

Ultimately, I believe a followup patch will add a sysctl, so that we can
fine-tune this, and even disable ACK compression entirely by setting it to 0.
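
Something along these lines (only a sketch; sysctl_tcp_comp_sack_nr is a
made-up name for whatever per-netns knob such a followup would add):

	/* Hypothetical knob: max number of ACKs to compress. A value of 0
	 * makes the test always true, so every ACK is sent immediately and
	 * compression is effectively disabled.
	 */
	if (!tcp_is_sack(tp) ||
	    tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)
		goto send_now;
	tp->compressed_ack++;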

> 
>> +
>> +       if (hrtimer_is_queued(&tp->compressed_ack_timer))
>> +               return;
>> +
>> +       /* compress ack timer : 5 % of srtt, but no more than 2.5 ms */
>> +
>> +       delay = min_t(unsigned long, 2500 * NSEC_PER_USEC,
>> +                     tp->rcv_rtt_est.rtt_us * (NSEC_PER_USEC >> 3)/20);
> 
> Any particular motivation for the 2.5ms here? It might be nice to match the
> existing TSO autosizing dynamics and use 1ms here instead of having a
> separate new constant of 2.5ms. Smaller time scales here should lead to
> less burstiness and queue pressure from data packets in the network, and we
> know from experience that the CPU overhead of 1ms chunks is acceptable.

This came from my tests on wifi really :)

I also had the idea to make this threshold adjustable for wifi, like we did
for sk_pacing_shift.

(On wifi, we might want to increase the max delay between ACKs.)

So maybe use a 1 ms delay when sk_pacing_shift == 10, but increase it if
sk_pacing_shift has been lowered.
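
For anyone checking the units in the delay expression above: rcv_rtt_est.rtt_us
is stored left-shifted by 3 (like srtt_us), so the formula really does come out
to 5% of the RTT in nanoseconds. A quick userspace sanity check, with the
kernel constants inlined (just an illustration, not kernel code):

	#include <stdio.h>

	#define NSEC_PER_USEC 1000UL

	int main(void)
	{
		unsigned long srtt_us = 10000;		/* a 10 ms measured RTT */
		unsigned long rtt_us = srtt_us << 3;	/* stored <<3, as in the kernel */
		unsigned long delay;

		/* Same expression as the patch: 5% of SRTT, capped at 2.5 ms */
		delay = rtt_us * (NSEC_PER_USEC >> 3) / 20;
		if (delay > 2500 * NSEC_PER_USEC)
			delay = 2500 * NSEC_PER_USEC;

		printf("%lu ns\n", delay);	/* 500000 ns = 5% of 10 ms */
		return 0;
	}

The 2.5 ms cap takes over once SRTT exceeds 50 ms.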


