Message-ID: <CANn89iKBUJ6p56+3TRNB5JAn0bmuRPDWLeOwGmvLh5yjwnDasA@mail.gmail.com>
Date: Fri, 22 Nov 2024 17:44:33 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, davem@...emloft.net, pabeni@...hat.com, 
	jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us
Subject: Re: [PATCH net] net_sched: sch_fq: don't follow the fast path if Tx is behind now

On Fri, Nov 22, 2024 at 5:21 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> Recent kernels cause a lot of TCP retransmissions
>
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
> [  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
>                                                       ^^^^
>
> Replacing the qdisc with pfifo makes them go away. It appears that
> a flow may get throttled with a very near unthrottle time.
> Later we may get busy processing Rx and the unthrottling time will
> pass, but we won't service Tx since the core is busy with Rx.
> If Rx sees an ACK and we try to push more data for the throttled flow
> we may fastpath the skb, not realizing that there are already "ready
> to send" packets for this flow sitting in the qdisc.
> At least this is my theory on what happens.
>
> Don't trust the fastpath if we are "behind" according to the projected
> unthrottle time for some flow waiting in the Qdisc.
>
> Qdisc config:
>
> qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
>   buckets 32768 orphan_mask 1023 bands 3 \
>   priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
>   weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
>   low_rate_threshold 550Kbit \
>   refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
>
> For iperf this change seems to do fine, the reordering is gone.
> The fastpath still gets used most of the time:
>
>   gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
>  xx_behind 2731
>
> where "xx_behind" counts how many times we hit the new return false.
>
> Fixes: 076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc")
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> CC: jhs@...atatu.com
> CC: xiyou.wangcong@...il.com
> CC: jiri@...nulli.us
> ---
>  net/sched/sch_fq.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 19a49af5a9e5..3d932b262159 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -331,6 +331,12 @@ static bool fq_fastpath_check(const struct Qdisc *sch, struct sk_buff *skb,
>                  */
>                 if (q->internal.qlen >= 8)
>                         return false;
> +
> +               /* Ordering invariants fall apart if some throttled flows
> +                * are ready but we haven't serviced them, yet.
> +                */
> +               if (q->throttled_flows && q->time_next_delayed_flow <= now)
> +                       return false;
>         }

Interesting... I guess we could also call fq_check_throttled() to
refresh our view of the qdisc state?

But perhaps your patch is simpler. I guess it could be reduced to

if (q->time_next_delayed_flow <= now + q->offload_horizon)
      return false;

(Note the + q->offload_horizon)

I do not think testing q->throttled_flows is strictly needed:
if it is 0, then q->time_next_delayed_flow is set to ~0ULL.
