netdev - Re: [ISSUE] EDT will lead ca_rtt to 0 when different tx queue have different qdisc


Message-ID: <c0e3abb4-8f04-84bc-e480-9edbcc652f6b@gmail.com>
Date:   Wed, 30 Jun 2021 15:11:00 +0200
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     mingkun bian <bianmingkun@...il.com>, netdev@...r.kernel.org
Subject: Re: [ISSUE] EDT will lead ca_rtt to 0 when different tx queue have
 different qdisc



On 6/30/21 10:42 AM, mingkun bian wrote:
> Hi,
> 
> I found that ca_rtt has a small chance of becoming 0 when using EDT,
> and traced it to different tx queues having different qdiscs, as
> follows:
> 
> It may have been triggered by my reconfiguration of the network card
> (ethtool -L ethx combined 48).
> 
>     1. The network card's original num_tx_queues is 64 and
> real_num_tx_queues is 24, so in "mq_init" and "mq_attach" only the
> first 24 queues get the default qdisc (fq), and the last 40 queues
> are set to pfifo_fast_ops.
> 
>     2. After the system starts, I run "ethtool -L ethx combined 48"
> to raise the tx/rx queue count to 48, but this does not modify the
> qdisc configuration. From then on, bbr uses fq when
> "__dev_queue_xmit->netdev_pick_tx" selects one of the first 24
> queues, and uses the tcp stack's timer (qdisc is pfifo_fast_ops) when
> "__dev_queue_xmit->netdev_pick_tx" selects one of the last 24 queues.
> In this case, bbr works normally.
> 
>     3. The wrong scenario is:
>     a. tcp selects one of the first 24 tx queues to send, then sch_fq
> changes sk->sk_pacing_status from SK_PACING_NEEDED to SK_PACING_FQ,
> so tcp will use fq to send.
>     b. After a while, for a reason I have not identified,
> __dev_queue_xmit->netdev_pick_tx selects one of the last 24 queues,
> whose qdisc is pfifo_fast_ops. That qdisc sends the skb immediately
> (no pacing), and then ca_rtt = curtime - skb->timestamp_ns, where
> skb->timestamp_ns may be bigger than curtime.
> 
> 
> Why does get_default_qdisc_ops not return the default qdisc for all queues?

A bit of "git blame" points to:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f27cde313d72d6b44a73ba89c8b2c6a99c628cf


1) Some devices have an awfully large maximum number of TX queues (@num_tx_queues).

2) Some qdiscs use a lot of memory (for example fq_codel).

pfifo_fast, on the other hand, uses almost no memory.

We expect admins (or tools) to fully reconfigure qdiscs and all device tuning
(RFS/RPS/XPS... if needed) after a device reconfiguration (especially ethtool -L).

This is especially true as complex qdiscs offer a lot of parameters,
and the 'automatic qdisc selection at queue instantiation' cannot change any of them.

SK_PACING_FQ cannot be magically unset, so if you are using FQ, you need to make
sure that all queue leaves also use FQ, or live flows depending on pacing
might misbehave.

Alternatives:
1) You could use "ss -tK" to kill all TCP flows after ethtool -L.
2) Do not use FQ, and rely on TCP's fallback to 'internal' pacing.
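
To illustrate why a flow marked SK_PACING_FQ misbehaves on a non-pacing
leaf, here is a rough userspace model. It is not kernel code: the struct
and function names (flow_model, edt_stamp_ns, rtt_sample_us) are made up
for this sketch, and it assumes that the stack clamps a negative RTT delta
to zero, which is consistent with the ca_rtt of 0 reported above.

```c
#include <stdint.h>

/* Hypothetical per-flow pacing state; only models the EDT idea. */
struct flow_model {
	uint64_t next_departure_ns;	/* earliest time the next skb may leave */
};

/*
 * Stamp an skb with its earliest departure time (EDT).
 * The stamp may lie in the future if the flow is being paced.
 */
uint64_t edt_stamp_ns(struct flow_model *f, uint64_t now_ns,
		      uint64_t pacing_gap_ns)
{
	uint64_t t = f->next_departure_ns > now_ns ?
			f->next_departure_ns : now_ns;

	f->next_departure_ns = t + pacing_gap_ns;
	return t;
}

/*
 * RTT sample in microseconds, computed at ACK time.
 * Assumption: a stamp that is still in the future yields 0.
 */
uint64_t rtt_sample_us(uint64_t ack_time_ns, uint64_t skb_stamp_ns)
{
	if (skb_stamp_ns > ack_time_ns)
		return 0;
	return (ack_time_ns - skb_stamp_ns) / 1000;
}
```

With a pacing qdisc, the second skb of a burst would be held until its
stamp, so the ACK always arrives after it. If the leaf sends the skb
immediately instead, the ACK can arrive before the stamp, and the sample
collapses to 0.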

> 
> get_default_qdisc_ops(const struct net_device *dev, int ntx)
> {
>         return ntx < dev->real_num_tx_queues ?
>                 default_qdisc_ops : &pfifo_fast_ops;
> }
> 
> Thanks.
>
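
The effect of the quoted helper across "ethtool -L" can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the kernel implementation: netdev_model, qdisc_ops_model and the
helper names are hypothetical, the default qdisc is taken to be fq, and
leaves are attached once and never revisited when real_num_tx_queues
changes.

```c
#include <stddef.h>

#define MODEL_MAX_TXQ 64	/* arbitrary cap for this sketch */

struct qdisc_ops_model { const char *id; };	/* hypothetical stand-in */

const struct qdisc_ops_model fq_model = { "fq" };
const struct qdisc_ops_model pfifo_fast_model = { "pfifo_fast" };

struct netdev_model {
	unsigned int num_tx_queues;		/* allocated queues */
	unsigned int real_num_tx_queues;	/* queues currently in use */
	const struct qdisc_ops_model *leaf[MODEL_MAX_TXQ];
};

/* Same selection rule as the quoted helper: default only for active queues. */
const struct qdisc_ops_model *
default_ops_for(const struct netdev_model *dev, unsigned int ntx)
{
	return ntx < dev->real_num_tx_queues ? &fq_model : &pfifo_fast_model;
}

/* Attach a leaf to every allocated queue, once, at setup time. */
void attach_default_qdiscs(struct netdev_model *dev)
{
	unsigned int ntx;

	for (ntx = 0; ntx < dev->num_tx_queues && ntx < MODEL_MAX_TXQ; ntx++)
		dev->leaf[ntx] = default_ops_for(dev, ntx);
}

/* Count active queues whose leaf is not fq (candidates for misbehaving flows). */
unsigned int count_non_fq_active(const struct netdev_model *dev)
{
	unsigned int ntx, n = 0;

	for (ntx = 0; ntx < dev->real_num_tx_queues && ntx < MODEL_MAX_TXQ; ntx++)
		if (dev->leaf[ntx] != &fq_model)
			n++;
	return n;
}
```

Starting with 64 allocated and 24 active queues, every active leaf is fq.
Raising real_num_tx_queues to 48 without reattaching leaves 24 active
queues on pfifo_fast, which is the mixed setup that needs an explicit
qdisc reconfiguration.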