Message-ID: <c0e3abb4-8f04-84bc-e480-9edbcc652f6b@gmail.com>
Date:   Wed, 30 Jun 2021 15:11:00 +0200
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     mingkun bian <bianmingkun@...il.com>, netdev@...r.kernel.org
Subject: Re: [ISSUE] EDT will lead ca_rtt to 0 when different tx queues have
 different qdiscs



On 6/30/21 10:42 AM, mingkun bian wrote:
> Hi,
> 
> I found a problem where ca_rtt has a small chance of becoming 0 when
> using EDT; it turns out to be caused by different tx queues having
> different qdiscs, as follows:
> 
> The case may be triggered by my reconfiguration of the network card
> (ethtool -L ethx combined 48):
> 
>     1. The network card's original num_tx_queues is 64 and
> real_num_tx_queues is 24, so in "mq_init" and "mq_attach" only the
> first 24 queues get the default qdisc (fq), and the remaining 40
> queues are set to pfifo_fast_ops.
> 
>     2. After the system starts, I run "ethtool -L ethx combined 48"
> to raise the tx/rx queue count to 48, but this does not modify the
> qdisc configuration. From then on, BBR uses fq pacing when
> "__dev_queue_xmit->netdev_pick_tx" selects one of the first 24
> queues, and uses the TCP stack's internal pacing timer (the qdisc is
> pfifo_fast_ops) when it selects one of the last 24 queues. In this
> case BBR still works normally.
> 
>     3. The wrong scenario is:
>     a. TCP selects one of the first 24 tx queues to send; sch_fq then
> changes sk->sk_pacing_status from SK_PACING_NEEDED to SK_PACING_FQ,
> so TCP relies on fq for pacing from then on.
>     b. After a while, for a reason I am not sure of,
> __dev_queue_xmit->netdev_pick_tx selects one of the last 24 queues,
> whose qdisc is pfifo_fast_ops, so the qdisc sends the skb immediately
> (no pacing). Then ca_rtt = curtime - skb->timestamp_ns, and
> skb->timestamp_ns (the future EDT departure time set for pacing) may
> be bigger than curtime, so the delta is negative and ends up clamped
> to 0.
> 
> 
> Why doesn't get_default_qdisc_ops return the default qdisc for all queues?

Some "git blame" can help to point to :

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f27cde313d72d6b44a73ba89c8b2c6a99c628cf


1) Some devices have awfully big max TX queue counts (@num_tx_queues)

2) Some qdiscs use a lot of memory (for example fq_codel)

pfifo_fast, on the other hand, uses almost no memory.

We expect admins (or tools) to fully reconfigure qdiscs and all device tuning
(RFS/RPS/XPS... if needed) after a device reconfiguration (especially ethtool -L).

This is especially true as complex qdiscs offer a lot of parameters,
and the 'automatic qdisc selection at queue instantiation' cannot set any of them.
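For instance, the automatic selection can be re-run for all currently active
queues by recreating the root mq qdisc after ethtool -L. A minimal sketch, not
from the thread: "ethx" is a placeholder device name, and it assumes
net.core.default_qdisc is set to fq:

  tc qdisc show dev ethx             # before: queues above the old count still show pfifo_fast
  tc qdisc replace dev ethx root mq  # mq_init re-attaches the default qdisc (fq) to every active queue
  tc qdisc show dev ethx             # after: all 48 active queues should now show fq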

SK_PACING_FQ cannot be magically unset, so if you are using FQ, you need to make
sure that all queue leaves also use FQ, or live flows that depend on pacing
might misbehave.
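The leaves can also be pinned to FQ explicitly, independent of the sysctl. A
sketch under the same assumptions (48 queues after ethtool -L; the handle 100:
is an arbitrary choice):

  # give the root mq an explicit handle so its per-queue classes are addressable
  tc qdisc replace dev ethx root handle 100: mq
  # attach fq to every leaf: mq class 100:N maps to tx queue N-1
  for i in $(seq 1 48); do
      tc qdisc replace dev ethx parent 100:$i fq
  done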

Alternatives:
1) Use "ss -tK" to kill all TCP flows after ethtool -L.
2) Do not use FQ, and rely on TCP's fallback to 'internal' pacing.
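As rough command-line equivalents (again a sketch: "ethx" is a placeholder,
and "ss -tK" requires a kernel built with CONFIG_INET_DIAG_DESTROY):

  # 1) forcibly close established TCP sockets, dropping their SK_PACING_FQ state
  ss -tK

  # 2) or avoid FQ altogether; BBR then paces with TCP's internal timer
  sysctl -w net.core.default_qdisc=pfifo_fast
  tc qdisc replace dev ethx root mq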

> 
> static const struct Qdisc_ops *
> get_default_qdisc_ops(const struct net_device *dev, int ntx)
> {
> 	return ntx < dev->real_num_tx_queues ?
> 			default_qdisc_ops : &pfifo_fast_ops;
> }
> 
> Thanks.
> 
