Message-ID: <CAM_iQpUPvcyxoW9=z4pY6rMfeAJNAbh21km4fUTSredm1rP+0Q@mail.gmail.com>
Date: Mon, 23 Mar 2020 11:17:49 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Václav Zindulka <vaclav.zindulka@...pnet.cz>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server
Hello,
On Sun, Mar 22, 2020 at 11:07 AM Václav Zindulka
<vaclav.zindulka@...pnet.cz> wrote:
>
> Hello,
...
> Recently I discovered the existence of perf tool and discovered that
> delay was between tc and kernel. This is perf trace of tc qdisc del
> dev enp1s0f0 root. Notice the 11s delay at the end. (Similar problem
> is during deletion of minor tc class and qdiscs) This was done on
> fresh Debian Stretch installation, but mostly the delay is much
> greater. This problem is not limited to Debian Stretch. It happens
> with Debian Buster and even with Ubuntu 19.10. It happens on kernels
> 4.9, 4.19, 5.1.2, 5.2.3, 5.4.6 as far as I've tested. It is not caused
> by one manufacturer or device driver. We mostly use dual port Intel
> 82599ES cards made by SuperMicro and I've tried kernel drivers as well
> as latest ixgbe driver. Exactly the same problem is with dual port
> Mellanox ConnectX-4 LX, Myricom 10G-PCIE-8B cards too. Whole network
> adapter resets after the deletion of rules.
>
> perf trace tc qdisc del dev enp1s0f0 root
Can you capture a `perf record` that covers kernel functions too? We
need to know where the kernel spent its time during this 11s delay.
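A rough sketch of one way to capture such a profile follows. The
interface name is taken from the command quoted above; the output file
name, the `run` helper, and the `RUN` switch are placeholders invented
for this sketch, not something from the report. It only prints the
commands unless RUN=1 is set, because deleting the root qdisc is
disruptive.

```shell
#!/bin/sh
# Placeholders for this sketch; adjust to the machine under test.
DEV="${DEV:-enp1s0f0}"
OUT="${OUT:-tc-del.perf.data}"

# Dry-run helper: echo the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

profile_qdisc_del() {
    # -a samples all CPUs (so kernel work done outside the tc process
    # is included), -g records call graphs, -o names the output file.
    run perf record -a -g -o "$OUT" -- tc qdisc del dev "$DEV" root
    # Text report sorted by symbol, suitable for pasting into a reply.
    run perf report -i "$OUT" --stdio --sort symbol
}

profile_qdisc_del
```

Running it as root with RUN=1 should perform the deletion under perf and
then print the report; the top of that report is the interesting part.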
>
> When I call this command on ifb interface or RJ45 interface everything
> is done within one second.
Do they have the same tc configuration and the same workload?
> My testing setup consists of approx. 18k tc class rules and approx.
> 13k tc qdisc rules and was altered only with different interface name.
> Everything works OK with ifb interfaces and with metallic interfaces.
> I don't know how to diagnose the problem further. It is most likely
> that it will work with regular network cards. All problems begin with
> SFP+ interfaces. I do a lot of dynamic operations and I modify shaping
> tree according to real situation and changes in network so I'm doing
> deletion of tc rules regularly. It is a matter of hours or days before
> the whole server freezes due to tc deletion problem. I have reproducer
> batches for tc ready if anybody will be willing to have a look at this
> issue. I may offer one server which has this problem every time to
> debug and test it. Or at least I would appreciate some advice on how
> to diagnose process of tc deletion further.
Please share your tc configurations (tc -s -d qd show dev ..., tc
-s -d filter show dev...).
Also, it would be great if you could provide a minimal reproducer.
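As an illustration only, a minimal reproducer could be a small script
that generates a tc batch file of a given size. The sketch below assumes
an HTB tree with fq_codel leaf qdiscs and arbitrary rates; the report
does not say which qdisc types are in use, so treat those choices, the
function name `gen_tc_batch`, and the file name `repro.batch` as
hypothetical.

```shell
#!/bin/sh
# Hypothetical generator: prints tc batch lines for one HTB root,
# one parent class, and N leaf classes, each with its own leaf qdisc.
# HTB, fq_codel, and all rates are assumptions made for this sketch.
gen_tc_batch() {
    dev=$1
    n=$2
    echo "qdisc add dev $dev root handle 1: htb"
    echo "class add dev $dev parent 1: classid 1:1 htb rate 10gbit"
    i=0
    while [ "$i" -lt "$n" ]; do
        # tc handles and class ids are hexadecimal; start at 0x10 so
        # leaf ids never collide with the root handle 1: or class 1:1.
        id=$(printf '%x' $((i + 16)))
        echo "class add dev $dev parent 1:1 classid 1:$id htb rate 1mbit ceil 10mbit"
        echo "qdisc add dev $dev parent 1:$id handle $id: fq_codel"
        i=$((i + 1))
    done
}

# Example use (run as root, on a test machine):
#   gen_tc_batch enp1s0f0 1000 > repro.batch
#   tc -batch repro.batch
#   time tc qdisc del dev enp1s0f0 root
```

Increasing the count until the slow deletion appears would also show how
the delay scales with the number of classes and qdiscs.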
Thanks.