Message-ID: <CAM_iQpUPvcyxoW9=z4pY6rMfeAJNAbh21km4fUTSredm1rP+0Q@mail.gmail.com>
Date:   Mon, 23 Mar 2020 11:17:49 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Václav Zindulka <vaclav.zindulka@...pnet.cz>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server

Hello,

On Sun, Mar 22, 2020 at 11:07 AM Václav Zindulka
<vaclav.zindulka@...pnet.cz> wrote:
>
> Hello,
...

> Recently I discovered the perf tool and found that the
> delay was between tc and the kernel. This is a perf trace of `tc qdisc del
> dev enp1s0f0 root`. Notice the 11s delay at the end. (A similar problem
> occurs during deletion of minor tc classes and qdiscs.) This was done on a
> fresh Debian Stretch installation, but usually the delay is much
> greater. The problem is not limited to Debian Stretch; it also happens
> with Debian Buster and even with Ubuntu 19.10. It happens on kernels
> 4.9, 4.19, 5.1.2, 5.2.3 and 5.4.6, as far as I've tested. It is not caused
> by one manufacturer or device driver. We mostly use dual-port Intel
> 82599ES cards made by SuperMicro, and I've tried the in-kernel drivers as well
> as the latest ixgbe driver. Exactly the same problem occurs with dual-port
> Mellanox ConnectX-4 LX and Myricom 10G-PCIE-8B cards. The whole network
> adapter resets after the rules are deleted.
>
> perf trace tc qdisc del dev enp1s0f0 root

Can you capture a `perf record` for kernel functions too? We
need to know where the kernel spent its time during this 11s delay.

>
> When I run this command on an ifb interface or an RJ45 interface, everything
> completes within one second.


Do they have the same tc configuration and same workload?
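
One way to answer this with numbers is to load the same tree on each interface and time only the root deletion. Below is a minimal Python sketch, assuming root privileges, `tc` on the PATH, and one batch file per interface generated from the same template. The helper name and the file names are hypothetical.

```python
import subprocess
import time


def time_root_delete(dev: str, batch_path: str, tc: str = "tc") -> float:
    """Load a tc batch for `dev`, then time deleting its root qdisc.

    Hypothetical helper: `batch_path` must contain commands for `dev`
    only, and `tc` is the path to the tc binary. Requires root.
    """
    # Build the shaping tree; check=True raises if tc exits non-zero.
    subprocess.run([tc, "-batch", batch_path], check=True)
    # Time only the deletion, using a monotonic clock (immune to clock steps).
    start = time.monotonic()
    subprocess.run([tc, "qdisc", "del", "dev", dev, "root"], check=True)
    return time.monotonic() - start
```

For example, comparing `time_root_delete("ifb0", "ifb0.batch")` with `time_root_delete("enp1s0f0", "enp1s0f0.batch")` keeps the configuration identical, so only the interface differs. The measured time is the wall-clock time of the `tc` process, which is where the 11s delay shows up in the perf trace.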


> My test setup consists of approx. 18k tc classes and approx.
> 13k tc qdiscs, and it was altered only with a different interface name.
> Everything works OK with ifb interfaces and with copper (RJ45) interfaces.
> I don't know how to diagnose the problem further. It will most likely
> work with regular network cards; all the problems begin with
> SFP+ interfaces. I do a lot of dynamic operations and modify the shaping
> tree according to the real situation and changes in the network, so I
> delete tc rules regularly. It is a matter of hours or days before
> the whole server freezes due to the tc deletion problem. I have reproducer
> batches for tc ready if anybody is willing to have a look at this
> issue. I can offer one server that exhibits this problem every time for
> debugging and testing. Or, at least, I would appreciate some advice on how
> to diagnose the tc deletion process further.

Please share your tc configurations (tc -s -d qd show dev ..., tc
-s -d filter show dev ...).

Also, it would be great if you can provide a minimal reproducer.
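
A minimal reproducer can be generated instead of being written by hand. The sketch below emits tc batch lines, which are commands without the leading `tc`, as `tc -batch` expects. The lines describe an HTB tree at roughly the reported scale. The flat HTB layout, the fq_codel leaves, the rates, and the `build_batch` name are illustrative assumptions. They are not the reporter's actual configuration.

```python
def build_batch(dev: str, n_classes: int, n_leaf_qdiscs: int) -> list[str]:
    """Return tc batch lines (no leading 'tc') for a flat HTB tree on `dev`.

    Hypothetical layout: root HTB qdisc 1:, one parent class 1:1, and
    `n_classes` child classes; the first `n_leaf_qdiscs` of them get an
    fq_codel leaf qdisc. Class minors and qdisc handles are hex, as tc
    parses them.
    """
    if not 0 <= n_leaf_qdiscs <= n_classes:
        raise ValueError("need 0 <= n_leaf_qdiscs <= n_classes")
    if n_classes > 0xFFF0:
        # Keep ids well below 0xffff (ffff: is the ingress/clsact handle).
        raise ValueError("too many classes for 16-bit ids")
    lines = [
        f"qdisc add dev {dev} root handle 1: htb default 2",
        f"class add dev {dev} parent 1: classid 1:1 htb rate 10gbit",
    ]
    # Child classes 1:2 .. 1:(n_classes + 1), all under parent class 1:1.
    for i in range(2, n_classes + 2):
        lines.append(
            f"class add dev {dev} parent 1:1 classid 1:{i:x} "
            "htb rate 1mbit ceil 100mbit"
        )
    # Leaf qdiscs on the first child classes; handle major = class minor.
    for i in range(2, n_leaf_qdiscs + 2):
        lines.append(f"qdisc add dev {dev} parent 1:{i:x} handle {i:x}: fq_codel")
    return lines
```

Writing `"\n".join(build_batch("enp1s0f0", 18000, 13000))` to a file and loading it as root with `tc -batch <file>` gives about 18k classes and 13k qdiscs. You can then time `tc qdisc del dev enp1s0f0 root`. Generating the same tree for an ifb interface gives a like-for-like comparison.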

Thanks.
