netdev - Re: iproute2: tc deletion freezes whole server


Message-ID: <CAM_iQpWaK9t7patdFaS_BCdckM-nuocv7m1eiGwbO-jdLVNBMw@mail.gmail.com>
Date:   Tue, 24 Mar 2020 15:57:44 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Václav Zindulka <vaclav.zindulka@...pnet.cz>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server

On Tue, Mar 24, 2020 at 9:27 AM Václav Zindulka
<vaclav.zindulka@...pnet.cz> wrote:
>
> Hello,
>
> Thank you for the reply!
>
> On Mon, Mar 23, 2020 at 7:18 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > >
> > > perf trace tc qdisc del dev enp1s0f0 root
> >
> > Can you capture a `perf record` for kernel functions too? We
> > need to know where the kernel spent its time during this 11s delay.
>
> See perf.data.{ifb0, enp1s0f0} here
> https://github.com/zvalcav/tc-kernel/tree/master/20200324. I hope it
> is the output you wanted. If you need anything else, let me know.

Hm, my bad. Please also run `perf report -g` after you record them;
we need the text output with stack traces.
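
Roughly something like the sketch below. The output file names, the
DEV default and the DRY_RUN switch are placeholders I made up for
illustration, not anything perf or tc requires; adjust as needed.

```shell
#!/bin/sh
# Sketch: record a system-wide profile with call graphs while the slow
# command runs, then dump the report as plain text.
# DEV, DATA, REPORT and DRY_RUN are illustrative placeholders.

DEV="${DEV:-enp1s0f0}"
DATA="perf.data.$DEV"
REPORT="perf-report.$DEV.txt"

profile_qdisc_del() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        echo "perf record -a -g -o $DATA -- tc qdisc del dev $DEV root"
        echo "perf report --stdio -g -i $DATA > $REPORT"
        return 0
    fi
    # -a: sample all CPUs, since the kernel work may not run in tc's context.
    # -g: record call graphs so the report contains stack traces.
    perf record -a -g -o "$DATA" -- tc qdisc del dev "$DEV" root
    # --stdio forces plain-text output instead of the interactive TUI.
    perf report --stdio -g -i "$DATA" > "$REPORT"
}
```

Call `profile_qdisc_del` as root once the qdisc tree is loaded, and
send the resulting text file rather than the raw perf.data.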

Also, do you have a complete kernel log? If your network is completely
down, you need a serial console to capture it; kdump could also help
if you know how to set it up. The kernel log usually has some indication
of hangs. For example, if we have too much to do with the RTNL lock held,
the kernel would complain that other tasks are hung waiting for RTNL.
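
Once you have the log saved somewhere, a filter like the following
sketch can pull out the relevant parts. The function name is made up,
and it assumes a grep that supports -A (GNU, BSD and busybox do).

```shell
#!/bin/sh
# Sketch: extract hung-task reports from a saved kernel log.
# find_hang_reports is an illustrative name, not an existing tool.

find_hang_reports() {
    # Reads the log from the file given as $1, or from stdin if omitted.
    # "blocked for more than N seconds" is the wording used by the kernel's
    # hung task detector; the lines after it hold the stack trace, where
    # rtnl_lock would point at RTNL contention.
    cat "${1:--}" | grep -E -A 20 'blocked for more than [0-9]+ seconds'
}
```

For example, `dmesg | find_hang_reports` or `find_hang_reports
console.log`. Note that the hung task detector only fires after
kernel.hung_task_timeout_secs (120 by default), so for shorter stalls
you may need to lower that sysctl first.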

>
> > > When I call this command on ifb interface or RJ45 interface everything
> > > is done within one second.
> >
> >
> > Do they have the same tc configuration and same workload?
>
> Yes, both reproducers are exactly the same, and the interfaces are
> configured in a similar way. I have most of the offloading turned off
> for physical interfaces. Yet metallic interfaces don't cause that big
> a delay, while SFP+ dual port cards do, though not all of them. The
> only difference between the reproducers is the interface name. See the
> git repository above, tc-interface_name-upload/download.txt files. I
> have altered my whole setup in the daemon I'm working on to change the
> interfaces used. The only difference is the existence of ingress tc
> filter rules to redirect traffic to ifb interfaces in the production
> setup. I don't use tc filter classification in the current setup. I
> use nftables' ability to classify traffic. There is no traffic on the
> interfaces except an ssh session. It behaves in a similar way with and
> without traffic.

This rules out a slow path vs. fast path scenario. So the problem here
is probably that there are just too many TC classes and filters to destroy.

Does this also mean it is 100% reproducible when you have the same
number of classes and filters?


>
> > > My testing setup consists of approx. 18k tc class rules and approx.
> > > 13k tc qdisc rules and was altered only with different interface name....
> >
> > Please share your tc configurations (tc -s -d qd show dev ..., tc
> > -s -d filter show dev...).
>
> I've placed whole reproducers into repository. Do you need exports of rules too?
>
> > Also, it would be great if you can provide a minimal reproducer.
>
> I'm afraid that a small reproducer won't cause the problem. This was
> happening mostly on servers with large tc rule setups. I was trying to
> create a small reproducer for an nftables developer many times without
> success. I can try to create a reproducer as small as possible, but it
> may still consist of thousands of rules.

Yeah, this problem is probably TC specific, as we clean up from
the top qdisc down to each class and each filter.

Can you try to reduce the number of TC classes, for example
down to half, to see if the problem goes away? This could confirm
whether it is related to the number of TC classes/filters.
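
If it helps, a generator along these lines makes it easy to vary the
class count. This is only a sketch: HTB, the 1mbit rate and the
function name are assumptions of mine for illustration, so substitute
the qdisc type and parameters from your real setup.

```shell
#!/bin/sh
# Sketch: emit a tc batch file with a configurable number of classes, so
# the deletion time can be compared at e.g. 18000 vs. 9000 classes.
# HTB and the 1mbit rate are placeholders, not the reporter's actual config.

gen_tc_batch() {
    dev="$1"      # interface name
    count="$2"    # number of classes to create (at most 65535)
    echo "qdisc add dev $dev root handle 1: htb"
    i=1
    while [ "$i" -le "$count" ]; do
        # Class minor IDs are hexadecimal, so format the counter with %x.
        printf 'class add dev %s parent 1: classid 1:%x htb rate 1mbit\n' "$dev" "$i"
        i=$((i + 1))
    done
}
```

For example, `gen_tc_batch enp1s0f0 9000 > half.batch`, load it with
`tc -batch half.batch`, then run `time tc qdisc del dev enp1s0f0 root`
and compare against the full-size run.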

Thanks!