netdev - Re: iproute2: tc deletion freezes whole server

Message-ID: <CAM_iQpXR+MQHaR-ou6rR_NAz-4XhAWiLuSEYvvpVXyWqHBnc-w@mail.gmail.com>
Date:   Sun, 17 May 2020 12:35:44 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Václav Zindulka <vaclav.zindulka@...pnet.cz>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server

On Fri, May 8, 2020 at 6:59 AM Václav Zindulka
<vaclav.zindulka@...pnet.cz> wrote:
>
> On Thu, May 7, 2020 at 8:52 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > On Tue, May 5, 2020 at 1:46 AM Václav Zindulka
> > <vaclav.zindulka@...pnet.cz> wrote:
> > >
> > > On Mon, May 4, 2020 at 7:46 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > >
> > > > > Sorry for the delay. I lost connection to my dev machine, so I am
> > > > > trying to set this up on my own laptop.
> > >
> > > Sorry to hear that. I will gladly give you access to my testing
> > > machine where all this nasty stuff happens every time so you can test
> > > it in place. You can try everything there and have online results. I
> > > can give you access even to the IPMI console so you can switch the
> > > kernel during boot easily. I didn't notice this problem until the time
> > > of deployment. My prior testing machines were with metallic ethernet
> > > ports only so I didn't know about those problems earlier.
> >
> > Thanks for the offer! No worries, I set up a testing VM on my laptop.
>
> OK
>
> > > >
> > > > I tried to emulate your test case in my VM, here is the script I use:
> > > >
> > > > ====
> > > > ip li set dev dummy0 up
> > > > tc qd add dev dummy0 root handle 1: htb default 1
> > > > for i in `seq 1 1000`
> > > > do
> > > >   tc class add dev dummy0 parent 1:0 classid 1:$i htb rate 1mbit ceil 1.5mbit
> > > >   tc qd add dev dummy0 parent 1:$i fq_codel
> > > > done
> > > >
> > > > time tc qd del dev dummy0 root
> > > > ====
> > > >
> > > > And this is the result:
> > > >
> > > >     Before my patch:
> > > >      real   0m0.488s
> > > >      user   0m0.000s
> > > >      sys    0m0.325s
> > > >
> > > >     After my patch:
> > > >      real   0m0.180s
> > > >      user   0m0.000s
> > > >      sys    0m0.132s
> > >
> > > My results with your test script.
> > >
> > > before patch:
> > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > real 1.63
> > > user 0.00
> > > sys 1.63
> > >
> > > after patch:
> > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > real 1.55
> > > user 0.00
> > > sys 1.54
> > >
> > > > This is an obvious improvement, so I have no idea why you didn't
> > > > catch any difference.
> > >
> > > We use hfsc instead of htb. I don't know whether it may cause any
> > > difference. I can provide you with my test scripts if necessary.
> >
> > Yeah, you can try to replace the htb with hfsc in my script,
> > I didn't spend time to figure out hfsc parameters.
>
> tc class add dev dummy0 parent 1:0 classid 1:$i hfsc \
>   ls m1 0 d 0 m2 13107200 ul m1 0 d 0 m2 13107200
>
> but it behaves the same as htb...
>
> > My point here is, if I can see the difference with merely 1000
> > tc classes, you should see a bigger difference with hundreds
> > of thousands classes in your setup. So, I don't know why you
> > saw a relatively smaller difference.
>
> I saw a relatively big difference. It was about 1.5s faster on my huge
> setup which is a lot. Yet maybe the problem is caused by something

What percentage? IIUC, without the patch it took you about 11s, so
1.5s faster means roughly a 13% improvement for you?
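
For comparing the timings quoted in this thread, the relative speedup can
be computed as (before - after) / before. The following is only a minimal
sketch: the helper name pct_faster is hypothetical, and the 9.5s figure is
derived from "about 11s" minus "1.5s" rather than taken from a measurement.

```shell
#!/bin/sh
# pct_faster BEFORE AFTER
# Prints how much faster AFTER is relative to BEFORE, as a percentage
# rounded to one decimal place. Both arguments are durations in seconds.
# (Hypothetical helper, not part of iproute2.)
pct_faster() {
    LC_ALL=C awk -v b="$1" -v a="$2" \
        'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

pct_faster 11 9.5        # large setup: about 11s, 1.5s faster -> 13.6
pct_faster 0.488 0.180   # VM with dummy0, 1000 htb classes    -> 63.1
pct_faster 1.63 1.55     # same script on enp1s0f0             -> 4.9
```

The gap between 63.1% in the VM and 4.9% on the physical interface is the
discrepancy being discussed here.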


> else? I thought about tx/rx queues. RJ45 ports have up to 4 tx and rx
> queues. SFP+ interfaces have much higher limits. 8 or even 64 possible
> queues. I've tried to increase the number of queues using ethtool from
> 4 to 8 and decreased to 2. But there was no difference. It was about
> 1.62 - 1.63 with an unpatched kernel and about 1.55 - 1.58 with your
> patches applied. I've tried it for ifb and RJ45 interfaces where it
> took about 0.02 - 0.03 with an unpatched kernel and 0.05 with your
> patches applied, which is strange, but it may be caused by the fact it
> was very fast even before.

That is odd. In fact, this is closely related to the number of TX queues:
the existing code resets the qdiscs once for each TX queue, so the more
TX queues you have, the more resets the kernel does and the longer the
deletion takes.
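
To correlate deletion time with queue count, it helps to record how many
TX queues a device actually exposes before timing the delete. A minimal
sketch, assuming the usual sysfs layout /sys/class/net/<dev>/queues/tx-N;
the function name count_tx_queues and its optional second argument (an
alternative sysfs root, handy for dry runs) are hypothetical.

```shell
#!/bin/sh
# count_tx_queues DEV [SYSROOT]
# Counts the tx-* queue directories the kernel exposes for DEV.
# SYSROOT defaults to /sys/class/net and is only an assumption about
# where the queue directories live on the system under test.
count_tx_queues() {
    dev=$1
    sysroot=${2:-/sys/class/net}
    n=0
    for q in "$sysroot/$dev"/queues/tx-*; do
        # An unmatched glob stays literal, so check that the entry exists.
        if [ -d "$q" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example (requires the device to exist; timing the delete needs root):
#   echo "tx queues: $(count_tx_queues enp1s0f0)"
#   time tc qdisc del dev enp1s0f0 root
```

Recording this number next to each timing makes it easier to see whether
the deletion time scales with the queue count as described above.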

I plan to address this later on top of the existing patches.

Thanks.