Date:   Mon, 18 May 2020 11:22:06 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Václav Zindulka <vaclav.zindulka@...pnet.cz>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server

On Mon, May 18, 2020 at 7:16 AM Václav Zindulka
<vaclav.zindulka@...pnet.cz> wrote:
>
> On Sun, May 17, 2020 at 9:35 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > On Fri, May 8, 2020 at 6:59 AM Václav Zindulka
> > <vaclav.zindulka@...pnet.cz> wrote:
> > > > > >
> > > > > > I tried to emulate your test case in my VM, here is the script I use:
> > > > > >
> > > > > > ====
> > > > > > ip li set dev dummy0 up
> > > > > > tc qd add dev dummy0 root handle 1: htb default 1
> > > > > > for i in `seq 1 1000`
> > > > > > do
> > > > > >   tc class add dev dummy0 parent 1:0 classid 1:$i htb rate 1mbit ceil 1.5mbit
> > > > > >   tc qd add dev dummy0 parent 1:$i fq_codel
> > > > > > done
> > > > > >
> > > > > > time tc qd del dev dummy0 root
> > > > > > ====
> > > > > >
> > > > > > And this is the result:
> > > > > >
> > > > > >     Before my patch:
> > > > > >      real   0m0.488s
> > > > > >      user   0m0.000s
> > > > > >      sys    0m0.325s
> > > > > >
> > > > > >     After my patch:
> > > > > >      real   0m0.180s
> > > > > >      user   0m0.000s
> > > > > >      sys    0m0.132s
> > > > >
> > > > > My results with your test script.
> > > > >
> > > > > before patch:
> > > > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > > > real 1.63
> > > > > user 0.00
> > > > > sys 1.63
> > > > >
> > > > > after patch:
> > > > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > > > real 1.55
> > > > > user 0.00
> > > > > sys 1.54
> > > > >
> > > > > > This is an obvious improvement, so I have no idea why you didn't
> > > > > > catch any difference.
> > > > >
> > > > > We use hfsc instead of htb. I don't know whether that makes any
> > > > > difference. I can provide you with my test scripts if necessary.
> > > >
> > > > Yeah, you can try to replace the htb with hfsc in my script;
> > > > I didn't spend time figuring out the hfsc parameters.
> > >
> > > tc class add dev dummy0 parent 1:0 classid 1:$i hfsc ls m1 0 d 0
> > > m2 13107200 ul m1 0 d 0 m2 13107200
> > >
> > > but it behaves the same as htb...
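
For reference, the whole loop with hfsc instead of htb would look
roughly like this (untested sketch - just your class line dropped into
my script, with the root qdisc switched to hfsc as well):

====
ip li set dev dummy0 up
tc qd add dev dummy0 root handle 1: hfsc default 1
for i in `seq 1 1000`
do
  tc class add dev dummy0 parent 1:0 classid 1:$i hfsc \
    ls m1 0 d 0 m2 13107200 ul m1 0 d 0 m2 13107200
  tc qd add dev dummy0 parent 1:$i fq_codel
done

time tc qd del dev dummy0 root
====
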
> > >
> > > > My point here is, if I can see the difference with merely 1000
> > > > tc classes, you should see a bigger difference with hundreds
> > > > of thousands of classes in your setup. So, I don't know why you
> > > > saw a relatively smaller difference.
> > >
> > > I saw a relatively big difference. It was about 1.5s faster on my huge
> > > setup, which is a lot. Yet maybe the problem is caused by something
> >
> > What percentage? IIUC, without patch it took you about 11s, so
> > 1.5s faster means 13% improvement for you?
>
> My whole setup takes 22.17 seconds to delete with an unpatched kernel.
> With your patches applied it takes 21.08 s, so the difference varies
> between 1 and 1.5 s. The improvement is about 5 - 6%.

Good to know.

>
> > > else? I thought about tx/rx queues. RJ45 ports have up to 4 tx and rx
> > > queues. SFP+ interfaces have much higher limits - 8 or even 64 possible
> > > queues. I've tried increasing the number of queues from 4 to 8 and
> > > decreasing it to 2 using ethtool, but there was no difference: it was
> > > about 1.62 - 1.63 s with an unpatched kernel and about 1.55 - 1.58 s
> > > with your patches applied. I've also tried it on ifb and RJ45
> > > interfaces, where it took about 0.02 - 0.03 s with an unpatched kernel
> > > and 0.05 s with your patches applied, which is strange, but it may be
> > > explained by the fact that it was very fast even before.
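
In case it helps to compare notes, this is roughly how I would check
the queue situation on a device (a sketch - enp1s0f0 is just the name
from your timings above, and some drivers expose rx/tx channels
instead of combined ones):

====
# current vs. maximum channel (queue) counts as the driver reports them
ethtool -l enp1s0f0
# TX queues the kernel actually allocated for the device
ls -d /sys/class/net/enp1s0f0/queues/tx-* | wc -l
# change the channel count, e.g. down to 2 or up to 8
ethtool -L enp1s0f0 combined 2
====
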
> >
> > That is odd. In fact, this is highly related to the number of TX
> > queues, because the existing code resets the qdiscs once for each TX
> > queue, so the more TX queues you have, the more resets the kernel
> > will do and the more time it will take.
>
> Couldn't the problem be that the reset is done on both active and
> inactive queues every time? That would explain why decreasing and
> increasing the number of active queues had no effect. Yet it doesn't
> explain why the Intel card (82599ES) with 64 possible queues has
> exactly the same problem as the Mellanox (ConnectX-4 LX) with 8
> possible queues.

Regardless of the number of queues, the qdisc should only be reset once,
because all of these queues point to the same instance of the root
qdisc in your case.
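
If it helps, one way to sanity-check this from userspace is the qdisc
dump: for a shared classful root, tc should print a single qdisc whose
refcnt reflects the per-TX-queue references to that same object (a
quick sketch, using the device name from your timings):

====
# one root qdisc instance; refcnt reflects the per-TX-queue references
tc qdisc show dev enp1s0f0
====
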

[...]
> With the attached patch I'm down to 1.7 seconds - more than a 90%
> improvement :-) Can you please check it and pass it on to the proper
> places? According to the debugging printk messages it empties only the
> active queues.

You can't change netdev_for_each_tx_queue(); that would certainly break
netif_alloc_netdev_queues(), at the very least.

Let me think about how to fix this properly. I have some ideas and will
provide you with some patch(es) to test soon.

Thanks!
