Date:   Fri, 27 Mar 2020 11:35:56 +0100
From:   Václav Zindulka <vaclav.zindulka@...pnet.cz>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server

On Thu, Mar 26, 2020 at 6:38 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
>Are the stack traces captured by perf any different with unpatched?

I've compared the initial perf reports. There is a difference in the
call stack between the sfp+ interfaces and the ifb and metallic
interfaces: while the ifb and metallic interfaces behave normally, the
call stack on the sfp+ interfaces is different. I've added a perf
record of the eno1 metallic interface here
https://github.com/zvalcav/tc-kernel/tree/master/20200325/Intel-82599ES
so you can compare the behavior of all of them. I've also added a perf
report of the patched kernel.

--100.00%--entry_SYSCALL_64_after_hwframe
          do_syscall_64
          __sys_sendmsg
          ___sys_sendmsg
          ____sys_sendmsg
          sock_sendmsg
          netlink_sendmsg
          netlink_unicast
          netlink_rcv_skb                 <-- call stack is similar to other interfaces up to here
          |
           --99.98%--rtnetlink_rcv_msg
                     tc_get_qdisc
                     qdisc_graft          <-- call stack splits in this function - dev_deactivate is called only on sfp+ interfaces
                     |
                     |--98.36%--dev_deactivate    <-- 98% of the fq_codel_reset() calls are made here; these must be the excessive calls - see below
                     |          dev_deactivate_many
                     |          |
                     |          |--49.28%--dev_deactivate_queue.constprop.57
                     |          |          |
                     |          |           --49.27%--qdisc_reset
                     |          |                     hfsc_reset_qdisc
                     |          |                     |
                     |          |                      --48.69%--qdisc_reset
                     |          |                                |
                     |          |                                 --47.23%--fq_codel_reset    <-- half of the excessive calls come from here
                     |          |                                           |
                     |          |                                           |--3.20%--codel_vars_init
                     |          |                                           |
                     |          |                                           |--1.64%--rtnl_kfree_skbs
                     |          |                                           |
                     |          |                                            --0.81%--memset_erms
                     |          |
                     |           --49.08%--qdisc_reset
                     |                     hfsc_reset_qdisc
                     |                     |
                     |                      --48.53%--qdisc_reset
                     |                                |
                     |                                 --47.09%--fq_codel_reset    <-- the other half of the excessive calls come from here
                     |                                           |
                     |                                           |--2.90%--codel_vars_init
                     |                                           |
                     |                                           |--1.61%--rtnl_kfree_skbs
                     |                                           |
                     |                                            --0.82%--memset_erms
                     |
                      --1.62%--qdisc_destroy    <-- the remaining, (I suppose) regular, calls; the call stack is similar to other interfaces here
                                |
                                |--0.86%--hfsc_destroy_qdisc
                                |          |
                                |           --0.82%--hfsc_destroy_class
                                |                     |
                                |                      --0.81%--qdisc_destroy
                                |                                |
                                |                                 --0.72%--fq_codel_reset
                                |
                                 --0.77%--hfsc_reset_qdisc
                                           |
                                            --0.75%--qdisc_reset
                                                      |
                                                       --0.70%--fq_codel_reset
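
If I read this right, the calls multiply: dev_deactivate_many() resets
the root qdisc once per TX queue, hfsc_reset_qdisc() then resets the
child qdisc of every class, and each fq_codel_reset() walks all of its
flows. Here is a compilable userspace sketch of that nesting - the
queue and class counts are made-up placeholders, not values measured
on my hardware:

#include <stdio.h>

/* Placeholder counts - illustrative only, not measured on my setup. */
#define NUM_TX_QUEUES      8    /* dev_deactivate_queue() per TX queue */
#define NUM_HFSC_CLASSES   1000 /* one child qdisc_reset() per class */
#define FQ_CODEL_FLOWS     1024 /* default flows_cnt - the 1024 below */

static long reset_calls;

/* mirrors fq_codel_reset(): purge per-flow state for every flow */
static void fq_codel_reset_model(void)
{
	reset_calls++;
	for (int i = 0; i < FQ_CODEL_FLOWS; i++)
		; /* codel_vars_init() / rtnl_kfree_skbs() per flow */
}

/* mirrors hfsc_reset_qdisc(): reset each class's child qdisc */
static void hfsc_reset_model(void)
{
	for (int c = 0; c < NUM_HFSC_CLASSES; c++)
		fq_codel_reset_model();
}

/* mirrors dev_deactivate_many(): reset the root of every TX queue */
int main(void)
{
	for (int q = 0; q < NUM_TX_QUEUES; q++)
		hfsc_reset_model();

	printf("fq_codel_reset() calls: %ld\n", reset_calls);
	return 0;
}

Even with these small placeholder numbers that is 8000 resets per
deactivation; with a real HFSC hierarchy on a multiqueue NIC, repeated
for every deleted rule, the product plausibly reaches the millions
shown below.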

> On Thu, Mar 26, 2020 at 10:07 AM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > On Thu, Mar 26, 2020 at 7:24 AM Václav Zindulka
> > <vaclav.zindulka@...pnet.cz> wrote:
> > >
> > > > On Wed, Mar 25, 2020 at 6:43 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > > > Are you able to test an experimental patch attached in this email?
> > > >
> > > > Sure. I'll compile new kernel tomorrow. Thank you for quick patch.
> > > > I'll let you know as soon as I have anything.
> > >
> > > I've compiled kernel with the patch you provided and tested it.
> > > However the problem is not solved. It behaves exactly the same way.
> > > I'm trying to put some printk into the fq_codel_reset() to test it
> > > more.
> >
> > Are the stack traces captured by perf any different with unpatched?
>
> Wait, it seems my assumption of refcnt==0 is wrong even for deletion,
> in qdisc_graft() the last refcnt is put after dev_deactivate()...

Your assumption is not totally wrong. I have added some printks into
the fq_codel_reset() function. The final passes during deletion are
handled by the if condition you added in the patch - 13706 of them.
Yet the rest, the vast majority, go through the regular routine -
1768074 of them. 1024 is the value of i in the for loop.
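
For reference, the instrumentation looks roughly like this. It is a
sketch from memory: the message strings match my logs below, but I am
assuming your patch's fast path keyed off the refcnt, as you
described, so the real condition may differ:

static void fq_codel_reset(struct Qdisc *sch)
{
	struct fq_codel_sched_data *q = qdisc_priv(sch);
	int i;

	/* fast path from the experimental patch - refcnt check assumed,
	 * patched purge elided; I only count the passes here
	 */
	if (!refcount_read(&sch->refcnt)) {
		pr_info("Function fq_codel_reset - patch exit - passes: %u\n",
			q->flows_cnt);
		return;
	}

	INIT_LIST_HEAD(&q->new_flows);
	INIT_LIST_HEAD(&q->old_flows);
	for (i = 0; i < q->flows_cnt; i++) { /* flows_cnt is 1024 here */
		struct fq_codel_flow *flow = q->flows + i;

		fq_codel_flow_purge(flow);
		INIT_LIST_HEAD(&flow->flowchain);
		codel_vars_init(&flow->cvars);
	}
	memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
	sch->q.qlen = 0;
	pr_info("Function fq_codel_reset - regular exit - passes: %d\n", i);
}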

But the most significant problem is that fq_codel_reset() is called
far too many times. According to kern.log it was called about 77 000
times per second for almost 24 seconds - over 1.7 million calls during
the rules deletion.

Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791176] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791184] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791192] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791199] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791207] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791214] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791221] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791228] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791236] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791243] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791251] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791258] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791266] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791273] Function fq_codel_reset - regular exit - passes: 1024

Another 1.7 million lines like these follow here... The function
doesn't take much time according to the timestamps - consecutive
entries are only about ten microseconds apart - but it is called far
too many times.

Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017811] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017823] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017836] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017849] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017862] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017874] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017886] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 21:05:54 shaperd-prelouc1-dev kernel: [  134.017899] Function fq_codel_reset - patch exit - passes: 1024

>
> Let me think about how we could distinguish this case from other
> reset cases.

As shown above, the problem is most probably the excessive number of
fq_codel_reset() calls. I'm not surprised it freezes the whole server
or the network adapter.

Thank you
