lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 18 Sep 2020 10:06:56 +0800
From:   Kehuan Feng <kehuan.feng@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Yunsheng Lin <linyunsheng@...wei.com>,
        Hillf Danton <hdanton@...a.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Jike Song <albcamus@...il.com>, Josh Hunt <johunt@...mai.com>,
        Jonas Bonn <jonas.bonn@...rounds.com>,
        Michael Zhivich <mzhivich@...mai.com>,
        David Miller <davem@...emloft.net>,
        John Fastabend <john.fastabend@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        "linuxarm@...wei.com" <linuxarm@...wei.com>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

Sorry, guys, the experiment environment is no longer existing now. We
finally use fq_codel for online product.

Cong Wang <xiyou.wangcong@...il.com> 于2020年9月18日周五 上午3:52写道:
>
> On Sun, Sep 13, 2020 at 7:10 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >
> > On 2020/9/11 4:19, Cong Wang wrote:
> > > On Thu, Sep 3, 2020 at 8:21 PM Kehuan Feng <kehuan.feng@...il.com> wrote:
> > >> I also tried Cong's patch (shown below on my tree) and it could avoid
> > >> the issue (stressing for 30 minutus for three times and not jitter
> > >> observed).
> > >
> > > Thanks for verifying it!
> > >
> > >>
> > >> --- ./include/net/sch_generic.h.orig 2020-08-21 15:13:51.787952710 +0800
> > >> +++ ./include/net/sch_generic.h 2020-09-03 21:36:11.468383738 +0800
> > >> @@ -127,8 +127,7 @@
> > >>  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
> > >>  {
> > >>   if (qdisc->flags & TCQ_F_NOLOCK) {
> > >> - if (!spin_trylock(&qdisc->seqlock))
> > >> - return false;
> > >> + spin_lock(&qdisc->seqlock);
> > >>   } else if (qdisc_is_running(qdisc)) {
> > >>   return false;
> > >>   }
> > >>
> > >> I am not actually know what you are discussing above. It seems to me
> > >> that Cong's patch is similar as disabling lockless feature.
> > >
> > >>From performance's perspective, yeah. Did you see any performance
> > > downgrade with my patch applied? It would be great if you can compare
> > > it with removing NOLOCK. And if the performance is as bad as no
> > > NOLOCK, then we can remove the NOLOCK bit for pfifo_fast, at least
> > > for now.
> >
> > It seems the lockless qdisc may have below concurrent problem:
> >   cpu0:                                                           cpu1:
> > q->enqueue                                                          .
> > qdisc_run_begin(q)                                                  .
> > __qdisc_run(q) ->qdisc_restart() -> dequeue_skb()                   .
> >                                  -> sch_direct_xmit()               .
> >                                                                     .
> >                                                                 q->enqueue
> >                                                              qdisc_run_begin(q)
> > qdisc_run_end(q)
> >
> >
> > cpu1 enqueue a skb without calling __qdisc_run(), and cpu0 did not see the
> > enqueued skb when calling __qdisc_run(q) because cpu1 may enqueue the skb
> > after cpu0 called __qdisc_run(q) and before cpu0 called qdisc_run_end(q).
>
> This is the same problem that my patch fixes, I do not know
> why you are suggesting another patch despite quoting mine.
> Please read the whole thread if you want to participate.
>
> Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ