Date:   Tue, 25 Aug 2020 10:18:05 +0800
From:   Fengkehuan Feng <kehuan.feng@...il.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Jike Song <albcamus@...il.com>, Josh Hunt <johunt@...mai.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Jonas Bonn <jonas.bonn@...rounds.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Michael Zhivich <mzhivich@...mai.com>,
        David Miller <davem@...emloft.net>,
        John Fastabend <john.fastabend@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

Hillf,

With the latest version (attached is what I changed on my tree), the
system failed to start up, with the CPU stalled.


On Sat, Aug 22, 2020 at 11:30 AM, Hillf Danton <hdanton@...a.com> wrote:
>
>
> On Thu, 20 Aug 2020 20:43:17 +0800 Hillf Danton wrote:
> > Hi Jike,
> >
> > On Thu, 20 Aug 2020 15:43:17 +0800 Jike Song wrote:
> > > Hi Josh,
> > >
> > > On Fri, Jul 3, 2020 at 2:14 AM Josh Hunt <johunt@...mai.com> wrote:
> > > {snip}
> > > > Initial results with Cong's patch look promising, so far no stalls. We
> > > > will let it run over the long weekend and report back on Tuesday.
> > > >
> > > > Paolo - I have concerns about possible performance regression with the
> > > > change as well. If you can gather some data that would be great. If
> > > > things look good with our low throughput test over the weekend we can
> > > > also try assessing performance next week.
> > > >
> > >
> > > We hit what is possibly the same problem when testing nvidia/mellanox's
> >
> > Below is what was sent in reply to this thread early last month, with
> > minor tuning, based on the seqlock. Feel free to drop an echo if it
> > makes even ant-antenna-sized sense in your tests.
> >
> > > GPUDirect RDMA product; we found that changing NET_SCH_DEFAULT to
> > > DEFAULT_FQ_CODEL mitigated the problem, though we have no idea why.
> > > Maybe you can also give it a try?
> > >
> > > Besides, our testing is pretty complex; do you have a quick test to
> > > reproduce it?
> > >
> > > --
> > > Thanks,
> > > Jike
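
[Note on the mitigation mentioned above: DEFAULT_FQ_CODEL is one of the
choices gated by the NET_SCH_DEFAULT Kconfig option, which picks the
build-time default qdisc; the default can also be switched at runtime
without a rebuild. A minimal sketch, assuming a reasonably recent kernel;
"eth0" is a placeholder device name:

    # build-time default (choice gated by CONFIG_NET_SCH_DEFAULT)
    CONFIG_NET_SCH_FQ_CODEL=y
    CONFIG_DEFAULT_FQ_CODEL=y

    # runtime equivalent; applies to qdiscs attached afterwards,
    # or replace the root qdisc explicitly on an existing device
    sysctl -w net.core.default_qdisc=fq_codel
    tc qdisc replace dev eth0 root fq_codel
]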
> >
> >
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -79,6 +79,7 @@ struct Qdisc {
> >  #define TCQ_F_INVISIBLE              0x80 /* invisible by default in dump */
> >  #define TCQ_F_NOLOCK         0x100 /* qdisc does not require locking */
> >  #define TCQ_F_OFFLOADED              0x200 /* qdisc is offloaded to HW */
> > +     int                     pkt_seq;
> >       u32                     limit;
> >       const struct Qdisc_ops  *ops;
> >       struct qdisc_size_table __rcu *stab;
> > @@ -156,6 +157,7 @@ static inline bool qdisc_is_empty(const
> >  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
> >  {
> >       if (qdisc->flags & TCQ_F_NOLOCK) {
> > +             qdisc->pkt_seq++;
> >               if (!spin_trylock(&qdisc->seqlock))
> >                       return false;
> >               WRITE_ONCE(qdisc->empty, false);
> > --- a/include/net/pkt_sched.h
> > +++ b/include/net/pkt_sched.h
> > @@ -117,7 +117,9 @@ void __qdisc_run(struct Qdisc *q);
> >
> >  static inline void qdisc_run(struct Qdisc *q)
> >  {
> > -     if (qdisc_run_begin(q)) {
> > +     while (qdisc_run_begin(q)) {
> > +             int seq = q->pkt_seq;
> > +
> >               /* NOLOCK qdisc must check 'state' under the qdisc seqlock
> >                * to avoid racing with dev_qdisc_reset()
> >                */
> > @@ -125,6 +127,9 @@ static inline void qdisc_run(struct Qdis
> >                   likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> >                       __qdisc_run(q);
> >               qdisc_run_end(q);
> > +
> > +             if (!(q->flags & TCQ_F_NOLOCK) || seq == q->pkt_seq)
> > +                     return;
> >       }
> >  }
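
[Note on the variant above, merging its two hunks: qdisc_run() would read
roughly as below. The point is that a sender which loses the spin_trylock()
in qdisc_run_begin() has still bumped pkt_seq, so the CPU that owns the
seqlock notices the change after qdisc_run_end() and drains again instead
of leaving that sender's skb behind. A sketch of my reading only, not a
tested build:

static inline void qdisc_run(struct Qdisc *q)
{
	while (qdisc_run_begin(q)) {
		/* pkt_seq was just bumped by qdisc_run_begin(); contenders
		 * that fail the trylock bump it as well
		 */
		int seq = q->pkt_seq;

		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
		 * to avoid racing with dev_qdisc_reset()
		 */
		if (!(q->flags & TCQ_F_NOLOCK) ||
		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
			__qdisc_run(q);
		qdisc_run_end(q);

		/* locked qdiscs are done after one pass; NOLOCK goes another
		 * round if a contender bumped pkt_seq while we were running
		 */
		if (!(q->flags & TCQ_F_NOLOCK) || seq == q->pkt_seq)
			return;
	}
}
]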
>
> The echo from Feng indicates that it's hard to conclude that TCQ_F_NOLOCK
> is the culprit; let's try again with it ignored for now.
>
> Every pkt enqueued on pfifo_fast is tracked in the diff below, and pkts
> enqueued while we're running the qdisc are detected and handled, to cut
> the chance of the stuck pkts that were reported.
>
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -79,6 +79,7 @@ struct Qdisc {
>  #define TCQ_F_INVISIBLE                0x80 /* invisible by default in dump */
>  #define TCQ_F_NOLOCK           0x100 /* qdisc does not require locking */
>  #define TCQ_F_OFFLOADED                0x200 /* qdisc is offloaded to HW */
> +       int                     pkt_seq;
>         u32                     limit;
>         const struct Qdisc_ops  *ops;
>         struct qdisc_size_table __rcu *stab;
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -631,6 +631,7 @@ static int pfifo_fast_enqueue(struct sk_
>                         return qdisc_drop(skb, qdisc, to_free);
>         }
>
> +       qdisc->pkt_seq++;
>         qdisc_update_stats_at_enqueue(qdisc, pkt_len);
>         return NET_XMIT_SUCCESS;
>  }
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -117,7 +117,8 @@ void __qdisc_run(struct Qdisc *q);
>
>  static inline void qdisc_run(struct Qdisc *q)
>  {
> -       if (qdisc_run_begin(q)) {
> +       while (qdisc_run_begin(q)) {
> +               int seq = q->pkt_seq;
>                 /* NOLOCK qdisc must check 'state' under the qdisc seqlock
>                  * to avoid racing with dev_qdisc_reset()
>                  */
> @@ -125,6 +126,12 @@ static inline void qdisc_run(struct Qdis
>                     likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
>                         __qdisc_run(q);
>                 qdisc_run_end(q);
> +
> +               /* go another round if there are pkts enqueued after
> +                * taking seq_lock
> +                */
> +               if (seq != q->pkt_seq)
> +                       continue;
>         }
>  }
>
>
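
[Note on this second variant: here pkt_seq is only bumped by
pfifo_fast_enqueue(), so the extra rounds are driven by packets that
actually land in the queue. One observation, though (my reading only, not
verified): when seq == q->pkt_seq the body falls through and the while
condition simply re-takes the seqlock, so on an idle queue the loop never
terminates, which would look a lot like the boot-time CPU stall reported
above. A sketch with an explicit exit added, as an assumption about the
intent:

static inline void qdisc_run(struct Qdisc *q)
{
	while (qdisc_run_begin(q)) {
		int seq = q->pkt_seq;	/* only pfifo_fast_enqueue() bumps this */

		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
		 * to avoid racing with dev_qdisc_reset()
		 */
		if (!(q->flags & TCQ_F_NOLOCK) ||
		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
			__qdisc_run(q);
		qdisc_run_end(q);

		/* go another round only if pkts were enqueued after we took
		 * the seqlock; otherwise stop instead of spinning
		 */
		if (seq == q->pkt_seq)
			break;
	}
}
]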

[Attachment: "fix_nolock_from_hillf.patch" (application/octet-stream, 1260 bytes)]
