Date:   Tue, 16 Mar 2021 09:15:11 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        David Miller <davem@...emloft.net>,
        Vladimir Oltean <olteanv@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>, Wei Wang <weiwan@...gle.com>,
        Cong Wang <cong.wang@...edance.com>,
        Taehee Yoo <ap420073@...il.com>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, linuxarm@...neuler.org,
        Marc Kleine-Budde <mkl@...gutronix.de>,
        linux-can@...r.kernel.org
Subject: Re: [RFC v2] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

On Tue, Mar 16, 2021 at 1:35 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2021/3/16 2:53, Jakub Kicinski wrote:
> > On Mon, 15 Mar 2021 11:10:18 +0800 Yunsheng Lin wrote:
> >> @@ -606,6 +623,11 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = {
> >>   */
> >>  struct pfifo_fast_priv {
> >>      struct skb_array q[PFIFO_FAST_BANDS];
> >> +
> >> +    /* protect against data race between enqueue/dequeue and
> >> +     * qdisc->empty setting
> >> +     */
> >> +    spinlock_t lock;
> >>  };
> >>
> >>  static inline struct skb_array *band2list(struct pfifo_fast_priv *priv,
> >> @@ -623,7 +645,10 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
> >>      unsigned int pkt_len = qdisc_pkt_len(skb);
> >>      int err;
> >>
> >> -    err = skb_array_produce(q, skb);
> >> +    spin_lock(&priv->lock);
> >> +    err = __ptr_ring_produce(&q->ring, skb);
> >> +    WRITE_ONCE(qdisc->empty, false);
> >> +    spin_unlock(&priv->lock);
> >>
> >>      if (unlikely(err)) {
> >>              if (qdisc_is_percpu_stats(qdisc))
> >> @@ -642,6 +667,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> >>      struct sk_buff *skb = NULL;
> >>      int band;
> >>
> >> +    spin_lock(&priv->lock);
> >>      for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>              struct skb_array *q = band2list(priv, band);
> >>
> >> @@ -655,6 +681,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> >>      } else {
> >>              WRITE_ONCE(qdisc->empty, true);
> >>      }
> >> +    spin_unlock(&priv->lock);
> >>
> >>      return skb;
> >>  }
> >
> > I thought pfifo was supposed to be "lockless" and this change
> > re-introduces a lock between producer and consumer, no?
>
> Yes, the lock breaks the "lockless" property of the lockless qdisc for now.
> I do not know how to solve the data race below locklessly:
>
>         CPU1:                                   CPU2:
>       dequeue skb                                .
>           .                                      .
>           .                                 enqueue skb
>           .                                      .
>           .                      WRITE_ONCE(qdisc->empty, false);
>           .                                      .
>           .                                      .
> WRITE_ONCE(qdisc->empty, true);


Maybe it is time to fully document/explain how this can possibly work.

A lockless qdisc used concurrently by multiple CPUs, coordinated only with
WRITE_ONCE() and READ_ONCE()?

Just say no to this.
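
The objection is about ordering, not atomicity: WRITE_ONCE()/READ_ONCE()
keep the compiler from tearing, splitting, or repeating a single access,
but they impose no ordering between the ring operation and the flag
update. Reduced from the quoted patch to the two stores at the heart of
the race (illustrative only, not proposed code):

	static void enqueue_side(struct Qdisc *qdisc, struct skb_array *q,
				 struct sk_buff *skb)
	{
		__ptr_ring_produce(&q->ring, skb); /* A: ring gains an skb */
		WRITE_ONCE(qdisc->empty, false);   /* B: publish non-empty */
	}

	/* Runs after the consumer has found every ring empty. */
	static void dequeue_side(struct Qdisc *qdisc)
	{
		WRITE_ONCE(qdisc->empty, true);    /* C: publish empty */
	}

	/* B and C are each single-copy atomic, but no barrier orders them:
	 * the schedule A, B, C leaves qdisc->empty == true with an skb
	 * still queued, exactly as in the diagram above.
	 */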

>
> If the above happens, qdisc->empty is true even though the qdisc still
> holds skbs, which may cause out-of-order delivery or stuck packets.
>
> It seems we may need to update the ptr_ring's status (empty or not)
> atomically with the enqueue/dequeue in the ptr_ring implementation.
>
> Any better idea?
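
One classic lock-free way to close this window, sketched below, is for
the dequeue side to re-check the rings after publishing empty == true,
pairing full barriers on both sides. This is a sketch only, not the
patch under discussion nor necessarily what was eventually merged; it
reuses the helpers from the quoted diff and assumes the enqueue path
issues a matching smp_mb() between skb_array_produce() and
WRITE_ONCE(qdisc->empty, false):

	static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
	{
		struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
		struct sk_buff *skb = NULL;
		int band;

		for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
			struct skb_array *q = band2list(priv, band);

			if (__skb_array_empty(q))
				continue;

			skb = __skb_array_consume(q);
		}

		if (likely(skb)) {
			qdisc_update_stats_at_dequeue(qdisc, skb);
			return skb;
		}

		WRITE_ONCE(qdisc->empty, true);

		/* Make the empty == true store visible before re-reading
		 * the rings; pairs with the assumed smp_mb() on the
		 * enqueue side.
		 */
		smp_mb();

		for (band = 0; band < PFIFO_FAST_BANDS; band++) {
			struct skb_array *q = band2list(priv, band);

			if (!__skb_array_empty(q)) {
				/* An enqueue raced with us: undo. */
				WRITE_ONCE(qdisc->empty, false);
				break;
			}
		}

		return skb;
	}

Whichever CPU runs second then observes the other's update: either the
re-check sees the freshly produced skb and clears empty, or the
enqueue's WRITE_ONCE(qdisc->empty, false) lands after the dequeue's
store. The cost is an extra barrier and a re-scan on the empty path
instead of a spinlock on every enqueue and dequeue.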
