netdev - Re: [net PATCH v2] net: sched, fix OOO packets with pfifo

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAM_iQpWNX-9p-bo+caUyJ8yfsNDS1a2pV9LNvHK4=y3ec4qRVw@mail.gmail.com>
Date:   Mon, 26 Mar 2018 10:30:56 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        David Miller <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast

On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend
<john.fastabend@...il.com> wrote:
> After the qdisc lock was dropped in pfifo_fast we allow multiple
> enqueue threads and dequeue threads to run in parallel. On the
> enqueue side the skb bit ooo_okay is used to ensure all related
> skbs are enqueued in-order. On the dequeue side though there is
> no similar logic. What we observe is with fewer queues than CPUs
> it is possible to re-order packets when two instances of
> __qdisc_run() are running in parallel. Each thread will dequeue
> a skb and then whichever thread calls the ndo op first will
> be sent on the wire. This doesn't typically happen because
> qdisc_run() is usually triggered by the same core that did the
> enqueue. However, drivers will trigger __netif_schedule()
> when queues are transitioning from stopped to awake using the
> netif_tx_wake_* APIs. When this happens netif_schedule() calls
> qdisc_run() on the same CPU that did the netif_tx_wake_* which
> is usually done in the interrupt completion context. This CPU
> is selected with the irq affinity which is unrelated to the
> enqueue operations.

Interesting. Why this is unique to pfifo_fast? For me it could
happen to other qdisc's too, when we release the qdisc root
lock in sch_direct_xmit(), another CPU could dequeue from
the same qdisc and transmit the skb in parallel too?

...

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 7e3fbe9..39c144b 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -373,24 +373,33 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>   */
>  static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>  {
> +       bool more, validate, nolock = q->flags & TCQ_F_NOLOCK;
>         spinlock_t *root_lock = NULL;
>         struct netdev_queue *txq;
>         struct net_device *dev;
>         struct sk_buff *skb;
> -       bool validate;
>
>         /* Dequeue packet */
> +       if (nolock && test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
> +               return false;
> +

Nit: you probably want to move the comment below this if check,
or simply remove it since it is useless...