netdev - Re: [net PATCH v2] net: sched, fix OOO packets with pfifo

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 18 Apr 2018 09:28:32 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     John Fastabend <john.fastabend@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        David Miller <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast

Hi,

let me revive this old thread...

On Mon, 2018-03-26 at 11:16 -0700, John Fastabend wrote:
> On 03/26/2018 10:30 AM, Cong Wang wrote:
> > On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend
> > <john.fastabend@...il.com> wrote:
> > > After the qdisc lock was dropped in pfifo_fast we allow multiple
> > > enqueue threads and dequeue threads to run in parallel. On the
> > > enqueue side the skb bit ooo_okay is used to ensure all related
> > > skbs are enqueued in-order. On the dequeue side though there is
> > > no similar logic. What we observe is with fewer queues than CPUs
> > > it is possible to re-order packets when two instances of
> > > __qdisc_run() are running in parallel. Each thread will dequeue
> > > a skb and then whichever thread calls the ndo op first will
> > > be sent on the wire. This doesn't typically happen because
> > > qdisc_run() is usually triggered by the same core that did the
> > > enqueue. However, drivers will trigger __netif_schedule()
> > > when queues are transitioning from stopped to awake using the
> > > netif_tx_wake_* APIs. When this happens netif_schedule() calls
> > > qdisc_run() on the same CPU that did the netif_tx_wake_* which
> > > is usually done in the interrupt completion context. This CPU
> > > is selected with the irq affinity which is unrelated to the
> > > enqueue operations.
> > 
> > Interesting. Why this is unique to pfifo_fast? For me it could
> > happen to other qdisc's too, when we release the qdisc root
> > lock in sch_direct_xmit(), another CPU could dequeue from
> > the same qdisc and transmit the skb in parallel too?
> > 
> 
> Agreed, my guess is it never happens because the timing is
> tighter in the lock case. Or if it is happening its infrequent
> enough that no one noticed the OOO packets.

I think the above could not happend due to the qdisc seqlock - which is
not acquired by NOLOCK qdiscs.

> For net-next we probably could clean this up. I was just
> going for something simple in net that didn't penalize all
> qdiscs as Eric noted. This patch doesn't make it any worse
> at least. And we have been living with the above race for
> years.

I've benchmarked this patch is some different scenario, and in my
testing it introduces a measurable regression in uncontended/lightly
contended scenarios. The measured peak negative delta is with a pktgen
thread using "xmit_mode queue_xmit":

before: 27674032 pps
after: 23809052 pps

I spend some time searching a way to improve this, without success.

John, did you had any chance to look at this again?

Thanks,

Paolo