netdev - Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't have a socket

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1475162142.28155.188.camel@edumazet-glaptop3.roam.corp.google.com>
Date:   Thu, 29 Sep 2016 08:15:42 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Tom Herbert <tom@...bertland.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Kernel Team <kernel-team@...com>,
        Rick Jones <rick.jones2@....com>,
        Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [PATCH v2 net-next 4/5] xps_flows: XPS for packets that don't
 have a socket

On Thu, 2016-09-29 at 07:51 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:
> 
> > It addresses  the issue that Rick Jones pointed out was happening with
> > XPS. When packets are sent for a flow that has no socket and XPS is
> > enabled then each packet uses the XPS queue based on the running CPU.
> > Since the thread sending on a flow can be rescheduled on different
> > CPUs this is creating ooo packets. In this case the ooo is being
> > caused by interaction with XPS.
> > 
> 
> Nope, your patch does not address the problem properly.
> 
> I am not sure I want to spend more time explaining the issue.
> 
> Lets talk about this in Tokyo next week.
> 

Just as a reminder, sorry to bother you, stating some obvious facts for
both of us. We have public exchanges, so we also need to re-explain how
things work.

Queue selection on xmit happens before we hit the qdisc and its delays.

So when you access txq->dql.num_completed_ops and
txq->dql.num_enqueue_ops you can observe values that do not change for a
while.

Say a thread runs on a VM, and sends 2 packets P1, P2 on the same flow
(skb_get_hash() returns the same value for these 2 packets)

P1 is sent on behalf of CPU 1, we pickup queue txq1, and queue the
packet on its qdisc . Transmit does not happen because of some
constraints like rate limiting or scheduling constraints.

P2 is sent on behalf of CPU 2, we pickup queue txq2, notice that prior
packet chose txq1. We check txq1->dql and decide it is fine to use txq2,
since the dql params of txq1 were not changed yet.

( txq->dql.num_completed_ops == ent.queue_ptr )

Note that in RFS case, we have the guarantee that we observe 'live
queues' since they are the per cpu backlog.

So input_queue_head_incr() and input_queue_tail_incr_save() are
correctly doing the OOO prevention, because a queued packet immediately
changes the state.

So really your patch works if you have no qdisc, or a non congested
qdisc. (Think if P1 is dropped by a full pfifo or pfifo_fast : We really
want to avoid steering P2, P3, ..., PN on this full pfifo while maybe
other txq are idle). Strange attractors are back (check commit
9b462d02d6dd6 )

You could avoid (ab)using BQL with a different method, grabbing
skb->destructor for the packets that are socketless

The hash table would simply track the sum of skb->truesize to allow flow
migration. This would be self contained and not intrusive.