Message-Id: <20150204.130308.1433096157491856856.davem@davemloft.net>
Date: Wed, 04 Feb 2015 13:03:08 -0800 (PST)
From: David Miller <davem@...emloft.net>
To: eric.dumazet@...il.com
Cc: netdev@...r.kernel.org, willemb@...gle.com, nanditad@...gle.com,
ycheng@...gle.com
Subject: Re: [PATCH net-next] xps: fix xps for stacked devices
From: Eric Dumazet <eric.dumazet@...il.com>
Date: Tue, 03 Feb 2015 23:48:24 -0800
> From: Eric Dumazet <edumazet@...gle.com>
>
> A typical qdisc setup is the following:
>
> bond0 : bonding device, using HTB hierarchy
> eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
>
> XPS allows spreading packets across specific tx queues, based on the
> cpu doing the send.
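>
> Roughly, the per-cpu queue selection works like the simplified sketch
> below (not the exact kernel code; struct field names are approximate):
>
> 	/* Map the cpu doing the xmit to one of the tx queues configured
> 	 * for it via /sys/class/net/<dev>/queues/tx-<n>/xps_cpus.
> 	 */
> 	int queue_index = -1;
>
> 	rcu_read_lock();
> 	dev_maps = rcu_dereference(dev->xps_maps);
> 	if (dev_maps) {
> 		map = rcu_dereference(
> 			dev_maps->cpu_map[raw_smp_processor_id()]);
> 		if (map) {
> 			if (map->len == 1)
> 				queue_index = map->queues[0];
> 			else
> 				queue_index = map->queues[
> 					reciprocal_scale(skb_get_hash(skb),
> 							 map->len)];
> 		}
> 	}
> 	rcu_read_unlock();
>
> Note that the cpu is sampled when the slave's queue is picked, i.e. at
> dequeue/transmit time, not when the packet was queued on bond0.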
>
> The problem is that dequeues from the bond0 qdisc can happen on random
> cpus, because qdisc_run() can dequeue a batch of packets.
>
> CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
> CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
>
> CPUB -> dequeue packet P1 from bond0
> enqueue packet on eth1/eth2
> CPUC -> dequeue packet P2 from bond0
> enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
>
> get_xps_queue() then might select the wrong queue for P1, since the
> current cpu might be different from CPUA.
>
> P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping)
> if CPUC runs a bit faster (or CPUB spins a bit on the qdisc lock).
>
> The effect of this bug is TCP reordering, and more generally suboptimal
> TX queue placement. (A victim bulk flow can be migrated to the wrong TX
> queue for a while.)
>
> To fix this, we have to record the sender cpu number the first time
> dev_queue_xmit() is called for a given tx skb.
>
> We can union napi_id (used on the receive path) and sender_cpu,
> provided we clear sender_cpu in skb_scrub_packet() (credit to Willem
> for this union idea).
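>
> A rough sketch of the idea (placement and exact names are illustrative,
> not the patch itself):
>
> 	/* In struct sk_buff: sender_cpu can share storage with napi_id,
> 	 * because napi_id is only meaningful on the receive path.
> 	 */
> 	union {
> 		unsigned int	napi_id;	/* RX: busy-poll id */
> 		unsigned int	sender_cpu;	/* TX: cpu + 1, 0 = unset */
> 	};
>
> 	/* Early in the dev_queue_xmit() path: record the sending cpu
> 	 * once, encoded as cpu + 1 so that 0 means "not recorded yet".
> 	 * get_xps_queue() can then use skb->sender_cpu - 1 instead of
> 	 * the current cpu, and skb_scrub_packet() clears the field.
> 	 */
> 	if (skb->sender_cpu == 0)
> 		skb->sender_cpu = raw_smp_processor_id() + 1;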
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Applied, thanks everyone.