Message-Id: <20150204.130308.1433096157491856856.davem@davemloft.net>
Date: Wed, 04 Feb 2015 13:03:08 -0800 (PST)
From: David Miller <davem@...emloft.net>
To: eric.dumazet@...il.com
Cc: netdev@...r.kernel.org, willemb@...gle.com, nanditad@...gle.com,
ycheng@...gle.com
Subject: Re: [PATCH net-next] xps: fix xps for stacked devices
From: Eric Dumazet <eric.dumazet@...il.com>
Date: Tue, 03 Feb 2015 23:48:24 -0800
> From: Eric Dumazet <edumazet@...gle.com>
>
> A typical qdisc setup is the following:
>
> bond0 : bonding device, using HTB hierarchy
> eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
>
> XPS allows spreading packets across specific tx queues, based on the
> cpu doing the send.
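>
> Roughly, the per-cpu queue selection works like the simplified sketch
> below (not the exact kernel code; struct field names are approximate):
>
> 	/* Map the cpu doing the xmit to one of the tx queues configured
> 	 * for it via /sys/class/net/<dev>/queues/tx-<n>/xps_cpus.
> 	 */
> 	int queue_index = -1;
>
> 	rcu_read_lock();
> 	dev_maps = rcu_dereference(dev->xps_maps);
> 	if (dev_maps) {
> 		map = rcu_dereference(
> 			dev_maps->cpu_map[raw_smp_processor_id()]);
> 		if (map) {
> 			if (map->len == 1)
> 				queue_index = map->queues[0];
> 			else
> 				queue_index = map->queues[
> 					reciprocal_scale(skb_get_hash(skb),
> 							 map->len)];
> 		}
> 	}
> 	rcu_read_unlock();
>
> Note that the cpu is sampled when the slave's queue is picked, i.e. at
> dequeue/transmit time, not when the packet was queued on bond0.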
>
> The problem is that dequeues from the bond0 qdisc can happen on random
> cpus, because qdisc_run() can dequeue a batch of packets.
>
> CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
> CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
>
> CPUB -> dequeue packet P1 from bond0
> enqueue packet on eth1/eth2
> CPUC -> dequeue packet P2 from bond0
> enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
>
> get_xps_queue() then might select the wrong queue for P1, since the
> current cpu might be different from CPUA.
>
> P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping)
> if CPUC runs a bit faster (or CPUB spins a bit on the qdisc lock).
>
> The effect of this bug is TCP reordering, and more generally suboptimal
> TX queue placement. (A victim bulk flow can be migrated to the wrong TX
> queue for a while.)
>
> To fix this, we have to record the sender cpu number the first time
> dev_queue_xmit() is called for a given tx skb.
>
> We can union napi_id (used on the receive path) and sender_cpu,
> provided we clear sender_cpu in skb_scrub_packet() (credit to Willem
> for this union idea).
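>
> A rough sketch of the idea (placement and exact names are illustrative,
> not the patch itself):
>
> 	/* In struct sk_buff: sender_cpu can share storage with napi_id,
> 	 * because napi_id is only meaningful on the receive path.
> 	 */
> 	union {
> 		unsigned int	napi_id;	/* RX: busy-poll id */
> 		unsigned int	sender_cpu;	/* TX: cpu + 1, 0 = unset */
> 	};
>
> 	/* Early in the dev_queue_xmit() path: record the sending cpu
> 	 * once, encoded as cpu + 1 so that 0 means "not recorded yet".
> 	 * get_xps_queue() can then use skb->sender_cpu - 1 instead of
> 	 * the current cpu, and skb_scrub_packet() clears the field.
> 	 */
> 	if (skb->sender_cpu == 0)
> 		skb->sender_cpu = raw_smp_processor_id() + 1;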
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Applied, thanks everyone.