Message-ID: <CA+FuTSdjL4bFYHXyH8dv2x-ZEQZSuA7R8ecttzdZMRwyPEF-=A@mail.gmail.com>
Date: Thu, 22 Oct 2020 13:38:06 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Harshitha Ramamurthy <harshitha.ramamurthy@...el.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Tom Herbert <tom@...bertland.com>, carolyn.wyborny@...el.com,
"Keller, Jacob E" <jacob.e.keller@...el.com>,
amritha.nambiar@...el.com
Subject: Re: [RFC PATCH net-next 0/3] sock: Fix sock queue mapping to include device
On Wed, Oct 21, 2020 at 3:51 PM Harshitha Ramamurthy
<harshitha.ramamurthy@...el.com> wrote:
>
> In XPS, the transmit queue selected for a packet is saved in the associated
> sock for the packet and is then used to avoid recalculating the queue
> on subsequent sends. The problem is that the corresponding device is not
> also recorded, so when the queue mapping is later referenced it may
> correspond to a different device than the sending one, resulting in an
> incorrect queue being used for transmit. Particularly with xps_rxqs, this
> can lead to non-deterministic behaviour as illustrated below.
>
> Consider a case where xps_rxqs is configured and there is a difference
> in number of Tx and Rx queues. Suppose we have 2 devices A and B. Device A
> has 0-7 queues and device B has 0-15 queues. Packets are transmitted from
> Device A but packets are received on B. For packets received on queue 0-7
> of Device B, xps_rxqs will be applied for reply packets to transmit on
> Device A's queues 0-7. However, when packets are received on queues
> 8-15 of Device B, normal XPS is used for reply packets transmitted
> from Device A. This leads to non-deterministic behaviour. The case where
> there are fewer receive queues is even more insidious. Consider Device
> A, the transmitting device, with queues 0-15, and Device B, the receiver,
> with queues 0-7. With xps_rxqs enabled, the packets will be received only
> on queues 0-7 of Device B, but sent only on 0-7 queues of Device A
> thereby causing a load imbalance.
So the issue is limited to xps_rxqs with multiple NICs.
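To make the quoted scenario concrete, here is a minimal userspace model (not kernel code; the function and enum names are illustrative) of the policy split it describes: the reply mirrors the rx queue only when that queue exists on the transmitting device, and otherwise falls back to ordinary XPS.

```c
/* Illustrative model of the mismatch in the quoted description: with
 * xps_rxqs, a reply can mirror the rx queue only when the transmitting
 * device actually has that queue; otherwise the stack falls back to
 * ordinary XPS, modelled here as a distinct policy. Names are
 * hypothetical, not the kernel's. */
enum tx_policy { POLICY_XPS_RXQS, POLICY_XPS_FALLBACK };

static enum tx_policy pick_policy(int rx_queue, int tx_queue_count)
{
	if (rx_queue < tx_queue_count)
		return POLICY_XPS_RXQS;	/* mirror the rx queue on tx */
	return POLICY_XPS_FALLBACK;	/* rx queue absent on tx device */
}
```

With Device A (tx) at 8 queues and Device B (rx) at 16, queues 0-7 take one policy and 8-15 the other, which is exactly the non-determinism described above.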
When do we need sk_tx_dev_and_queue_mapping (patch 3/3)? It is used in
netdev_pick_tx, but associations are reset on route change and
recomputed if queue_index would exceed the current device queue count.
> This patch set fixes the issue by recording both the device (via
> ifindex) and the queue in the sock mapping. The pair is set and
> retrieved atomically.
I guess this is the reason for the somewhat convoluted cast to u64
logic in patch 1/3. Is the assumption that 64-bit loads and stores are
atomic on all platforms? That is not correct.
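For context, a userspace sketch of the pack-into-u64 idea (all names hypothetical, not the patch's actual code): a plain 64-bit store may be split into two word stores on 32-bit machines, so reading the pair consistently requires an explicitly atomic type rather than a bare cast.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of storing (ifindex, queue) as one 64-bit value.
 * A plain uint64_t store is not atomic on all platforms (32-bit arches
 * may tear it), so an _Atomic type is used so readers never observe an
 * ifindex from one write paired with a queue from another. */
static _Atomic uint64_t tx_dev_and_queue;

static void set_dev_and_queue(uint32_t ifindex, uint32_t queue)
{
	atomic_store_explicit(&tx_dev_and_queue,
			      ((uint64_t)ifindex << 32) | queue,
			      memory_order_relaxed);
}

static void get_dev_and_queue(uint32_t *ifindex, uint32_t *queue)
{
	uint64_t v = atomic_load_explicit(&tx_dev_and_queue,
					  memory_order_relaxed);
	*ifindex = (uint32_t)(v >> 32);
	*queue = (uint32_t)v;
}
```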
Is atomicity even needed? For the purpose of load balancing it isn't.
Just adding a sk->rx_ifindex would be a lot simpler.
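A rough sketch of that simpler alternative (hypothetical struct and helper, not the kernel's struct sock): record only the rx ifindex next to the cached queue and trust the cache only when transmitting on the same device. No atomicity is needed, since a torn or stale read of a load-balancing hint just costs one recomputation.

```c
/* Hypothetical sketch: validate the cached queue against the device
 * being transmitted on, instead of storing the pair atomically. A
 * mismatch merely falls back to recomputing the queue. */
struct sock_hint {
	int rx_ifindex;	/* device the flow was last received on */
	int rx_queue;	/* rx queue recorded on that device */
};

static int cached_queue_valid(const struct sock_hint *hint,
			      int tx_ifindex, int tx_queue_count)
{
	return hint->rx_ifindex == tx_ifindex &&
	       hint->rx_queue < tx_queue_count;
}
```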
sk->sk_napi_id already uniquely identifies the device. Unfortunately,
dev_get_by_napi_id is not cheap (traverses a hashtable bucket). Though
purely for the purpose of load balancing this validation could be
sample-based.
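Sample-based here could mean something like the following sketch (the counter and rate are illustrative): pay the expensive dev_get_by_napi_id()-style lookup only on every Nth transmit, since for load balancing an occasionally stale hint is harmless.

```c
#include <stdint.h>

/* Hypothetical sketch of sampled validation: run the costly lookup on
 * one transmit in VALIDATE_EVERY, rather than on every send. */
#define VALIDATE_EVERY 64

static uint32_t tx_count;

static int should_validate(void)
{
	return (++tx_count % VALIDATE_EVERY) == 0;
}
```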
The rx ifindex is also already recorded for inet sockets in
rx_dst_ifindex, and the sk_rx_queue_get functions are limited to
those, so could conceivably use that. But it is derived from skb_iif,
which is overwritten with every reentry of __netif_receive_skb_core.