Message-ID: <CA+FuTSdjL4bFYHXyH8dv2x-ZEQZSuA7R8ecttzdZMRwyPEF-=A@mail.gmail.com>
Date: Thu, 22 Oct 2020 13:38:06 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Harshitha Ramamurthy <harshitha.ramamurthy@...el.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Tom Herbert <tom@...bertland.com>, carolyn.wyborny@...el.com,
"Keller, Jacob E" <jacob.e.keller@...el.com>,
amritha.nambiar@...el.com
Subject: Re: [RFC PATCH net-next 0/3] sock: Fix sock queue mapping to include device
On Wed, Oct 21, 2020 at 3:51 PM Harshitha Ramamurthy
<harshitha.ramamurthy@...el.com> wrote:
>
> In XPS, the transmit queue selected for a packet is saved in the associated
> sock for the packet and is then used to avoid recalculating the queue
> on subsequent sends. The problem is that the corresponding device is not
> also recorded, so when the queue mapping is later referenced it may
> correspond to a different device than the sending one, resulting in an
> incorrect queue being used for transmit. Particularly with xps_rxqs, this
> can lead to non-deterministic behaviour as illustrated below.
>
> Consider a case where xps_rxqs is configured and there is a difference
> in number of Tx and Rx queues. Suppose we have 2 devices A and B. Device A
> has 0-7 queues and device B has 0-15 queues. Packets are transmitted from
> Device A but packets are received on B. For packets received on queue 0-7
> of Device B, xps_rxqs will be applied for reply packets to transmit on
> Device A's queues 0-7. However, when packets are received on queues
> 8-15 of Device B, normal XPS is used for reply packets transmitted
> from Device A. This leads to non-deterministic behaviour. The case where
> there are fewer receive queues is even more insidious. Consider Device
> A, the transmitting device, with queues 0-15, and Device B, the receiver,
> with queues 0-7. With xps_rxqs enabled, the packets will be received only
> on queues 0-7 of Device B, but sent only on 0-7 queues of Device A
> thereby causing a load imbalance.
So the issue is limited to xps_rxqs with multiple NICs.
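To make the quoted scenario concrete, here is a minimal userspace model (not kernel code; the function and enum names are illustrative) of the policy split it describes: the reply mirrors the rx queue only when that queue exists on the transmitting device, and otherwise falls back to ordinary XPS.

```c
/* Illustrative model of the mismatch in the quoted description: with
 * xps_rxqs, a reply can mirror the rx queue only when the transmitting
 * device actually has that queue; otherwise the stack falls back to
 * ordinary XPS, modelled here as a distinct policy. Names are
 * hypothetical, not the kernel's. */
enum tx_policy { POLICY_XPS_RXQS, POLICY_XPS_FALLBACK };

static enum tx_policy pick_policy(int rx_queue, int tx_queue_count)
{
	if (rx_queue < tx_queue_count)
		return POLICY_XPS_RXQS;	/* mirror the rx queue on tx */
	return POLICY_XPS_FALLBACK;	/* rx queue absent on tx device */
}
```

With Device A (tx) at 8 queues and Device B (rx) at 16, queues 0-7 take one policy and 8-15 the other, which is exactly the non-determinism described above.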
When do we need sk_tx_dev_and_queue_mapping (patch 3/3)? It is used in
netdev_pick_tx, but associations are reset on route change and
recomputed if queue_index would exceed the current device queue count.
> This patch set fixes the issue by recording both the device (via
> ifindex) and the queue in the sock mapping. The pair is set and
> retrieved atomically.
I guess this is the reason for the somewhat convoluted cast to u64
logic in patch 1/3. Is the assumption that 64-bit loads and stores are
atomic on all platforms? That is not correct.
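For context, a userspace sketch of the pack-into-u64 idea (all names hypothetical, not the patch's actual code): a plain 64-bit store may be split into two word stores on 32-bit machines, so reading the pair consistently requires an explicitly atomic type rather than a bare cast.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of storing (ifindex, queue) as one 64-bit value.
 * A plain uint64_t store is not atomic on all platforms (32-bit arches
 * may tear it), so an _Atomic type is used so readers never observe an
 * ifindex from one write paired with a queue from another. */
static _Atomic uint64_t tx_dev_and_queue;

static void set_dev_and_queue(uint32_t ifindex, uint32_t queue)
{
	atomic_store_explicit(&tx_dev_and_queue,
			      ((uint64_t)ifindex << 32) | queue,
			      memory_order_relaxed);
}

static void get_dev_and_queue(uint32_t *ifindex, uint32_t *queue)
{
	uint64_t v = atomic_load_explicit(&tx_dev_and_queue,
					  memory_order_relaxed);
	*ifindex = (uint32_t)(v >> 32);
	*queue = (uint32_t)v;
}
```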
Is atomicity even needed? For the purpose of load balancing it isn't.
Just adding a sk->rx_ifindex would be a lot simpler.
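A rough sketch of that simpler alternative (hypothetical struct and helper, not the kernel's struct sock): record only the rx ifindex next to the cached queue and trust the cache only when transmitting on the same device. No atomicity is needed, since a torn or stale read of a load-balancing hint just costs one recomputation.

```c
/* Hypothetical sketch: validate the cached queue against the device
 * being transmitted on, instead of storing the pair atomically. A
 * mismatch merely falls back to recomputing the queue. */
struct sock_hint {
	int rx_ifindex;	/* device the flow was last received on */
	int rx_queue;	/* rx queue recorded on that device */
};

static int cached_queue_valid(const struct sock_hint *hint,
			      int tx_ifindex, int tx_queue_count)
{
	return hint->rx_ifindex == tx_ifindex &&
	       hint->rx_queue < tx_queue_count;
}
```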
sk->sk_napi_id already uniquely identifies the device. Unfortunately,
dev_get_by_napi_id is not cheap (traverses a hashtable bucket). Though
purely for the purpose of load balancing this validation could be
sample-based.
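Sample-based here could mean something like the following sketch (the counter and rate are illustrative): pay the expensive dev_get_by_napi_id()-style lookup only on every Nth transmit, since for load balancing an occasionally stale hint is harmless.

```c
#include <stdint.h>

/* Hypothetical sketch of sampled validation: run the costly lookup on
 * one transmit in VALIDATE_EVERY, rather than on every send. */
#define VALIDATE_EVERY 64

static uint32_t tx_count;

static int should_validate(void)
{
	return (++tx_count % VALIDATE_EVERY) == 0;
}
```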
The rx ifindex is also already recorded for inet sockets in
rx_dst_ifindex, and the sk_rx_queue_get functions are limited to
those, so could conceivably use that. But it is derived from skb_iif,
which is overwritten with every reentry of __netif_receive_skb_core.