Date:   Wed, 21 Oct 2020 12:47:40 -0700
From:   Harshitha Ramamurthy <harshitha.ramamurthy@...el.com>
To:     netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org
Cc:     tom@...bertland.com, carolyn.wyborny@...el.com,
        jacob.e.keller@...el.com, amritha.nambiar@...el.com,
        Harshitha Ramamurthy <harshitha.ramamurthy@...el.com>
Subject: [RFC PATCH net-next 0/3] sock: Fix sock queue mapping to include device

In XPS, the transmit queue selected for a packet is saved in the associated
sock and is then used to avoid recalculating the queue on subsequent
sends. The problem is that the device on which that queue was selected
is not recorded as well, so when the saved queue mapping is referenced
later it may belong to a different device than the one now sending, and
an incorrect queue is used for transmit. Particularly with xps_rxqs,
this can lead to non-deterministic behaviour, as illustrated below.

Consider a case where xps_rxqs is configured and the numbers of Tx and
Rx queues differ. Suppose we have two devices, A and B. Device A has
queues 0-7 and device B has queues 0-15. Packets are transmitted from
Device A but received on Device B. For packets received on queues 0-7
of Device B, xps_rxqs is applied and the replies are transmitted on
Device A's queues 0-7. However, for packets received on queues 8-15 of
Device B, normal XPS is used to select the queue for replies sent from
Device A. This leads to non-deterministic behaviour. The case where
there are fewer receive queues is even more insidious. Consider Device
A, the transmitting device, with queues 0-15 and Device B, the receiver,
with queues 0-7. With xps_rxqs enabled, packets are received only on
queues 0-7 of Device B, so replies are sent only on queues 0-7 of
Device A, leaving queues 8-15 unused and causing a load imbalance.

This patch set fixes the issue by recording both the device (via its
ifindex) and the queue in the sock mapping. The pair is set and
retrieved atomically. When a queue is retrieved with the get functions,
the stored ifindex is compared with the ifindex of the device in
question, and the stored queue is returned only if they match. For
instance, during transmit a valid queue number is returned only if the
stored ifindex matches the device currently being used for transmit.

This patch set contains:
	- Definition of the dev_and_queue structure to hold the
	  ifindex and queue number
	- Generic functions to get, set, and clear a dev_and_queue
	  structure
	- Change sk_tx_queue_{get,set,clear} to
	  sk_tx_dev_and_queue_{get,set,clear}
	- Modify callers of the above to use the new interface
	- Change sk_rx_queue_{get,set,clear} to
	  sk_rx_dev_and_queue_{get,set,clear}
	- Modify callers of the above to use the new interface

This patch set was tested as follows:
	- XPS works as expected with both xps_cpus and xps_rxqs
	- The queue index is calculated only once per connection when
	  picking a Tx queue, e.g. in netdev_pick_tx

Tom Herbert (3):
  sock: Definition and general functions for dev_and_queue structure
  sock: Use dev_and_queue structure for RX queue mapping in sock
  sock: Use dev_and_queue structure for TX queue mapping in sock

 .../mellanox/mlx5/core/en_accel/ktls_rx.c     |   6 +-
 drivers/net/hyperv/netvsc_drv.c               |   4 +-
 include/net/busy_poll.h                       |   2 +-
 include/net/request_sock.h                    |   2 +-
 include/net/sock.h                            | 107 ++++++++++++------
 net/core/dev.c                                |   6 +-
 net/core/filter.c                             |   7 +-
 net/core/sock.c                               |  10 +-
 net/ipv4/tcp_input.c                          |   2 +-
 9 files changed, 93 insertions(+), 53 deletions(-)

-- 
2.26.2
