lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 14 Jan 2022 09:16:28 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org,
        Laurent Bernaille <laurent.bernaille@...adoghq.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Toshiaki Makita <toshiaki.makita1@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Paolo Abeni <pabeni@...hat.com>,
        John Fastabend <john.fastabend@...il.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>
Subject: [PATCH 5.15 28/41] veth: Do not record rx queue hint in veth_xmit

From: Daniel Borkmann <daniel@...earbox.net>

commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream.

Laurent reported that they have seen a significant amount of TCP retransmissions
at high throughput from applications residing in network namespaces talking to
the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
as per systemd default) of the phys device such as ena or virtio_net due to all
traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
setup was _not_ using XDP on veths as the issue is generic.)

More specifically, after edbea9220251 ("veth: Store queue_mapping independently
of XDP prog presence") which made it all the way back to v4.19.184+,
skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
queue by default for veths) instead of leaving at 0.

This is eventually retained and callbacks like ena_select_queue() will also pick
single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
is forwarded to that device via upper stack or other means. Similarly, for others
not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.

In general, it is a _bad_ idea for virtual devices like veth to mess around with
queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
from veth_xmit() altogether.

If the veth peer has an XDP program attached, then it would return the first RX
queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
However, this is still better than breaking the generic case.

Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
Reported-by: Laurent Bernaille <laurent.bernaille@...adoghq.com>
Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: Toshiaki Makita <toshiaki.makita1@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>
Cc: Paolo Abeni <pabeni@...hat.com>
Cc: John Fastabend <john.fastabend@...il.com>
Cc: Willem de Bruijn <willemb@...gle.com>
Acked-by: John Fastabend <john.fastabend@...il.com>
Reviewed-by: Eric Dumazet <edumazet@...gle.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@...il.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
 drivers/net/veth.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -342,7 +342,6 @@ static netdev_tx_t veth_xmit(struct sk_b
 		 */
 		use_napi = rcu_access_pointer(rq->napi) &&
 			   veth_skb_is_eligible_for_gro(dev, rcv, skb);
-		skb_record_rx_queue(skb, rxq);
 	}
 
 	skb_tx_timestamp(skb);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ