Message-ID: <c46e0a96-b027-903e-bc08-0daa9a54e1af@gmail.com>
Date: Thu, 6 Jan 2022 01:00:07 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>, davem@...emloft.net,
kuba@...nel.org
Cc: netdev@...r.kernel.org, laurent.bernaille@...adoghq.com,
maciej.fijalkowski@...el.com, toshiaki.makita1@...il.com,
pabeni@...hat.com, john.fastabend@...il.com, willemb@...gle.com
Subject: Re: [PATCH net-next] veth: Do not record rx queue hint in veth_xmit

On 1/5/22 16:46, Daniel Borkmann wrote:
> Laurent reported that they have seen a significant amount of TCP retransmissions
> at high throughput from applications residing in network namespaces talking to
> the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
> as per systemd default) of the phys device such as ena or virtio_net due to all
> traffic hitting a _single_ TX queue _despite_ the device being multi-queue. (Note that the
> setup was _not_ using XDP on veths as the issue is generic.)
>
> More specifically, after edbea9220251 ("veth: Store queue_mapping independently
> of XDP prog presence") which made it all the way back to v4.19.184+,
> skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
> queue by default for veths) instead of leaving it at 0.
>
> This is eventually retained and callbacks like ena_select_queue() will also pick
> a single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
> is forwarded to that device via upper stack or other means. Similarly, for others
> not implementing ndo_select_queue(), if XPS is disabled, netdev_pick_tx() might
> call into skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.
>
> In general, it is a _bad_ idea for virtual devices like veth to mess around with
> queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
> the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
> netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.

Nice changelog and fix, thanks Daniel!

Reviewed-by: Eric Dumazet <edumazet@...gle.com>

> Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
> from veth_xmit() altogether.
>
> If the veth peer has an XDP program attached, then it would return the first RX
> queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
> However, this is still better than breaking the generic case.
>
> Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
> Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
> Reported-by: Laurent Bernaille <laurent.bernaille@...adoghq.com>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Cc: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> Cc: Toshiaki Makita <toshiaki.makita1@...il.com>
> Cc: Eric Dumazet <eric.dumazet@...il.com>
> Cc: Paolo Abeni <pabeni@...hat.com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: Willem de Bruijn <willemb@...gle.com>
> ---
> drivers/net/veth.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index d21dd25f429e..354a963075c5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -335,7 +335,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>  		 */
>  		use_napi = rcu_access_pointer(rq->napi) &&
>  			   veth_skb_is_eligible_for_gro(dev, rcv, skb);
> -		skb_record_rx_queue(skb, rxq);
>  	}
>
>  	skb_tx_timestamp(skb);
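
For readers following the changelog, the skb_tx_hash() path it describes can
be sketched as a small userspace model. This is only an illustration, not
kernel code: struct model_skb and the model_*() helpers are hypothetical
names, and the model assumes a single traffic class (queue offset 0). It
shows why a recorded RX queue overrides the flow hash, so that every packet
leaving a veth with RX queue 0 recorded lands on the same TX queue of the
physical device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields involved. */
struct model_skb {
	uint16_t queue_mapping;	/* 0 means "no RX queue recorded" */
	uint32_t hash;		/* flow hash of the packet */
};

/* Modeled on skb_record_rx_queue(): the RX queue is stored offset by one. */
static void model_record_rx_queue(struct model_skb *skb, uint16_t rx_queue)
{
	skb->queue_mapping = rx_queue + 1;
}

/* Modeled on skb_rx_queue_recorded(). */
static int model_rx_queue_recorded(const struct model_skb *skb)
{
	return skb->queue_mapping != 0;
}

/* Modeled on skb_get_rx_queue(): undo the offset by one. */
static uint16_t model_get_rx_queue(const struct model_skb *skb)
{
	return skb->queue_mapping - 1;
}

/*
 * Simplified model of skb_tx_hash() for one traffic class (assumption:
 * queue offset 0). A recorded RX queue wins over the flow hash.
 */
static uint16_t model_tx_hash(const struct model_skb *skb, uint16_t qcount)
{
	assert(qcount > 0);

	if (model_rx_queue_recorded(skb)) {
		uint32_t q = model_get_rx_queue(skb);

		/* Fold the recorded RX queue into the TX queue range. */
		while (q >= qcount)
			q -= qcount;
		return (uint16_t)q;
	}

	/* Same arithmetic as reciprocal_scale(): map the hash to [0, qcount). */
	return (uint16_t)(((uint64_t)skb->hash * qcount) >> 32);
}
```

With two flows whose hashes differ, model_tx_hash() spreads them over an
8-queue device as long as queue_mapping is 0. Once model_record_rx_queue(skb, 0)
has been called on both, as veth_xmit() effectively did before this patch,
both return queue 0.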