Message-ID: <f7to70fq5n4.fsf@redhat.com>
Date: Thu, 09 Jan 2025 12:05:51 -0500
From: Aaron Conole <aconole@...hat.com>
To: Ilya Maximets <i.maximets@....org>
Cc: netdev@...r.kernel.org,  dev@...nvswitch.org,  Friedrich Weber
 <f.weber@...xmox.com>,  Luca Czesla <luca.czesla@...l.schwarz>,
  linux-kernel@...r.kernel.org,  Felix Huettner
 <felix.huettner@...l.schwarz>,  Eric Dumazet <edumazet@...gle.com>,  Simon
 Horman <horms@...nel.org>,  Jakub Kicinski <kuba@...nel.org>,  Paolo Abeni
 <pabeni@...hat.com>,  "David S. Miller" <davem@...emloft.net>
Subject: Re: [ovs-dev] [PATCH net] openvswitch: fix lockup on tx to
 unregistering netdev with carrier

Ilya Maximets <i.maximets@....org> writes:

> The commit in the Fixes tag below attempted to fix the issue in the
> following sequence of calls:
>
>     do_output
>     -> ovs_vport_send
>        -> dev_queue_xmit
>           -> __dev_queue_xmit
>              -> netdev_core_pick_tx
>                 -> skb_tx_hash
>
> When a device is unregistering, 'dev->real_num_tx_queues' goes to
> zero and the 'while (unlikely(hash >= qcount))' loop inside
> 'skb_tx_hash' becomes infinite, locking up the core forever.
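
Just to restate the mechanism for anyone skimming: the reduction in
skb_tx_hash() folds the recorded RX queue index into the range
[0, real_num_tx_queues) by repeated subtraction, so once qcount hits
zero the subtraction makes no progress.  A toy userspace model of that
reduction (fold_hash() and its qcount guard are illustrative, not the
kernel code):

    #include <stdint.h>
    #include <stdio.h>

    /* Model of the queue-folding loop described above.  Returns -1 when
     * qcount is 0 instead of hanging; the in-kernel loop has no such
     * guard, so 'hash -= 0' never terminates once real_num_tx_queues
     * has dropped to 0 during unregistration. */
    static int fold_hash(uint16_t hash, uint16_t qcount)
    {
        if (qcount == 0)
            return -1;                /* the kernel loop would spin here */
        while (hash >= qcount)
            hash -= qcount;
        return hash;
    }

    int main(void)
    {
        printf("hash 9, qcount 4 -> %d\n", fold_hash(9, 4));  /* 1 */
        printf("hash 9, qcount 0 -> %d\n", fold_hash(9, 0));  /* would hang */
        return 0;
    }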
>
> But unfortunately, checking just the carrier status is not enough to
> fix the issue, because some devices may still be unregistering while
> reporting the carrier status as OK.
>
> One example of such a device is net/dummy.  It sets the carrier ON
> on start, but it doesn't implement .ndo_stop to set the carrier off.
> And that makes sense, because dummy doesn't really have a carrier.
> Therefore, while this device is unregistering, it's still easy to hit
> the infinite loop in the skb_tx_hash() from the OVS datapath.  There
> might be other drivers that do the same, but dummy by itself is
> important for the OVS ecosystem, because it is frequently used as a
> packet sink for tcpdump while debugging OVS deployments.  And when the
> issue is hit, the only way to recover is to reboot.
>
> Fix that by also checking if the device is running.  The running
> state is handled by the net core during unregistering, so it covers
> unregistering case better, and we don't really need to send packets
> to devices that are not running anyway.
>
> While only checking the running state might be enough, the carrier
> check is preserved.  The running and the carrier states appear to be
> handled independently throughout the core code and different drivers.
> And other core functions like __dev_direct_xmit() check both before
> attempting to transmit a packet.  So, it seems safer to check both
> flags in OVS as well.
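
For reference, the core gate mentioned above is roughly of this shape
(paraphrased from __dev_direct_xmit() in net/core/dev.c, not copied
verbatim), so the OVS hunk below mirrors what the stack itself does
before a direct transmit:

    /* Paraphrase of the check in __dev_direct_xmit(): refuse to send
     * unless the device is both running and reporting carrier. */
    if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
        goto drop;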
>
> Fixes: 066b86787fa3 ("net: openvswitch: fix race on port output")
> Reported-by: Friedrich Weber <f.weber@...xmox.com>
> Closes: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-January/053423.html
> Signed-off-by: Ilya Maximets <i.maximets@....org>
> ---
>  net/openvswitch/actions.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 16e260014684..704c858cf209 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -934,7 +934,9 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
>  {
>  	struct vport *vport = ovs_vport_rcu(dp, out_port);
>  
> -	if (likely(vport && netif_carrier_ok(vport->dev))) {
> +	if (likely(vport &&
> +		   netif_running(vport->dev) &&
> +		   netif_carrier_ok(vport->dev))) {
>  		u16 mru = OVS_CB(skb)->mru;
>  		u32 cutlen = OVS_CB(skb)->cutlen;

Reviewed-by: Aaron Conole <aconole@...hat.com>

I tried with my VMs to reproduce the issue as described in the email
report, but I probably didn't give enough resources (or gave too many)
to get the race condition to bubble up.  I was also using kernel
6.13-rc5 (0bc21e701a6f).

In any case, I think the change makes sense.

