Message-Id: <20250812090817.3463403-1-junnan01.wu@samsung.com>
Date: Tue, 12 Aug 2025 17:08:17 +0800
From: Junnan Wu <junnan01.wu@...sung.com>
To: mst@...hat.com, jasowang@...hat.com, andrew+netdev@...n.ch,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com
Cc: xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, ying123.xu@...sung.com,
lei19.wang@...sung.com, q1.huang@...sung.com, Junnan Wu
<junnan01.wu@...sung.com>
Subject: [PATCH net] virtio_net: adjust the execution order of function
`virtnet_close` during freeze
"Use after free" issue appears in suspend once race occurs when
napi poll scheduls after `netif_device_detach` and before napi disables.
For details, during suspend flow of virtio-net,
the tx queue state is set to "__QUEUE_STATE_DRV_XOFF" by CPU-A.
And at some coincidental times, if a TCP connection is still working,
CPU-B does `virtnet_poll` before napi disable.
In this flow, the state "__QUEUE_STATE_DRV_XOFF"
of tx queue will be cleared. This is not the normal process it expects.
After that, CPU-A continues to close driver then virtqueue is removed.
Sequence likes below:
--------------------------------------------------------------------------
       CPU-A                                  CPU-B
       -----                                  -----
suspend is called                      a TCP connection over
                                       virtio-net is still active
virtnet_freeze
|- virtnet_freeze_down
|  |- netif_device_detach
|  |  |- netif_tx_stop_all_queues
|  |     |- netif_tx_stop_queue
|  |        |- set_bit
|  |           (__QUEUE_STATE_DRV_XOFF,...)
|  |                                   softirq raised
|  |                                   |- net_rx_action
|  |                                      |- napi_poll
|  |                                         |- virtnet_poll
|  |                                            |- virtnet_poll_cleantx
|  |                                               |- netif_tx_wake_queue
|  |                                                  |- test_and_clear_bit
|  |                                                     (__QUEUE_STATE_DRV_XOFF,...)
|  |- virtnet_close
|     |- virtnet_disable_queue_pair
|        |- virtnet_napi_tx_disable
|- remove_vq_common
--------------------------------------------------------------------------
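
For reference, the two helpers that race on the queue state bit look
roughly like the sketch below (abbreviated from include/linux/netdevice.h
and net/core/dev.c; locking/RCU details omitted, so treat it as a sketch
rather than the exact mainline source). It only illustrates how the napi
tx-completion path can undo the detach:

/* Sketch only, not the exact kernel source. */
static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
{
        /* CPU-A, via netif_device_detach(): mark the tx queue stopped */
        set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
}

void netif_tx_wake_queue(struct netdev_queue *dev_queue)
{
        /* CPU-B, via virtnet_poll_cleantx(): clears that same bit, so
         * the stack believes the queue may transmit again.
         */
        if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state))
                __netif_schedule(dev_queue->qdisc);
}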
Later, when the TCP delayed ACK timer fires, a CPU runs the timer
softirq and `tcp_delack_timer_handler` is called, which eventually ends
up in `start_xmit` of the virtio-net driver. The access to the tx
virtqueue there then causes a panic.
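
For illustration, a heavily abbreviated sketch of that transmit path
(names taken from drivers/net/virtio_net.c, bodies trimmed, so this is a
sketch rather than the actual driver code):

/* Sketch only: the skb is queued on sq->vq, which no longer exists
 * once remove_vq_common() has deleted the virtqueues.
 */
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct virtnet_info *vi = netdev_priv(dev);
        struct send_queue *sq = &vi->sq[skb_get_queue_mapping(skb)];

        /* xmit_skb(sq, skb, ...) builds the virtio header and then calls
         * virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC),
         * i.e. it dereferences the already-removed tx virtqueue.
         */
        return NETDEV_TX_OK;    /* real code returns based on xmit_skb() */
}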
The root cause of this issue is that napi tx is not yet disabled when
`netif_tx_stop_queue` runs; if `virtnet_poll` is scheduled in that
window, the stopped state of the tx queue is cleared again.

To fix this, call `virtnet_close` before `netif_device_detach` in
`virtnet_freeze_down`, so that napi is already disabled when the tx
queues are stopped.

Co-developed-by: Ying Xu <ying123.xu@...sung.com>
Signed-off-by: Ying Xu <ying123.xu@...sung.com>
Signed-off-by: Junnan Wu <junnan01.wu@...sung.com>
---
 drivers/net/virtio_net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d14e6d602273..975bdc5dab84 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5758,14 +5758,15 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	disable_rx_mode_work(vi);
 	flush_work(&vi->rx_mode_work);
 
-	netif_tx_lock_bh(vi->dev);
-	netif_device_detach(vi->dev);
-	netif_tx_unlock_bh(vi->dev);
 	if (netif_running(vi->dev)) {
 		rtnl_lock();
 		virtnet_close(vi->dev);
 		rtnl_unlock();
 	}
+
+	netif_tx_lock_bh(vi->dev);
+	netif_device_detach(vi->dev);
+	netif_tx_unlock_bh(vi->dev);
 }
 
 static int init_vqs(struct virtnet_info *vi);
--
2.34.1