Open Source and information security mailing list archives
 
Date: Fri, 1 Dec 2023 02:11:08 +0000
From: Yinjun Zhang <yinjun.zhang@...igine.com>
To: Heng Qi <hengqi@...ux.alibaba.com>, Paolo Abeni <pabeni@...hat.com>,
	"virtualization@...ts.linux-foundation.org"
	<virtualization@...ts.linux-foundation.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>
CC: "jasowang@...hat.com" <jasowang@...hat.com>, "mst@...hat.com"
	<mst@...hat.com>, "kuba@...nel.org" <kuba@...nel.org>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"hawk@...nel.org" <hawk@...nel.org>, "john.fastabend@...il.com"
	<john.fastabend@...il.com>, "ast@...nel.org" <ast@...nel.org>,
	"horms@...nel.org" <horms@...nel.org>, "xuanzhuo@...ux.alibaba.com"
	<xuanzhuo@...ux.alibaba.com>
Subject: RE: [PATCH net-next v5 4/4] virtio-net: support rx netdim

On Thursday, November 30, 2023 8:42 PM, Heng Qi wrote:
<...>
> >>>>    static void virtnet_remove(struct virtio_device *vdev)
> >>>>    {
> >>>>            struct virtnet_info *vi = vdev->priv;
> >>>> +  int i;
> >>>>
> >>>>            virtnet_cpu_notif_remove(vi);
> >>>>
> >>>>            /* Make sure no work handler is accessing the device. */
> >>>>            flush_work(&vi->config_work);
> >>>> +  for (i = 0; i < vi->max_queue_pairs; i++)
> >>>> +          cancel_work(&vi->rq[i].dim.work);
<...> 
> v4 used cancel_work_sync(), and I did reproduce the deadlock:
> 
> rtnl_lock held -> .ndo_stop() -> cancel_work_sync() -> virtnet_rx_dim_work()
> 
> cancel_work_sync() waits for virtnet_rx_dim_work(), which tries to acquire
> rtnl_lock again, so a deadlock occurs.
> 
> I tested the ctrl cmd/.remove/.ndo_stop()/dim_work scenarios under heavy
> concurrency, and cancel_work() works well.
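
A userspace analogue of the deadlock above may help. This is a minimal pthreads sketch, not kernel code: `big_lock` stands in for rtnl_lock, the worker threads stand in for virtnet_rx_dim_work(), joining a thread while holding the lock stands in for cancel_work_sync() under rtnl_lock, and the `work_cancelled` flag only loosely models cancel_work() dropping a still-pending item (it does not model a work function that is already running). All names are hypothetical, and the timed lock exists only so the demo terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for rtnl_lock (hypothetical, not kernel code). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* Loosely models a still-pending work item being cancelled by cancel_work(). */
static atomic_bool work_cancelled;
/* Outcomes recorded by the worker threads. */
static atomic_bool worker_timed_out;
static atomic_bool worker_did_work;

/* Like virtnet_rx_dim_work(), this worker needs big_lock. So that the demo
 * cannot hang, it gives up after 100 ms instead of blocking forever. */
static void *dim_work_timed(void *arg)
{
    (void)arg;
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(&big_lock, &deadline);
    if (rc != 0) {
        atomic_store(&worker_timed_out, rc == ETIMEDOUT);
        return NULL;
    }
    atomic_store(&worker_did_work, true);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Worker that skips its body if it was cancelled before it got the lock. */
static void *dim_work_checks_cancel(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    if (!atomic_load(&work_cancelled))
        atomic_store(&worker_did_work, true);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Analogue of .ndo_stop() calling cancel_work_sync() under rtnl_lock:
 * wait for the worker while still holding the lock it needs. Returns true
 * if the worker could not get the lock (a plain pthread_mutex_lock() in the
 * worker would have deadlocked here). */
bool demo_sync_cancel_would_block(void)
{
    pthread_t t;
    atomic_store(&worker_timed_out, false);
    atomic_store(&worker_did_work, false);
    pthread_mutex_lock(&big_lock);
    if (pthread_create(&t, NULL, dim_work_timed, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return false;
    }
    pthread_join(t, NULL);           /* waiting while holding big_lock */
    pthread_mutex_unlock(&big_lock);
    return atomic_load(&worker_timed_out) && !atomic_load(&worker_did_work);
}

/* Analogue of cancel_work(): mark the work cancelled and drop the lock
 * without waiting under it. Returns true if the worker skipped its body. */
bool demo_async_cancel_skips_work(void)
{
    pthread_t t;
    atomic_store(&work_cancelled, false);
    atomic_store(&worker_did_work, false);
    pthread_mutex_lock(&big_lock);
    if (pthread_create(&t, NULL, dim_work_checks_cancel, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return false;
    }
    atomic_store(&work_cancelled, true);  /* request cancellation only */
    pthread_mutex_unlock(&big_lock);
    pthread_join(t, NULL);  /* safe: the caller no longer holds big_lock */
    return !atomic_load(&worker_did_work);
}
```

Both demo functions are expected to return true: the first shows the worker cannot make progress while the waiter holds the lock it needs, the second shows that a cancel which does not wait under the lock avoids that cycle.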

I think the real question is why you need to call `cancel_work()` in `remove()` at all.
You already call it in `close()`, and the call stack is:
remove() -> unregister_netdev() -> rtnl_lock() -> ndo_stop() -> close()

Likewise, you don't need it in the unwind path of `probe()`.
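
The call-path argument can be made concrete with a toy userspace model. Everything here is hypothetical (all `toy_*` names, the `dim_work_pending` flag, the queue limit), and rtnl locking is only indicated in comments. It assumes, as stated above, that close() cancels each queue's dim work and that unregister_netdev() reaches close() through .ndo_stop() when the interface is up; it does not model the probe() unwind path or real workqueue semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the real virtio-net code. */
#define TOY_MAX_QUEUE_PAIRS 8

struct toy_rq {
    bool dim_work_pending;   /* stands in for rq[i].dim.work being queued */
};

struct toy_vi {
    int max_queue_pairs;
    bool netdev_running;     /* true between ndo_open and ndo_stop */
    struct toy_rq rq[TOY_MAX_QUEUE_PAIRS];
};

/* Stand-in for close() as reached from .ndo_stop(): it already cancels
 * each queue's dim work. */
static void toy_close(struct toy_vi *vi)
{
    for (int i = 0; i < vi->max_queue_pairs; i++)
        vi->rq[i].dim_work_pending = false;  /* cancel_work(&vi->rq[i].dim.work) */
    vi->netdev_running = false;
}

/* Stand-in for unregister_netdev(): under rtnl_lock, a running
 * interface is stopped via .ndo_stop(), which ends in close(). */
static void toy_unregister_netdev(struct toy_vi *vi)
{
    /* rtnl_lock(); */
    if (vi->netdev_running)
        toy_close(vi);
    /* rtnl_unlock(); */
}

/* remove() with no cancel loop of its own. */
void toy_remove(struct toy_vi *vi)
{
    toy_unregister_netdev(vi);
}

/* Hypothetical helper: bring the interface up and queue dim work on every queue. */
void toy_open_and_queue_dim(struct toy_vi *vi, int queue_pairs)
{
    assert(queue_pairs > 0 && queue_pairs <= TOY_MAX_QUEUE_PAIRS);
    vi->max_queue_pairs = queue_pairs;
    vi->netdev_running = true;
    for (int i = 0; i < queue_pairs; i++)
        vi->rq[i].dim_work_pending = true;
}

/* Returns true if any queue still has dim work queued. */
bool toy_any_dim_work_pending(const struct toy_vi *vi)
{
    for (int i = 0; i < vi->max_queue_pairs; i++)
        if (vi->rq[i].dim_work_pending)
            return true;
    return false;
}
```

After `toy_open_and_queue_dim(&vi, 4)` followed by `toy_remove(&vi)`, no dim work is left pending, even though `toy_remove()` never cancels anything itself.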

> 
<...>
