Message-ID: <dd8d0c1f-f1ef-42e3-b6a9-24fb5c82f881@linux.alibaba.com>
Date: Thu, 30 Nov 2023 20:42:27 +0800
From: Heng Qi <hengqi@...ux.alibaba.com>
To: Paolo Abeni <pabeni@...hat.com>,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Cc: jasowang@...hat.com, mst@...hat.com, kuba@...nel.org,
edumazet@...gle.com, davem@...emloft.net, hawk@...nel.org,
john.fastabend@...il.com, ast@...nel.org, horms@...nel.org,
xuanzhuo@...ux.alibaba.com, yinjun.zhang@...igine.com
Subject: Re: [PATCH net-next v5 4/4] virtio-net: support rx netdim
On 2023/11/30 8:23 PM, Paolo Abeni wrote:
> On Thu, 2023-11-30 at 20:09 +0800, Heng Qi wrote:
>> On 2023/11/30 5:33 PM, Paolo Abeni wrote:
>>> On Mon, 2023-11-27 at 10:55 +0800, Heng Qi wrote:
>>>> @@ -4738,11 +4881,14 @@ static void remove_vq_common(struct virtnet_info *vi)
>>>>  static void virtnet_remove(struct virtio_device *vdev)
>>>>  {
>>>>  	struct virtnet_info *vi = vdev->priv;
>>>> +	int i;
>>>>
>>>>  	virtnet_cpu_notif_remove(vi);
>>>>
>>>>  	/* Make sure no work handler is accessing the device. */
>>>>  	flush_work(&vi->config_work);
>>>> +	for (i = 0; i < vi->max_queue_pairs; i++)
>>>> +		cancel_work(&vi->rq[i].dim.work);
>>> If the dim work is still running here, what prevents it from completing
>>> after the following unregister/free netdev?
>> Yes, no one here is trying to stop it,
> So it will cause UaF, right?
>
>> the situation is similar to unregistering/freeing the netdev
>> while RSS is being configured, so I think this is ok.
>
> Could you please elaborate on this point?
If I'm not mistaken, the following two scenarios are similar:
Scenario 1:
1. User uses ethtool to configure RSS settings
2. ethtool core holds rtnl_lock
3. virtnet_remove() is called
4. virtnet_send_command() is called.
Scenario 2:
1. virtnet_poll() queues a virtnet_rx_dim_work()
2. virtnet_rx_dim_work() is called and holds rtnl_lock
3. virtnet_remove() is called
4. virtnet_send_command() is called.
So I think it's ok to use cancel_work() here.
What do you think? :)
>
>>> It looks like you need to call cancel_work_sync() here?
>> In v4, Yinjun Zhang mentioned that _sync() can cause deadlock[1].
>> Therefore, cancel_work() is used here instead of cancel_work_sync() to
>> avoid possible deadlock.
>>
>> [1]
>> https://lore.kernel.org/all/20231122092939.1005591-1-yinjun.zhang@corigine.com/
> Here the call to cancel_work() happens while the caller does not hold
> the rtnl lock, so the deadlock reported above will not be triggered.
v4 used cancel_work_sync(), and I did reproduce the deadlock:
rtnl_lock held -> .ndo_stop() -> cancel_work_sync() -> virtnet_rx_dim_work(),
the work tries to acquire rtnl_lock again, so a deadlock occurs.
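The chain above can be modelled roughly in user space. This is only a sketch with pthreads, not the kernel workqueue API or driver code: model_rtnl, model_dim_work() and model_wait_for_work() are hypothetical names, the mutex stands in for rtnl_lock, and the join stands in for the wait that cancel_work_sync() performs on a running work item. The work uses trylock so the model reports the conflict instead of hanging.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for rtnl_lock; an assumption of this user-space model. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models virtnet_rx_dim_work(), which needs rtnl_lock. trylock is used
 * instead of lock so the model reports the conflict rather than hanging.
 */
static void *model_dim_work(void *arg)
{
	int *got_lock = arg;

	if (pthread_mutex_trylock(&model_rtnl) == 0) {
		*got_lock = 1;
		pthread_mutex_unlock(&model_rtnl);
	} else {
		*got_lock = 0;	/* a blocking lock would wait forever here */
	}
	return NULL;
}

/*
 * Models a caller that waits for the work to finish, optionally while
 * holding the lock. Returns 1 if the work could take the lock, 0 if it
 * could not (the deadlock case when the work uses a blocking lock).
 */
static int model_wait_for_work(int caller_holds_rtnl)
{
	pthread_t worker;
	int got_lock = -1;
	int rc;

	if (caller_holds_rtnl)
		pthread_mutex_lock(&model_rtnl);

	rc = pthread_create(&worker, NULL, model_dim_work, &got_lock);
	assert(rc == 0);
	rc = pthread_join(worker, NULL);	/* wait for the work, like _sync() */
	assert(rc == 0);

	if (caller_holds_rtnl)
		pthread_mutex_unlock(&model_rtnl);

	return got_lock;
}
```

In this model, model_wait_for_work(1) returns 0: the waiter holds the lock for the whole lifetime of the work, so the work can never take it. model_wait_for_work(0) returns 1, because a waiter that does not hold the lock lets the work complete.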
I tested ctrl cmd/.remove/.ndo_stop()/dim_work running together under heavy
concurrency, and cancel_work() works well.
Thanks!
>
>>> Additionally the later remove_vq_common() will needlessly call
>>> cancel_work() again;
>> Yes. remove_vq_common() now does not call cancel_work().
> I'm sorry, I misread the context in a previous chunk.
>
> The other point should still apply.
>
> Cheers,
>
> Paolo