Message-ID: <653b71a4-99a5-49d4-b927-fe2bf5890896@gmail.com>
Date: Thu, 3 Apr 2025 15:10:57 +0700
From: Bui Quang Minh <minhquangbui99@...il.com>
To: Paolo Abeni <pabeni@...hat.com>, virtualization@...ts.linux.dev
Cc: "Michael S . Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Andrew Lunn <andrew+netdev@...n.ch>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Eugenio Pérez <eperezma@...hat.com>,
"David S . Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] virtio-net: disable delayed refill when setting up xdp
On 4/3/25 14:24, Paolo Abeni wrote:
> On 4/2/25 7:42 AM, Bui Quang Minh wrote:
>> When setting up XDP for a running interface, we call napi_disable() on
>> the receive queue's napi. The delayed refill_work also calls
>> napi_disable() on the receive queue's napi. This can lead to a deadlock
>> when napi_disable() is called on an already disabled napi. This commit
>> fixes the issue by disabling future delayed refill work and cancelling
>> any in-flight refill work before calling napi_disable() in virtnet_xdp_set.
>>
>> Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
>> Signed-off-by: Bui Quang Minh <minhquangbui99@...il.com>
>> ---
>> drivers/net/virtio_net.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 7e4617216a4b..33406d59efe2 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -5956,6 +5956,15 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>> if (!prog && !old_prog)
>> return 0;
>>
>> + /*
>> + * Make sure refill_work does not run concurrently to
>> + * avoid napi_disable race which leads to deadlock.
>> + */
>> + if (netif_running(dev)) {
>> + disable_delayed_refill(vi);
>> + cancel_delayed_work_sync(&vi->refill);
> AFAICS at this point refill_work() could still be running; why don't you
> need to call flush_delayed_work()?
AFAIK, cancel_delayed_work_sync() (the synchronous version) provides a
somewhat stronger guarantee than flush_delayed_work(). Internally,
cancel_delayed_work_sync() also calls __flush_work(), but it temporarily
disables the work before doing so, so that even if refill_work tries to
re-queue itself, the re-queue fails. Since refill_work can actually
re-queue itself, I think we must use cancel_delayed_work_sync() here.
Thanks,
Quang Minh.