linux-kernel - Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CACGkMEtJ-jErjFQgBcEPVeUo4rHejTZ0cuCCVzHSjzk8S80W5A@mail.gmail.com>
Date: Fri, 5 Dec 2025 09:31:55 +0800
From: Jason Wang <jasowang@...hat.com>
To: Bui Quang Minh <minhquangbui99@...il.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, linux-kernel@...r.kernel.org, 
	Paolo Abeni <pabeni@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, 
	Eugenio Pérez <eperezma@...hat.com>, 
	Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>, 
	Stanislav Fomichev <sdf@...ichev.me>, virtualization@...ts.linux.dev, netdev@...r.kernel.org, 
	bpf@...r.kernel.org
Subject: Re: [PATCH RFC] virtio_net: gate delayed refill scheduling

On Thu, Dec 4, 2025 at 11:08 PM Bui Quang Minh <minhquangbui99@...il.com> wrote:
>
> On 12/3/25 13:37, Jason Wang wrote:
> > On Tue, Dec 2, 2025 at 11:29 PM Bui Quang Minh <minhquangbui99@...il.com> wrote:
> >> On 12/2/25 13:03, Jason Wang wrote:
> >>> On Mon, Dec 1, 2025 at 11:04 PM Bui Quang Minh <minhquangbui99@...il.com> wrote:
> >>>> On 11/28/25 09:20, Jason Wang wrote:
> >>>>> On Fri, Nov 28, 2025 at 1:47 AM Bui Quang Minh <minhquangbui99@...il.com> wrote:
> >>>>>> I think the the requeue in refill_work is not the problem here. In
> >>>>>> virtnet_rx_pause[_all](), we use cancel_work_sync() which is safe to
> >>>>>> use "even if the work re-queues itself". AFAICS, cancel_work_sync()
> >>>>>> will disable work -> flush work -> enable again. So if the work requeue
> >>>>>> itself in flush work, the requeue will fail because the work is already
> >>>>>> disabled.
> >>>>> Right.
> >>>>>
> >>>>>> I think what triggers the deadlock here is a bug in
> >>>>>> virtnet_rx_resume_all(). virtnet_rx_resume_all() calls to
> >>>>>> __virtnet_rx_resume() which calls napi_enable() and may schedule
> >>>>>> refill. It schedules the refill work right after napi_enable the first
> >>>>>> receive queue. The correct way must be napi_enable all receive queues
> >>>>>> before scheduling refill work.
> >>>>> So what you meant is that the napi_disable() is called for a queue
> >>>>> whose NAPI has been disabled?
> >>>>>
> >>>>> cpu0] enable_delayed_refill()
> >>>>> cpu0] napi_enable(queue0)
> >>>>> cpu0] schedule_delayed_work(&vi->refill)
> >>>>> cpu1] napi_disable(queue0)
> >>>>> cpu1] napi_enable(queue0)
> >>>>> cpu1] napi_disable(queue1)
> >>>>>
> >>>>> In this case cpu1 waits forever while holding the netdev lock. This
> >>>>> looks like a bug since the netdev_lock 413f0271f3966 ("net: protect
> >>>>> NAPI enablement with netdev_lock()")?
> >>>> Yes, I've tried to fix it in 4bc12818b363 ("virtio-net: disable delayed
> >>>> refill when pausing rx"), but it has flaws.
> >>> I wonder if a simplified version is just restoring the behaviour
> >>> before 413f0271f3966 by using napi_enable_locked() but maybe I miss
> >>> something.
> >> As far as I understand, before 413f0271f3966 ("net: protect NAPI
> >> enablement with netdev_lock()"), the napi is protected by the
> > I guess you meant napi enable/disable actually.
> >
> >> rtnl_lock(). But in the refill_work, we don't acquire the rtnl_lock(),
> > Any reason we need to hold rtnl_lock() there?
>
> Correct me if I'm wrong here. Before 413f0271f3966 ("net: protect NAPI
> enablement with netdev_lock()"), napi_disable and napi_enable are not
> safe to be called concurrently.
>
> The example race is
>
> napi_disable -> napi_save_config -> write to n->config->defer_hard_irqs
> napi_enable -> napi_restore_config -> read n->config->defer_hard_irqs
>
> In refill_work, we don't hold any locks so the race scenario can happen.

Ok, I get you, so it occurs after we introduced the NAPI config to virtio-net.

>
> Maybe I misunderstand what you mean by restoring the behavior before
> 413f0271f3966. Do you mean that we use this pattern
>
>      In virtnet_xdp_se;
>
>      netdev_lock(dev);
>      virtnet_rx_pause_all()
>          -> napi_disable_locked
>
>      virtnet_rx_resume_all()
>          -> napi_disable_locked
>      netdev_unlock(dev);
>
> And in other places where we pause the rx too. It will hold the
> netdev_lock during the time napi is disabled so that even when
> refill_work happens concurrently, napi_disable cannot acquire the
> netdev_lock and gets stuck inside.

It might work but I think it would be easy to either

1) use locked version everywhere

or

2) use the unlocked version everywhere

Mix using locked and unlocked may cause the code hard to be maintained

Thanks

>
>
> >
> >> so it seems like we will have race condition before 413f0271f3966 ("net:
> >> protect NAPI enablement with netdev_lock()").
> >>
> >> Thanks,
> >> Quang Minh.
> >>
> > Thanks
> >
>
> Thanks,
> Quang Minh.
>