[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250701035110-mutt-send-email-mst@kernel.org>
Date: Tue, 1 Jul 2025 03:52:40 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Zigit Zo <zuozhijie@...edance.com>
Cc: jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH net] virtio-net: fix a rtnl_lock()
deadlock during probing
On Tue, Jul 01, 2025 at 03:48:41PM +0800, Zigit Zo wrote:
> On 6/30/25 10:54 PM, Michael S. Tsirkin wrote:
> > On Mon, Jun 30, 2025 at 10:50:55AM -0400, Michael S. Tsirkin wrote:
> >> On Mon, Jun 30, 2025 at 05:51:09PM +0800, Zigit Zo wrote:
> >>> This bug happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while
> >>> the virtio-net driver is still probing with rtnl_lock() hold, this will
> >>> cause a recursive mutex in netdev_notify_peers().
> >>>
> >>> Fix it by skip acking the annouce in virtnet_config_changed_work() when
> >>> probing. The annouce will still get done when ndo_open() enables the
> >>> virtio_config_driver_enable().
> >>
> >> I am not so sure it will be - while driver is not loaded, device does
> >> not have to send interrupts, and there's no rule I'm aware of that says
> >> we'll get one after DRIVER_OK.
>
> Yep, at first we're thinking that when the VIRTIO_NET_S_ANNOUNCE flag set,
> we can always assure an interrupt has fired by VMM, to notify the driver
> to do the announcement.
>
> But later we realized that the S_ANNOUNCE flag can be sent before the
> driver's probing, and for QEMU seems to set the status flag regardless of
> whether driver is ready, so the problem you're talking still may happens.
> >> How about, we instead just schedule the work to do it later?I'm not sure if scheduling the work later will break df28de7b0050, the work
> was being scheduled before that commit, and we have no much idea of why that
> commit removes the schedule_work, we just keep it for safe...
Well managing async things is always tricky. Direct call is safer.
If you reintroduce it, you need to audit all call paths for safely.
> Then, for plan A, we change the check_announce to schedule_announce, and if
> that's true, we do another schedule_work to call virtnet_config_changed_work
> again to finish the announcement, like
>
> if (v & VIRTIO_NET_S_ANNOUNCE) {
> if (unlikely(schedule_announce))
> schedule_work(&vi->config_work);
> else {
> netdev_notify_peers(vi->dev);
> virtnet_ack_link_announce(vi);
> }
> }
>
> >>
> >> Also, there is another bug here.
> >> If ndo_open did not run, we actually should not send any announcements.
> >>
> >> Do we care if carrier on is set on probe or on open?
> >> If not, let's just defer this to ndo_open?
> >
> > Hmm yes I think we do, device is visible to userspace is it not?
> >
> > Hmm. We can keep the announce bit set in vi->status and on open, check
> > it and then schedule a work to do the announcement.
>
> Okay, so there's a plan B, we save the bit and re-check it in ndo_open, like
>
> /* __virtnet_config_changed_work() */
> if (v & VIRTIO_NET_S_ANNOUNCE) {
> vi->status |= VIRTIO_NET_S_ANNOUNCE;
> if (unlikely(!check_announce))
> goto check_link;
>
> netdev_notify_peers(vi->dev);
> virtnet_ack_link_announce(vi);
> vi->status &= ~VIRTIO_NET_S_ANNOUNCE;
> }
>
> /* virtnet_open() */
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> if (vi->status & VIRTIO_NET_S_LINK_UP)
> netif_carrier_on(vi->dev);
> if
> if (vi->status & VIRTIO_NET_S_ANNOUNCE)
> schedule_work(&vi->config_work);
> virtio_config_driver_enable(vi->vdev);
> }
>
> This is a dirty demo, any ideas are welcomed :)
>
> (I think in virtnet_open() we can make the S_LINK_UP being scheduled as well?)
Powered by blists - more mailing lists