[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f469cdee-f97e-da3f-bcab-0be9ed8cd836@xen.org>
Date: Tue, 13 Apr 2021 08:12:44 +0100
From: Paul Durrant <xadimgnik@...il.com>
To: Michael Brown <mcb30@...e.org>, Wei Liu <wei.liu@...nel.org>,
xen-devel@...ts.xenproject.org, netdev@...r.kernel.org,
Paul Durrant <pdurrant@...zon.com>
Subject: Re: xen-netback hotplug-status regression bug
On 10/04/2021 19:25, Michael Brown wrote:
> Commit https://github.com/torvalds/linux/commit/1f25657 ("xen-netback:
> remove 'hotplug-status' once it has served its purpose") seems to have
> introduced a regression that prevents a vif frontend from transitioning
> more than once into Connected state.
>
> As far as I can tell:
>
> - The defined vif script (e.g. /etc/xen/scripts/vif-bridge) executes
> only once, at domU startup, and sets
> backend/vif/<domU>/0/hotplug-status="connected"
>
> - When the frontend first enters Connected state,
> drivers/net/xen-netback/xenbus.c's connect() sets up a watch on
> "hotplug-status" with the callback function hotplug_status_changed()
>
> - When hotplug_status_changed() is triggered by the watch, it
> transitions the backend to Connected state and calls xenbus_rm() to
> delete the "hotplug-status" attribute.
>
> If the frontend subsequently disconnects and reconnects (e.g.
> transitions through Closed->Initialising->Connected) then:
>
> - Nothing recreates "hotplug-status"
>
> - When the frontend re-enters Connected state, connect() sets up a watch
> on "hotplug-status" again
>
> - The callback hotplug_status_changed() is never triggered, and so the
> backend device never transitions to Connected state.
>
That's not how I read it. Given that "hotplug-status" is removed by the
call to hotplug_status_changed() then the next call to connect() should
fail to register the watch and 'have_hotplug_status_watch' should be 0.
Thus backend_switch_state() should not defer the transition to
XenbusStateConnected in any subsequent interaction with the frontend.
>
> Reverting the commit would fix this bug, but would obviously also
> reintroduce the race condition that the commit was designed to avoid.
>
> I'm happy to put together a patch, if one of the maintainers could
> suggest a sensible design approach.
>
Are you seeing the watch successfully re-registered even though the node
does not exist? Perhaps there has been a change in xenstore behaviour?
Paul
> I'm not a list member, so please CC me directly on replies.
>
> Thanks,
>
> Michael
Powered by blists - more mailing lists