Message-ID: <404130e4-210d-2214-47a8-833c0463d997@fensystems.co.uk>
Date: Mon, 10 May 2021 19:47:01 +0100
From: Michael Brown <mbrown@...systems.co.uk>
To: Marek Marczykowski-Górecki
<marmarek@...isiblethingslab.com>
Cc: paul@....org, xen-devel@...ts.xenproject.org,
netdev@...r.kernel.org, wei.liu@...nel.org, pdurrant@...zon.com
Subject: Re: [PATCH] xen-netback: Check for hotplug-status existence before
watching
On 10/05/2021 19:32, Marek Marczykowski-Górecki wrote:
> On Tue, Apr 13, 2021 at 04:25:12PM +0100, Michael Brown wrote:
>> The logic in connect() is currently written with the assumption that
>> xenbus_watch_pathfmt() will return an error for a node that does not
>> exist. This assumption is incorrect: xenstore does allow a watch to
>> be registered for a nonexistent node (and will send notifications
>> should the node be subsequently created).
>>
>> As of commit 1f2565780 ("xen-netback: remove 'hotplug-status' once it
>> has served its purpose"), this leads to a failure when a domU
>> transitions into XenbusStateConnected more than once. On the first
>> domU transition into Connected state, the "hotplug-status" node will
>> be deleted by the hotplug_status_changed() callback in dom0. On the
>> second or subsequent domU transition into Connected state, the
>> hotplug_status_changed() callback will therefore never be invoked, and
>> so the backend will remain stuck in InitWait.
>>
>> This failure prevents scenarios such as reloading the xen-netfront
>> module within a domU, or booting a domU via iPXE. There is
>> unfortunately no way for the domU to work around this dom0 bug.
>>
>> Fix by explicitly checking for existence of the "hotplug-status" node,
>> thereby creating the behaviour that was previously assumed to exist.
>
> This change is wrong. The 'hotplug-status' node is created _only_ by a
> hotplug script, and only when that script is executed. When the kernel
> waits for the hotplug script to be executed, it waits for the node to
> _appear_, not to _change_. So this change basically made the kernel not
> wait for the hotplug script at all.
That doesn't sound plausible to me. In the setup you describe, how is
the kernel expected to differentiate between "hotplug script has not
yet created the node" and "hotplug script does not exist and will
therefore never create any node"?
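
To make the two behaviours under discussion concrete, here is a minimal
userspace sketch. It is a toy model only, not the netback code: every
toy_* name is hypothetical, the "hotplug-status" node is reduced to a
single flag, and the model assumes that a watch fires once when it is
registered and that the backend publishes Connected immediately when no
watch is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all toy_* names are hypothetical, not kernel API. */
enum toy_state { TOY_INIT_WAIT, TOY_CONNECTED };

struct toy_backend {
    bool node_exists;     /* does "hotplug-status" currently exist? */
    bool have_watch;      /* is a watch on the node registered? */
    enum toy_state state;
};

/* As described in the patch: a watch can be registered on a nonexistent node. */
static int toy_watch(struct toy_backend *be)
{
    be->have_watch = true;
    return 0;
}

/* Watch callback: if the node is present, connect and delete the node. */
static void toy_hotplug_status_changed(struct toy_backend *be)
{
    if (!be->have_watch || !be->node_exists)
        return;
    be->state = TOY_CONNECTED;
    be->node_exists = false;   /* node removed once it has served its purpose */
    be->have_watch = false;
}

/* One frontend transition into Connected, with or without the existence check. */
static void toy_connect(struct toy_backend *be, bool check_exists)
{
    be->state = TOY_INIT_WAIT;
    be->have_watch = false;
    if (!check_exists || be->node_exists) {
        if (toy_watch(be) == 0)
            toy_hotplug_status_changed(be);  /* assumed: fires on registration */
    }
    if (!be->have_watch)
        be->state = TOY_CONNECTED;           /* nothing to wait for */
}

/* Backend state after the frontend connects twice; hotplug script ran once. */
static enum toy_state toy_second_transition(bool check_exists)
{
    struct toy_backend be = { .node_exists = true };

    toy_connect(&be, check_exists);   /* first transition consumes the node */
    assert(be.state == TOY_CONNECTED);
    toy_connect(&be, check_exists);   /* second transition: node is gone */
    return be.state;
}
```

In this model, toy_second_transition(false) stays in TOY_INIT_WAIT while
toy_second_transition(true) reaches TOY_CONNECTED. Calling
toy_connect() with the check enabled on a backend whose node was never
created also reaches TOY_CONNECTED straight away, which is the case
being argued about here.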
Michael