[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <df9e9a32b0294aee814eeb58d2d71edd@EX13D32EUC003.ant.amazon.com>
Date: Tue, 11 May 2021 07:06:55 +0000
From: "Durrant, Paul" <pdurrant@...zon.co.uk>
To: Marek Marczykowski-Górecki
<marmarek@...isiblethingslab.com>,
Michael Brown <mbrown@...systems.co.uk>,
"paul@....org" <paul@....org>
CC: "paul@....org" <paul@....org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"wei.liu@...nel.org" <wei.liu@...nel.org>
Subject: RE: [PATCH] xen-netback: Check for hotplug-status existence before watching
> -----Original Message-----
> From: Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>
> Sent: 10 May 2021 20:43
> To: Michael Brown <mbrown@...systems.co.uk>; paul@....org
> Cc: paul@....org; xen-devel@...ts.xenproject.org; netdev@...r.kernel.org; wei.liu@...nel.org; Durrant,
> Paul <pdurrant@...zon.co.uk>
> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>
> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
> > If you have a suggested patch, I'm happy to test that it doesn't reintroduce
> > the regression bug that was fixed by this commit.
>
> Actually, I've just tested with a simple reloading xen-netfront module. It
> seems in this case, the hotplug script is not re-executed. In fact, I
> think it should not be re-executed at all, since the vif interface
> remains in place (it just gets NO-CARRIER flag).
>
> This brings a question, why removing hotplug-status in the first place?
> The interface remains correctly configured by the hotplug script after
> all. From the commit message:
>
> xen-netback: remove 'hotplug-status' once it has served its purpose
>
> Removing the 'hotplug-status' node in netback_remove() is wrong; the script
> may not have completed. Only remove the node once the watch has fired and
> has been unregistered.
>
> I think the intention was to remove 'hotplug-status' node _later_ in
> case of quickly adding and removing the interface. Is that right, Paul?
The removal was done to allow unbind/bind to function correctly. IIRC before the original patch doing a bind would stall forever waiting for the hotplug status to change, which would never happen.
> In that case, letting hotplug_status_changed() remove the entry wont
> work, because the watch was unregistered few lines earlier in
> netback_remove(). And keeping the watch is not an option, because the
> whole backend_info struct is going to be free-ed already.
>
> If my guess about the original reason for the change is right, I think
> it should be fixed at the hotplug script level - it should check if the
> device is still there before writing 'hotplug-status' node.
> I'm not sure if doing it race-free is possible from a shell script (I think it
> requires doing xenstore read _and_ write in a single transaction). But
> in the worst case, the aftermath of loosing the race is leaving stray
> 'hotplug-status' xenstore node - not ideal, but also less harmful than
> failing to bring up an interface. At this point, the toolstack could cleanup
> it later, perhaps while setting up that interface again (if it gets
> re-connected)?
>
> Anyway, perhaps the best thing to do now, is to revert both commits, and
> think of an alternative solution for the original issue? That of course
> assumes I guessed correctly why it was done in the first place...
>
Simply reverting everything would likely break the ability to do unbind and bind (which is useful e.g to allow update the netback module whilst guests are still running) so I don't think that's an option.
Paul
Powered by blists - more mailing lists