lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8b7a9cd5-3696-65c2-5656-a1c8eb174344@xen.org>
Date:   Tue, 18 May 2021 07:57:16 +0100
From:   Paul Durrant <xadimgnik@...il.com>
To:     Marek Marczykowski-Górecki 
        <marmarek@...isiblethingslab.com>,
        "Durrant, Paul" <pdurrant@...zon.co.uk>
Cc:     Michael Brown <mbrown@...systems.co.uk>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        Ian Jackson <iwj@...project.org>, Wei Liu <wl@....org>,
        Anthony PERARD <anthony.perard@...rix.com>
Subject: Re: [PATCH] xen-netback: Check for hotplug-status existence before
 watching

On 17/05/2021 22:43, Marek Marczykowski-Górecki wrote:
> On Tue, May 11, 2021 at 12:46:38PM +0000, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>
>>> Sent: 11 May 2021 11:45
>>> To: Durrant, Paul <pdurrant@...zon.co.uk>
>>> Cc: Michael Brown <mbrown@...systems.co.uk>; paul@....org; xen-devel@...ts.xenproject.org;
>>> netdev@...r.kernel.org; wei.liu@...nel.org
>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>
>>> On Tue, May 11, 2021 at 12:40:54PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Tue, May 11, 2021 at 07:06:55AM +0000, Durrant, Paul wrote:
>>>>>> -----Original Message-----
>>>>>> From: Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>
>>>>>> Sent: 10 May 2021 20:43
>>>>>> To: Michael Brown <mbrown@...systems.co.uk>; paul@....org
>>>>>> Cc: paul@....org; xen-devel@...ts.xenproject.org; netdev@...r.kernel.org; wei.liu@...nel.org;
>>> Durrant,
>>>>>> Paul <pdurrant@...zon.co.uk>
>>>>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>>>>
>>>>>> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
>>>>>>> If you have a suggested patch, I'm happy to test that it doesn't reintroduce
>>>>>>> the regression bug that was fixed by this commit.
>>>>>>
>>>>>> Actually, I've just tested with a simple reloading xen-netfront module. It
>>>>>> seems in this case, the hotplug script is not re-executed. In fact, I
>>>>>> think it should not be re-executed at all, since the vif interface
>>>>>> remains in place (it just gets NO-CARRIER flag).
>>>>>>
>>>>>> This brings a question, why removing hotplug-status in the first place?
>>>>>> The interface remains correctly configured by the hotplug script after
>>>>>> all. From the commit message:
>>>>>>
>>>>>>      xen-netback: remove 'hotplug-status' once it has served its purpose
>>>>>>
>>>>>>      Removing the 'hotplug-status' node in netback_remove() is wrong; the script
>>>>>>      may not have completed. Only remove the node once the watch has fired and
>>>>>>      has been unregistered.
>>>>>>
>>>>>> I think the intention was to remove 'hotplug-status' node _later_ in
>>>>>> case of quickly adding and removing the interface. Is that right, Paul?
>>>>>
>>>>> The removal was done to allow unbind/bind to function correctly. IIRC before the original patch
>>> doing a bind would stall forever waiting for the hotplug status to change, which would never happen.
>>>>
>>>> Hmm, in that case maybe don't remove it at all in the backend, and let
>>>> it be cleaned up by the toolstack, when it removes other backend-related
>>>> nodes?
>>>
>>> No, unbind/bind _does_ require hotplug script to be called again.
>>>
>>
>> Yes, sorry I was misremembering. My memory is hazy but there was definitely a problem at the time with leaving the node in place.
>>
>>> When exactly vif interface appears in the system (starts to be available
>>> for the hotplug script)? Maybe remove 'hotplug-status' just before that
>>> point?
>>>
>>
>> I really can't remember any detail. Perhaps try reverting both patches then and check that the unbind/rmmod/modprobe/bind sequence still works (and the backend actually makes it into connected state).
> 
> Ok, I've tried this. I've reverted both commits, then used your test
> script from the 9476654bd5e8ad42abe8ee9f9e90069ff8e60c17:
>      
>      This has been tested by running iperf as a server in the test VM and
>      then running a client against it in a continuous loop, whilst also
>      running:
>      
>      while true;
>        do echo vif-$DOMID-$VIF >unbind;
>        echo down;
>        rmmod xen-netback;
>        echo unloaded;
>        modprobe xen-netback;
>        cd $(pwd);
>        brctl addif xenbr0 vif$DOMID.$VIF;
>        ip link set vif$DOMID.$VIF up;
>        echo up;
>        sleep 5;
>        done
>      
>      in dom0 from /sys/bus/xen-backend/drivers/vif to continuously unbind,
>      unload, re-load, re-bind and re-plumb the backend.
>      
> In fact, the need to call `brctl` and `ip link` manually is exactly
> because the hotplug script isn't executed. When I execute it manually,
> the backend properly gets back to working. So, removing 'hotplug-status'
> was in the correct place (netback_remove). The missing part is the toolstack
> calling the hotplug script on xen-netback re-bind.
> 

Why is that missing? We're going behind the back of the toolstack to do 
the unbind and bind so why should we expect it to re-execute a hotplug 
script?

> In this case, I'm not sure what is the proper way. If I restart
> xendriverdomain service (I do run the backend in domU), it properly
> executes hotplug script and the backend interface gets properly
> configured. But it doesn't do it on its own. It seems to be related to
> device "state" in xenstore. The specific part of the libxl is
> backend_watch_callback():
> https://github.com/xen-project/xen/blob/master/tools/libs/light/libxl_device.c#L1664
> 
>      ddev = search_for_device(dguest, dev);
>      if (ddev == NULL && state == XenbusStateClosed) {
>          /*
>           * Spurious state change, device has already been disconnected
>           * or never attached.
>           */
>          goto skip;
>      } else if (ddev == NULL) {
>          rc = add_device(egc, nested_ao, dguest, dev);
>          if (rc > 0)
>              free_ao = true;
>      } else if (state == XenbusStateClosed && online == 0) {
>          rc = remove_device(egc, nested_ao, dguest, ddev);
>          if (rc > 0)
>              free_ao = true;
>          check_and_maybe_remove_guest(gc, ddomain, dguest);
>      }
> 
> In short: if device gets XenbusStateInitWait for the first time (ddev ==
> NULL case), it goes to add_device() which executes the hotplug script
> and stores the device.
> Then, if device goes to XenbusStateClosed + online==0 state, then it
> executes hotplug script again (with "offline" parameter) and forgets the
> device. If you unbind the driver, the device stays in
> XenbusStateConnected state (in xenstore), and after you bind it again,
> it goes to XenbusStateInitWait. It don't think it goes through
> XenbusStateClosed, and online stays at 1 too, so libxl doesn't execute
> the hotplug script again.

This is pretty key. The frontend should not notice an unbind/bind i.e. 
there should be no evidence of it happening by examining states in 
xenstore (from the guest side).

   Paul

> 
> Some solution could be to add an extra case at the end, like "ddev !=
> NULL && state == XenbusStateInitWait && hotplug-status != connected".
> And make sure xl devd won't call the same hotplug script multiple times
> for the same device _at the same time_ (I'm not sure about the async
> machinery here).
> 
> But even if xl devd (aka xendriverdomain service) gets "fixes" to
> execute hotplug script in that case, I don't think it would work in
> backend in dom0 case - there, I think nothing watches already configured
> vif interfaces (there is no xl devd daemon in dom0, and xl background
> process watches only domain death and cdrom eject events).
> 
> I'm adding toolstack maintainers, maybe they'll have some idea...
> 
> In any case, the issue is not calling the hotplug script, responsible
> for configuring newly created vif interface. Not kernel waiting for it.
> So, I think both commits should still be reverted.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ