netdev - Re: xen-netback hotplug-status regression bug

Message-ID: <54659eec-e315-5dc5-1578-d91633a80077@xen.org>
Date:   Tue, 13 Apr 2021 11:55:36 +0100
From:   Paul Durrant <xadimgnik@...il.com>
To:     Michael Brown <mcb30@...e.org>, Wei Liu <wei.liu@...nel.org>,
        xen-devel@...ts.xenproject.org, netdev@...r.kernel.org,
        Paul Durrant <pdurrant@...zon.com>
Subject: Re: xen-netback hotplug-status regression bug

On 13/04/2021 11:48, Michael Brown wrote:
> On 13/04/2021 08:12, Paul Durrant wrote:
>>> If the frontend subsequently disconnects and reconnects (e.g. 
>>> transitions through Closed->Initialising->Connected) then:
>>>
>>> - Nothing recreates "hotplug-status"
>>>
>>> - When the frontend re-enters Connected state, connect() sets up a 
>>> watch on "hotplug-status" again
>>>
>>> - The callback hotplug_status_changed() is never triggered, and so 
>>> the backend device never transitions to Connected state.
>>
>> That's not how I read it. Given that "hotplug-status" is removed by 
>> the call to hotplug_status_changed() then the next call to connect() 
>> should fail to register the watch and 'have_hotplug_status_watch' 
>> should be 0. Thus backend_switch_state() should not defer the 
>> transition to XenbusStateConnected in any subsequent interaction with 
>> the frontend.
> 
> Thank you for the reply.  I've tested and confirmed my initial 
> hypothesis: the call to xenbus_watch_pathfmt() succeeds even if the node 
> does not exist.
> 
> I confirmed this with ftrace using:
> 
>   cd /sys/kernel/debug/tracing
>   echo function_graph > current_tracer
>   echo set_backend_state > set_ftrace_filter
>   echo xenbus_watch_pathfmt >> set_ftrace_filter
>   echo register_xenbus_watch >> set_ftrace_filter
>   echo xenbus_dev_fatal >> set_ftrace_filter
> 
> On the second time that the frontend transitions to Connected, this 
> produced the trace:
> 
>   set_backend_state [xen_netback]() {
>     register_xenbus_watch();
>     register_xenbus_watch();
>     xenbus_watch_pathfmt() {
>       register_xenbus_watch();
>     }
>   }
> 
> which seems to confirm that the error path in xenbus_watch_path() is 
> *not* taken, i.e. that the call to register_xenbus_watch() succeeded 
> even though the node did not exist.
> 
> 
> Other observations also seem to confirm this behaviour:
> 
> - Running "xenstore ls" in dom0 confirms that on the second frontend 
> transition to Connected, the frontend state is indeed Connected (4) but 
> the backend state remains in InitWait (2)
> 
> - Running "xenstore watch 
> /local/domain/0/backend/vif/<domU>/0/hotplug-status" *before* starting 
> the domU confirms that it is possible to create a watch on a node that 
> does not (yet) exist, and that the watch *is* notified when the node is 
> later created.
> 
>> Are you seeing the watch successfully re-registered even though the 
>> node does not exist? Perhaps there has been a change in xenstore 
>> behaviour?
> 
> So, the TL;DR is that yes, the watch does successfully register even 
> though the node does not exist.
> 
> From a quick look through the xenstored source, it looks as though the 
> only check on the node name is the call to is_valid_nodename(), which 
> seems to perform a syntactic validity check only.  I can't immediately 
> find any commit that would have changed this behaviour.
> 

Ok, so it sounds like this was probably my misunderstanding of xenstore 
semantics in the first place (although I'm sure I remember watch 
registration failing for non-existent nodes at some point in the past... 
that may have been with a non-upstream version of oxenstored though).

Anyway... a reasonable fix would therefore be to read the node first and 
only register the watch if it does exist.
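
As a rough illustration of that idea (not a patch), the sketch below
models it in userspace C. Everything in it is hypothetical: toy_store
stands in for xenstore, toy_register_watch() mimics the behaviour
Michael observed (registration succeeds whether or not the node
exists), and toy_watch_if_exists() is the "read first, then watch"
ordering. None of these names exist in xen-netback or xenbus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for xenstore: the set of paths that exist. */
struct toy_store {
	const char *nodes[TOY_MAX_NODES];
	size_t count;
};

/* Return true if 'path' is currently present in the toy store. */
bool toy_node_exists(const struct toy_store *s, const char *path)
{
	for (size_t i = 0; i < s->count; i++)
		if (strcmp(s->nodes[i], path) == 0)
			return true;
	return false;
}

/* Create a node; returns false if the toy store is full. */
bool toy_add_node(struct toy_store *s, const char *path)
{
	if (s->count == TOY_MAX_NODES)
		return false;
	s->nodes[s->count++] = path;
	return true;
}

/* Remove a node if present (order of remaining nodes is not kept). */
void toy_remove_node(struct toy_store *s, const char *path)
{
	for (size_t i = 0; i < s->count; i++) {
		if (strcmp(s->nodes[i], path) == 0) {
			s->nodes[i] = s->nodes[--s->count];
			return;
		}
	}
}

/*
 * Models the observed behaviour: registering a watch returns 0
 * (success) regardless of whether the node exists.
 */
int toy_register_watch(const struct toy_store *s, const char *path)
{
	(void)s;
	(void)path;
	return 0;
}

/*
 * Sketch of the suggested ordering: check the node first and only
 * register the watch if it is there. Returns 1 if a watch was
 * registered (the caller may defer the state change), 0 otherwise.
 */
int toy_watch_if_exists(const struct toy_store *s, const char *path)
{
	if (!toy_node_exists(s, path))
		return 0;
	return toy_register_watch(s, path) == 0;
}
```

With an empty store, toy_register_watch() still returns success while
toy_watch_if_exists() returns 0, so a caller keying off the latter
would not wait for a notification on the reconnect path described
above.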

   Paul

> Thanks,
> 
> Michael