Message-ID: <75a47ba2-4524-6309-a758-490471d15c5f@gmail.com>
Date: Thu, 14 Mar 2019 20:20:42 +0200
From: Oleksandr Andrushchenko <andr2000@...il.com>
To: Boris Ostrovsky <boris.ostrovsky@...cle.com>,
netdev@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-kernel@...r.kernel.org, jgross@...e.com,
sstabellini@...nel.org, davem@...emloft.net
Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@...m.com>,
Volodymyr Babchuk <Volodymyr_Babchuk@...m.com>
Subject: Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
On 3/14/19 20:16, Boris Ostrovsky wrote:
> On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 17:40, Boris Ostrovsky wrote:
>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@...m.com>
>>>>>>>>
>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>> destroy the shared Tx/Rx rings, leaving the driver in its current
>>>>>>>> state and never signaling the backend of this frontend's state
>>>>>>>> change. This leads to a number of consequences:
>>>>>>>> - when the frontend withdraws the granted references to the rings
>>>>>>>> etc., this cannot be done cleanly, as the backend still holds them
>>>>>>>> (it was not told to free the resources)
>>>>>>>> - it is not possible to resume driver operation, as all the means
>>>>>>>> of communication with the backend were destroyed by the frontend,
>>>>>>>> thus making the frontend appear functional to the guest OS while
>>>>>>>> it is not
>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>> connectivity?
>>>>>> Exactly. If you take a look at the .resume callback as it is now,
>>>>>> it destroys the rings etc. and never notifies the backend of that,
>>>>>> i.e. it stays in, say, Connected state while the communication
>>>>>> channels have been destroyed. It never moves to any other XenBus
>>>>>> state, so there is no way its state machine can help recover.
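A rough sketch of the callback being discussed, as it reads in
drivers/net/xen-netfront.c around v5.0 (the exact code may differ):

static int netfront_resume(struct xenbus_device *dev)
{
        struct netfront_info *info = dev_get_drvdata(&dev->dev);

        dev_dbg(&dev->dev, "%s\n", dev->nodename);

        /* Tears down the frontend side only: unbinds the event channels
         * and frees the shared rings.  The xenbus state is left
         * untouched, so the backend is never told that anything changed.
         */
        xennet_disconnect_backend(info);
        return 0;
}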
>>>>> My tree is about a month old, so perhaps there is some sort of
>>>>> regression, but this certainly works for me. After resume netfront
>>>>> gets XenbusStateInitWait from the backend, which causes
>>>>> xennet_connect().
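Roughly the path being referred to, assuming the v5.0-era
netback_changed() handler in xen-netfront (other states elided, details
may differ):

static void netback_changed(struct xenbus_device *dev,
                            enum xenbus_state backend_state)
{
        struct netfront_info *np = dev_get_drvdata(&dev->dev);
        struct net_device *netdev = np->netdev;

        switch (backend_state) {
        case XenbusStateInitWait:
                /* Only acted upon while the frontend is still
                 * Initialising; reconnects queues/rings and then moves
                 * the frontend to Connected.
                 */
                if (dev->state != XenbusStateInitialising)
                        break;
                if (xennet_connect(netdev) != 0)
                        break;
                xenbus_switch_state(dev, XenbusStateConnected);
                break;
        /* ... other backend states elided ... */
        default:
                break;
        }
}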
>>>> Ah, the difference may be in the way we make the guest enter the
>>>> suspend state. I am making my guest suspend with:
>>>> echo mem > /sys/power/state
>>>> And then I use an interrupt to the guest (this is test code)
>>>> to wake it up.
>>>> Could you please share your exact use-case: how does the guest
>>>> enter suspend and what do you do to resume it?
>>> xl save / xl restore
>>>
>>>> I can see no way the backend would enter XenbusStateInitWait in my
>>>> use-case, as it simply doesn't know we want it to.
>>> Yours looks like the ACPI path; I don't know how well it was tested, TBH.
>> Hm, so it does work for your use-case, but doesn't for mine.
>>
>> What would be the best way forward?
>>
>> 1. Implement .resume properly as, for example, block front does [1]
>>
>> 2. Remove .resume completely: this does work as long as the backend
>> doesn't change anything
> For save/restore (migration) there is no guarantee that the new backend
> has the same set of features.
>
>> I am still a bit unsure if we really need to re-initialize the rings,
>> re-read the frontend's config from Xenstore, etc. - what changes on
>> the backend side are expected when we resume the frontend driver?
>
> Number of queues, for example. Or things in xennet_fix_features().
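For illustration, roughly what gets re-read from the backend's Xenstore
directory on (re)connect, assuming the existing xen-netfront helpers and
key names; "info" and "netdev" stand in for the netfront driver context
and details may differ:

        /* Sketch only: key names as used by xen-netfront/netback. */
        unsigned int max_queues;
        netdev_features_t features = netdev->features;

        max_queues = xenbus_read_unsigned(info->xbdev->otherend,
                                          "multi-queue-max-queues", 1);

        /* xennet_fix_features() drops features the backend did not
         * advertise.
         */
        if (!xenbus_read_unsigned(info->xbdev->otherend, "feature-sg", 0))
                features &= ~NETIF_F_SG;
        if (!xenbus_read_unsigned(info->xbdev->otherend,
                                  "feature-gso-tcpv4", 0))
                features &= ~NETIF_F_TSO;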
Ok, so it seems I have no choice but to implement a proper .resume then )
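Something along these lines, loosely modeled on blkfront_resume() [1];
this is only an untested sketch relying on the existing
xennet_disconnect_backend() and talk_to_netback() helpers:

static int netfront_resume(struct xenbus_device *dev)
{
        struct netfront_info *info = dev_get_drvdata(&dev->dev);

        dev_dbg(&dev->dev, "%s\n", dev->nodename);

        /* Free the old rings/event channels on the frontend side first. */
        xennet_disconnect_backend(info);

        /* Re-negotiate with the (possibly new) backend: re-read its
         * features/queue count from Xenstore, re-create the queues and
         * shared rings, and publish them back via Xenstore.
         */
        return talk_to_netback(dev, info);
}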
>
> -boris
Thank you!
>>>
>>> -boris
>> Thank you,
>>
>> Oleksandr
>>
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>>