[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d709d185-5345-c463-3fd1-e711f954e58a@gmail.com>
Date: Mon, 18 Mar 2019 12:02:49 +0200
From: Oleksandr Andrushchenko <andr2000@...il.com>
To: Julien Grall <julien.grall@....com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
netdev@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-kernel@...r.kernel.org, jgross@...e.com,
sstabellini@...nel.org, davem@...emloft.net
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@...m.com>,
Oleksandr Andrushchenko <oleksandr_andrushchenko@...m.com>,
kamatam@...zon.com, anchalag@...zon.com, eduval@...zon.com
Subject: Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume
callback
+Amazon
pls see inline
On 3/14/19 9:00 PM, Julien Grall wrote:
> Hi,
>
> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@...m.com>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to the number of consequences:
>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>> cannot
>>>>>>> be cleanly done as the backend still holds those (it was not
>>>>>>> told to
>>>>>>> free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>> communication
>>>>>>> means with the backned were destroyed by the frontend, thus
>>>>>>> making the frontend appear to the guest OS as functional, but
>>>>>>> not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>> what it does it destroys the rings etc. and never notifies the
>>>>> backend
>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is
>>>>> no way its state machine can help recovering.
>>>>
>>>> My tree is about a month old so perhaps there is some sort of
>>>> regression
>>>> but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference can be of the way we get the guest enter
>>> the suspend state. I am making my guest to suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is a test code)
>>> to wake it up.
>>> Could you please share your exact use-case when the guest enters
>>> suspend
>>> and what you do to resume it?
>>
>>
>> xl save / xl restore
>>
>>> I can see no way backend may want enter XenbusStateInitWait in my
>>> use-case
>>> as it simply doesn't know we want him to.
>>
>>
>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>
> I remember a series from amazon [1] that plays around suspend and
> hibernation. The patch [2] leads me to think that guest triggered
> suspend/resume does not work properly. It looks like the series has
> never been fully reviewed. Not sure why...
Julien, thanks a lot for bringing these patches to our attention which
we obviously missed.
>
> Anyway, from my understanding this series may solve Oleksandr issue.
> However, this would only address the common code side. AFAIK Oleksandr
> is targeting Arm platform. If so, I think this would require more work
> than this series. Arm code still miss few bits properly suspend/resume
> arch specific code (see [2]).
>
> I have a branch on my git to track the series. However, they never
> have been resent after Ian Campbell left Citrix. I would be happy to
> review them if someone wants to pick them up and repost them.
>
First of all, let me make it clear that we are interested in hibernation
long term, so it would be
desirable to re-use as much work form resume/suspend as we can. But, we
see it as a step by
step work, e.g. first S2RAM and later on hibernation.
Let me clarify the immediate use-case that we have, so it is easier to
understand what we want
and what we don't at the moment. We are about to continue work started
by Mirela/Xilinx on
Suspend-to-RAM for ARM [3] and we made number of assumptions:
1. We are talking about *system* suspend, e.g. the goal is to suspend
all the components
of the system and Xen itself at once. Think about this as fast-boot
and/or energy saving
feature if you will.
2. With suspend/resume there is no intention to migrate VMs to any other
host.
3. Most probably configuration of the back/front won't change between
suspend/resume.
But long term we are also thinking for supporting suspend/resume in its
broader meaning,
e.g. what is probably what you mean by suspend/resume.
Given that, we think that we don't need Xen support to save grants, page
tables and other
VM's context on suspend at least at the first stage as we are
implementing not a fully
blown suspend/resume, but only S2RAM part of it which is much more
simpler than a generic
suspend implementation. We only need changes to Linux kernel frontend
drivers from [1] - the
piece that we miss is suspend/resume implementation in the netfront
driver. What is more, as
we are not changing back/front configuration, we can even live with
empty .resume/.suspend
frontend's callbacks because event channels, rings etc. are "statically"
allocated in our
use-case at the first system start (cold boot). And indeed, tests show
that waking domains
in the right order do allow that.
So, frankly, from [3] we are immediately interested in implementing
.resume/.suspend, not
even freeze/thaw/restore callbacks: if Amazon has will and capacity to
continue working on [3]
then once that gets into the upstream it also solves our S2RAM use-case,
but if not then we
can probably re-work netfront patch and only provide .resume/.suspend
callbacks which we need
for now (remember our very specific use-case which can survive suspend
without callbacks
implemented).
IMO, patches at [2] seem to be useful while implementing generic
suspend/resume and can
be postponed for S2RAM.
Julien/Juergen/Boris/Amazon - could you please express your view on the
above?
Is it acceptable that for now we only take re-worked netfront patch from
[3] with full
implementation in mind for later (we reuse code for .resume/.suspend)?
> Cheers,
>
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>
> [2]
> http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>
[3]
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>
>>
>> -boris
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@...ts.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>
>
Thank you,
Oleksandr
Powered by blists - more mailing lists