[<prev] [next>] [day] [month] [year] [list]
Message-ID: <f5e824de-da57-9574-3813-2668f2932a6e@gmail.com>
Date: Wed, 27 Mar 2019 08:40:20 +0200
From: Oleksandr Andrushchenko <andr2000@...il.com>
To: Anchal Agarwal <anchalag@...zon.com>
Cc: Munehisa Kamata <kamatam@...zon.com>,
"Oleksandr_Andrushchenko@...m.com" <Oleksandr_Andrushchenko@...m.com>,
Julien Grall <julien.grall@....com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jgross@...e.com" <jgross@...e.com>,
"sstabellini@...nel.org" <sstabellini@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
Volodymyr Babchuk <Volodymyr_Babchuk@...m.com>,
"eduval@...zon.com" <eduval@...zon.com>
Subject: Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume
callback
On 3/25/19 7:30 PM, Anchal Agarwal wrote:
> On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
>> On 3/20/19 5:50 AM, Munehisa Kamata wrote:
>>> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
>>>> +Amazon
>>>> pls see inline
>>> Hi Oleksandr,
>>>
>>> Let me add some comments as the original author of the series.
>> Thank you for your work!
> Hi Oleksandr,
>>>> On 3/14/19 9:00 PM, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@...m.com>
>>>>>>>>>>>
>>>>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>>>>> This leads to the number of consequences:
>>>>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>>>>> cannot
>>>>>>>>>>> ???????? be cleanly done as the backend still holds those (it was not
>>>>>>>>>>> told to
>>>>>>>>>>> ???????? free the resources)
>>>>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>>>>> communication
>>>>>>>>>>> ???????? means with the backned were destroyed by the frontend, thus
>>>>>>>>>>> ???????? making the frontend appear to the guest OS as functional, but
>>>>>>>>>>> ???????? not really.
>>>>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>>>>> connectivity?
>>>>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>>>>> there is
>>>>>>>>> no way its state machine can help recovering.
>>>>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>>>>> but this certainly works for me. After resume netfront gets
>>>>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>>>>> Ah, the difference can be of the way we get the guest enter
>>>>>>> the suspend state. I am making my guest to suspend with:
>>>>>>> echo mem > /sys/power/state
>>>>>>> And then I use an interrupt to the guest (this is a test code)
>>>>>>> to wake it up.
>>>>>>> Could you please share your exact use-case when the guest enters suspend
>>>>>>> and what you do to resume it?
>>>>>> xl save / xl restore
>>>>>>
>>>>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>>>>> use-case
>>>>>>> as it simply doesn't know we want him to.
>>>>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>>>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
>>>> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>>>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>>>>
>>>>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>>>>
>>>> First of all, let me make it clear that we are interested in hibernation long term, so it would be
>>>> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
>>>> step work, e.g. first S2RAM and later on hibernation.
>>>> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
>>>> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
>>>> Suspend-to-RAM for ARM [3] and we made number of assumptions:
>>>> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
>>>> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
>>>> feature if you will.
>>>> 2. With suspend/resume there is no intention to migrate VMs to any other host.
>>>> 3. Most probably configuration of the back/front won't change between suspend/resume.
>>>> But long term we are also thinking for supporting suspend/resume in its broader meaning,
>>>> e.g. what is probably what you mean by suspend/resume.
>>> AFAIK .suspend and .resume callbacks in frontend drivers are
>>> specifically for xl save/restore case rather than the normal "system"
>>> suspend. i.e. The former is Boris' case and something I called "Xen
>>> suspend" in the patch series, the latter should be your interest and
>>> called "ACPI path" here, and I referred to as "PM suspend". They are
>>> very different code paths, see drivers/xen/manage.c for details of
>>> Xen suspend.
>> Yes, I saw that code, thank you
>>>> Given that, we think that we don't need Xen support to save grants, page tables and other
>>>> VM's context on suspend at least at the first stage as we are implementing not a fully
>>>> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
>>>> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
>>>> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
>>>> we are not changing back/front configuration, we can even live with empty .resume/.suspend
>>>> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
>>>> use-case at the first system start (cold boot). And indeed, tests show that waking domains
>>>> in the right order do allow that.
>>>> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
>>> If you just (re)implement .suspend and .resume so without taking care
>>> of Xen suspend, you can easily break the existing functionality. The
>>> patch series introduced .freeze and .restore callbacks for both PM
>>> suspend and hibernation, and kept .suspend (not implemented in most
>>> frontend though) and .resume with no changes for Xen suspend.
>>>
>>> Note that xenbus has mapped freeze/thaw/restore events to suspend,
>>> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
>>> bit tricky and led me to the design to have the separate set of
>>> callbacks at each frontend driver level[5]. You might need to consider
>>> a similar approach even if your immediate interest at the moment is PM
>>> suspend.
>> For the immediate task we have at the moment we think we can re-use
>> your work and implement .suspend/.resume based on it (we are targeting
>> S2RAM as the first stage).
>> But long term - we do support the idea of fully implemented
>> suspend and *hibernate* functionality as you describe it.
>> So, yes, we are also thinking about that.
>>>> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
>>>> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
>>>> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
>>>> for now (remember our very specific use-case which can survive suspend without callbacks
>>>> implemented).
>>>> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
>>>> be postponed for S2RAM.
>>>>
>>>> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
>>>> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
>>>> implementation in mind for later (we reuse code for .resume/.suspend)?
>>> In fact, Anchal has taken over my initial work and she may want to chime
>>> in here.
>> Great, could you please let us know what is the progress and further plans
>> on that, so we do not work on the same code and can coordinate our
>> efforts somehow? Anchal, could you please shed some light on this?
> Looks like my previous email did not make it to mailing list. May be some issues with my
> email server settings. Giving it another shot.
> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
This is really great, looking forward to it: any date in your mind
when this can happen?
> I agree with Munehisa here on considering the patches that are already out there as
> I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
> suspend and resume. There may be minor fixes here and there however, the overall
> idea will still remain the same.
Ok, so I'll plan my efforts accordingly
> As the previous patches there will be support for
> only xen-blkfront and xen-netfront in the initial patchset.
>>> That said, I'd be very happy to review patches if you come up with your
>>> own ones, so feel free to add me in that case.
>> Sure, thank you!
>>>>> Cheers,
>>>>>
>>>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>>>>
>>>>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>>>>
>>>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
>>> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
>>>
>>>>>> -boris
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@...ts.xenproject.org
>>>>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>>>>
>>>> Thank you,
>>>> Oleksandr
>>> Thanks,
>>> Munehisa
> Thanks,
> Anchal
Thank you,
Oleksandr
Powered by blists - more mailing lists