Message-ID: <d6330f93-dfc4-91fc-3e5f-7be93b1ce2cb@amd.com>
Date: Thu, 23 Jun 2022 12:19:19 -0500
From: "Limonciello, Mario" <mario.limonciello@....com>
To: Grzegorz Jaszczyk <jaz@...ihalf.com>,
Sean Christopherson <seanjc@...gle.com>
Cc: linux-kernel@...r.kernel.org, Dmytro Maluka <dmy@...ihalf.com>,
Zide Chen <zide.chen@...el.corp-partner.google.com>,
Peter Fang <peter.fang@...el.corp-partner.google.com>,
Tomasz Nowicki <tn@...ihalf.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>, Pavel Machek <pavel@....cz>,
Ashish Kalra <ashish.kalra@....com>,
Hans de Goede <hdegoede@...hat.com>,
Sachi King <nakato@...ato.io>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
David Dunn <daviddunn@...gle.com>,
Wei Wang <wei.w.wang@...el.com>,
Nicholas Piggin <npiggin@...il.com>,
"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
"open list:ACPI" <linux-acpi@...r.kernel.org>,
"open list:HIBERNATION (aka Software Suspend, aka swsusp)"
<linux-pm@...r.kernel.org>, Dominik Behr <dbehr@...gle.com>,
Dmitry Torokhov <dtor@...gle.com>
Subject: Re: [PATCH 1/2] x86: notify hypervisor about guest entering s2idle state
On 6/23/2022 11:50, Grzegorz Jaszczyk wrote:
> On Wed, 22 Jun 2022 at 23:50, Limonciello, Mario
> <mario.limonciello@....com> wrote:
>>
>> On 6/22/2022 04:53, Grzegorz Jaszczyk wrote:
>>> On Mon, 20 Jun 2022 at 18:32, Limonciello, Mario
>>> <mario.limonciello@....com> wrote:
>>>>
>>>> On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
>>>>> On Thu, 16 Jun 2022 at 18:58, Limonciello, Mario
>>>>> <mario.limonciello@....com> wrote:
>>>>>>
>>>>>> On 6/16/2022 11:48, Sean Christopherson wrote:
>>>>>>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
>>>>>>>> On Fri, 10 Jun 2022 at 16:30, Sean Christopherson <seanjc@...gle.com> wrote:
>>>>>>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
>>>>>>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
>>>>>>>>> function? E.g. something like
>>>>>>>>>
>>>>>>>>> static void s2idle_hypervisor_notify(void)
>>>>>>>>> {
>>>>>>>>>         if (lps0_dsm_func_mask > 0)
>>>>>>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
>>>>>>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
>>>>>>>>> }
>>>>>>>>
>>>>>>>> Great, thank you for your suggestion! I will try this approach and
>>>>>>>> come back. Since this will be the main change in the next version,
>>>>>>>> will it be ok for you to add Suggested-by: Sean Christopherson
>>>>>>>> <seanjc@...gle.com> tag?
>>>>>>>
>>>>>>> If you want, but there's certainly no need to do so. But I assume you or someone
>>>>>>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
>>>>>>> I.e. isn't there work to be done outside of the kernel before any patches can be
>>>>>>> merged?
>>>>>>
>>>>>> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
>>>>>> one, and a Microsoft one. They all have their own specs, and so if this
>>>>>> was to be added I think all 3 need to be updated.
>>>>>
>>>>> Yes this will not be easy to achieve I think.
>>>>>
>>>>>>
>>>>>> As this is Linux-specific hypervisor behavior, I don't know that you
>>>>>> would be able to convince Microsoft to update theirs either.
>>>>>>
>>>>>> How about using s2idle_devops? There is a prepare() call and a
>>>>>> restore() call that is set for each handler. The only consumer of this
>>>>>> ATM I'm aware of is the amd-pmc driver, but it's done like a
>>>>>> notification chain so that a bunch of drivers can hook in if they need to.
>>>>>>
>>>>>> Then you can have this notification path and the associated ACPI device
>>>>>> it calls out to be its own driver.
>>>>>
>>>>> Thank you for your suggestion. Just to be sure that I've understood
>>>>> your idea correctly:
>>>>> 1) it will require extending acpi_s2idle_dev_ops with something like a
>>>>> hypervisor_notify() call, since the existing prepare() is called from the
>>>>> end of acpi_s2idle_prepare_late, which is too early, as described in
>>>>> one of the previous messages (between acpi_s2idle_prepare_late and the
>>>>> place where we use the hypercall there are several places where the
>>>>> suspend could be canceled; otherwise we could probably try to trap on
>>>>> the other acpi_sleep_run_lps0_dsm occurrence in acpi_s2idle_prepare_late).
>>>>>
>>>>
>>>> The idea for prepare() was it would be the absolute last thing before
>>>> the s2idle loop was run. You're sure that's too early? It's basically
>>>> the same thing as having a last stage new _DSM call.
>>>>
>>>> What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
>>>> you could catch the cancelled suspend case still and take corrective
>>>> action (if that action is different than what restore() would do).
>>>
>>> It will be problematic since the abort/restore notification could
>>> arrive too late, and the whole system would then go to suspend
>>> thinking that the guest is in the desired s2idle state. Also, in this case
>>> it would be impossible to prevent races and to actually make sure whether
>>> the guest is suspended or not. We already had a similar discussion with
>>> Sean earlier in this thread about why the notification has to be sent just
>>> before swait_event_exclusive(s2idle_wait_head, s2idle_state ==
>>> S2IDLE_STATE_WAKE) and why the VMM has to have control over guest
>>> resumption.
>>>
>>> Nevertheless, if extending acpi_s2idle_dev_ops is possible, why not
>>> extend it with hypervisor_notify() and use it in the same place
>>> where the hypercall is used in this patch? Do you see any issue with
>>> that?
>>
>> If this needs to be a hypercall and the hypercall needs to go at that
>> specific time, I wouldn't bother with extending acpi_s2idle_dev_ops.
>> The whole idea there was that this would be less custom and could follow
>> a spec.
>
> Just to clarify - it probably doesn't need to be a hypercall. I've
> probably misled you by copy-pasting a handler name from the current
> patch while aiming for your and Sean's ACPI-like approach.
Ah... Yeah I was quite confused.
> What I meant is
> something like:
> - extend acpi_s2idle_dev_ops with notify()
> - implement notify() handler for acpi_s2idle_dev_ops in HYPE0001
> driver (without hypercall):
> static void s2idle_notify(void)
> {
>         acpi_evaluate_dsm(acpi_handle, guid_of_HYPE0001, 0,
>                           ACPI_HYPE_NOTIFY, NULL);
> }
>
> - register it via acpi_register_lps0_dev() from HYPE0001 driver
> - use it just before swait_event_exclusive(s2idle_wait_head..) as it
> is with original patch (the name of the function will be different):
> static void s2idle_hypervisor_notify(void)
> {
>         struct acpi_s2idle_dev_ops *handler;
>         ...
>         list_for_each_entry(handler, &lps0_s2idle_devops_head, list_node) {
>                 if (handler->notify)
>                         handler->notify();
>         }
> }
>
> so it will be like:
> -> s2idle_enter (just before swait_event_exclusive(s2idle_wait_head,.. )
> --> s2idle_hypervisor_notify (as platform_s2idle_ops)
> ---> notify (as acpi_s2idle_dev_ops)
> ----> HYPE0001 device driver's notify () routine
>
> It will probably be easier to understand it if I actually implement
> it.
Yeah; A lot of times seeing the mocked up code makes it easier to follow.
> Nevertheless this way we ensure that:
> - the notification will be triggered as the very last command before
> actually entering s2idle
> - we can trap on MMIO/PIO by implementing a HYPE0001-specific _DSM
> method, and therefore this implementation will not become hypervisor
> specific and will also not use KVM as a "dumb pipe out to userspace", as
> Sean suggested
> - we will not have to change the existing Intel/AMD/Windows specs (3
> different LPS0 GUIDs) but, thanks to HYPE0001's acpi_s2idle_dev_ops
> involvement, only care about the new HYPE0001 spec
>
I think your proposal is reasonable. Please include me on the RFC when
you've got it ready as well.
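
To make the proposed call chain concrete, here is a minimal userspace sketch of it. This is only a model under stated assumptions, not kernel code: the notify() member, register_lps0_dev(), hype0001_notify(), s2idle_enter_model() and the vmm_notifications counter are all hypothetical stand-ins for the proposed acpi_s2idle_dev_ops extension, acpi_register_lps0_dev(), the HYPE0001 driver callback, the tail of s2idle_enter() and the trapped MMIO/PIO access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace model only. The struct mirrors the shape of the kernel's
 * acpi_s2idle_dev_ops; notify() is the member proposed in this thread
 * and is an assumption, not an existing kernel interface.
 */
struct s2idle_dev_ops {
	struct s2idle_dev_ops *next;
	void (*prepare)(void);
	void (*notify)(void);	/* proposed addition (assumption) */
	void (*restore)(void);
};

static struct s2idle_dev_ops *devops_head;
static int vmm_notifications;	/* stand-in for the trapped MMIO/PIO access */

/* Models acpi_register_lps0_dev(): append ops to the handler list. */
static int register_lps0_dev(struct s2idle_dev_ops *ops)
{
	struct s2idle_dev_ops **pos = &devops_head;

	if (!ops)
		return -1;
	while (*pos)
		pos = &(*pos)->next;
	ops->next = NULL;
	*pos = ops;
	return 0;
}

/* Models the HYPE0001 driver's notify(): the _DSM evaluation would trap here. */
static void hype0001_notify(void)
{
	vmm_notifications++;
	puts("HYPE0001: guest is about to enter s2idle");
}

/* Walk every registered handler and call notify() where one is provided. */
static void s2idle_hypervisor_notify(void)
{
	struct s2idle_dev_ops *ops;

	for (ops = devops_head; ops; ops = ops->next)
		if (ops->notify)
			ops->notify();
}

/*
 * Models the tail of s2idle_enter(): if the suspend was canceled earlier,
 * the VMM is never told; otherwise it is told immediately before the wait.
 */
static int s2idle_enter_model(int canceled)
{
	if (canceled)
		return -1;
	s2idle_hypervisor_notify();
	/* The kernel would now block in swait_event_exclusive(...). */
	return 0;
}
```

Registering one handler without notify() (as amd-pmc would be today) next to the HYPE0001 one and then calling s2idle_enter_model(0) invokes only the HYPE0001 callback, while s2idle_enter_model(1) returns without notifying anyone.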
>>
>> TBH - given the strong dependency on being the very last command and
>> this being all Linux specific (you won't need to do something similar
>> with Windows) - I think the way you already did it makes the most sense.
>> It seems to me the ACPI device model doesn't really work well for this
>> scenario.
>>
>>>
>>>>
>>>>> 2) using the newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
>>>>> will allow registering a handler from the Intel x86/intel/pmc/core.c driver
>>>>> and/or the AMD x86/amd-pmc.c driver. Therefore we will only need to get
>>>>> Intel and/or AMD approval for extending the ACPI LPS0 _DSM method,
>>>>> correct?
>>>>>
>>>>
>>>> Right now the only thing that hooks prepare()/restore() is the amd-pmc
>>>> driver (unless Intel's PMC had a change I didn't catch yet).
>>>>
>>>> I don't think you should be changing any existing drivers but rather
>>>> introduce another platform driver for this specific case.
>>>>
>>>> So it would be something like this:
>>>>
>>>> acpi_s2idle_prepare_late
>>>> -> prepare()
>>>> --> AMD: amd_pmc handler for prepare()
>>>> --> Intel: intel_pmc handler for prepare() (conceptual)
>>>> --> HYPE0001 device: new driver's prepare() routine
>>>>
>>>> So the platform driver would match the HYPE0001 device to load, and it
>>>> wouldn't do anything other than provide a prepare()/restore() handler
>>>> for your case.
>>>>
>>>> You don't need to change any existing specs. If anything a new spec to
>>>> go with this new ACPI device would be made. Someone would need to
>>>> reserve the ID and such for it, but I think you can mock it up in advance.
>>>
>>> Thank you for your explanation. This means that I should register
>>> "HYPE" through https://uefi.org/PNP_ACPI_Registry before introducing
>>> this new driver to Linux.
>>> I have no experience with the above, so I wonder who should be
>>> responsible for maintaining such ACPI ID since it will not belong to
>>> any specific vendor? There is an example of e.g. COREBOOT PROJECT
>>> using "BOOT" ACPI ID [1], which seems similar in terms of not
>>> specifying any vendor but rather the project as a responsible entity.
>>> Maybe you have some recommendations?
>>
>> Maybe LF could own a namespace and ID? But I would suggest you make a
>> mockup that everything works this way before you go explore too much.
>
> Yeah, sure.
>
>>
>> Also make sure Rafael is aligned with your mockup.
>
> Agree.
>
>>
>>>
>>> I am also not sure if and where a specification describing such a
>>> device has to be maintained. Since "HYPE0001" will have its own _DSM,
>>> will it be required to document it somewhere rather than just using
>>> it in the driver and preparing proper ACPI tables for the guest?
>>>
>>>>
>>>>> I wonder if this will be affordable, so I am just thinking out loud:
>>>>> is there no other mechanism that could be suggested and used upstream
>>>>> so we could notify the hypervisor/VMM about the guest entering s2idle
>>>>> state? Especially since such a _DSM function would be introduced only to
>>>>> trap on some fake MMIO/PIO access and would be useful only for guest ACPI
>>>>> tables?
>>>>>
>>>>
>>>> Do you need to worry about Microsoft guests using Modern Standby too or
>>>> is that out of the scope of your problem set? I think you'll be a lot
>>>> more limited in how this can behave and where you can modify things if so.
>>>>
>>>
>>> I do not need to worry about Microsoft guests.
>>
>> Makes life a lot easier :)
>
> Agree :) and thank you for all your feedback,
> Grzegorz
Sure.