[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH76GKO9sxnuLM--x6sg7m3bC_NgvLA94N6jHA-+5gW741-ByQ@mail.gmail.com>
Date: Wed, 20 Jul 2022 15:15:47 +0200
From: Grzegorz Jaszczyk <jaz@...ihalf.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Dmytro Maluka <dmy@...ihalf.com>,
Mario Limonciello <mario.limonciello@....com>,
Sean Christopherson <seanjc@...gle.com>,
Dominik Behr <dbehr@...gle.com>, upstream@...ihalf.com,
Zide Chen <zide.chen@...el.corp-partner.google.com>,
Len Brown <lenb@...nel.org>,
Hans de Goede <hdegoede@...hat.com>,
Mark Gross <markgross@...nel.org>, Pavel Machek <pavel@....cz>,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
Sachi King <nakato@...ato.io>,
"open list:ACPI" <linux-acpi@...r.kernel.org>,
"open list:X86 PLATFORM DRIVERS"
<platform-driver-x86@...r.kernel.org>,
"open list:HIBERNATION (aka Software Suspend, aka swsusp)"
<linux-pm@...r.kernel.org>
Subject: Re: [RFC PATCH 1/2] suspend: extend S2Idle ops by new notify handler
wt., 19 lip 2022 o 20:09 Rafael J. Wysocki <rafael@...nel.org> napisaĆ(a):
>
> On Thu, Jul 7, 2022 at 2:56 PM Grzegorz Jaszczyk <jaz@...ihalf.com> wrote:
> >
> > Currently the LPS0 prepare_late callback is aimed to run as the very
> > last thing before entering the S2Idle state from LPS0 perspective,
> > nevertheless between this call and the system actually entering the
> > S2Idle state there are several places where the suspension process could
> > be canceled.
>
> And why is this a problem?
>
> The cancellation will occur only if there is a wakeup signal that
> would otherwise cause one of the CPUs to exit the idle state. Such a
> wakeup signal can appear after calling the new notifier as well, so
> why does it make a difference?
It could also occur due to suspend_test. Additionally with new
notifier we could get notification when the system wakes up from
s2idle_loop and immediately goes to sleep again (due to e.g.
acpi_s2idle_wake condition not being met) - in this case relying on
prepare_late callback is not possible since it is not called in this
path.
>
> > In order to notify VMM about guest entering suspend, extend the S2Idle
> > ops by new notify callback, which will be really invoked as a very last
> > thing before guest actually enters S2Idle state.
>
> It is not guaranteed that "suspend" (defined as all CPUs entering idle
> states) will be actually entered even after this "last step".
Since this whole patchset is aimed at notifying the host about a guest
entering s2idle state, reaching this step can be considered as a
suspend "entry point" for VM IMO. It is because we are talking about
the vCPU not the real CPU. Therefore it seems to me, that even if some
other vCPUs could still get some wakeup signal they will not be able
to kick (through s2idle_wake->swake_up_one(&s2idle_wait_head);) the
original vCPU which entered s2idle_loop, triggered the new notifier
and is halted due to handling vCPU exit (and was about to trigger
swait_event_exclusive). So it will prevent the VM's resume process
from being started.
>
> > Additionally extend the acpi_s2idle_dev_ops by notify() callback so
> > any driver can hook into it and allow to implement its own notification.
> >
> > Taking advantage of e.g. existing acpi_s2idle_dev_ops's prepare/restore
> > hooks is not an option since it will not allow to prevent race
> > conditions:
> > - VM0 enters s2idle
> > - host notes about VM0 is in s2idle
> > - host continues with system suspension but in the meantime VM0 exits
> > s2idle and sends notification but it is already too late (VM could not
> > even send notification on time).
>
> Too late for what?
Too late to cancel the host suspend process, which thinks that the VM
is in s2idle state while it isn't.
>
> > Introducing notify() as a very last step before the system enters S2Idle
> > together with an assumption that the VMM has control over guest
> > resumption allows preventing mentioned races.
>
> How does it do that?
At the moment when VM triggers this new notifier we trap on MMIO
access and the VMM handles vCPU exit (so the vCPU is "halted").
Therefore the VMM could control when it finishes such handling and
releases the vCPU again.
Maybe adding some more context will be helpful. This patchset was
aimed for two different scenarios actually:
1) Host is about to enter the suspend state and needs first to suspend
VM with all pass-through devices. In this case the host waits for
s2idle notification from the guest and when it receives it, it
continues with its own suspend process.
2) Guest could be a "privileged" one (in terms of VMM) and when the
guest enters s2idle state it notifies the host, which in turn triggers
the suspend process of the host.
>
> It looks like you want suspend-to-idle to behave like S3 and it won't.
In a way, yes, we compensate for the lack of something like PM1_CNT to
trap on for detecting that the guest is suspending.
We could instead force the guest to use S3 but IMO it is undesirable,
since it generally does make a difference which suspend mode is used
in the guest, s2idle or S3, e.g some drivers check which suspend type
is used and based on that behaves differently during suspend. One of
the example is:
https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2323
https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1069
https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c#L583
Thank you,
Grzegorz
>
> > Signed-off-by: Grzegorz Jaszczyk <jaz@...ihalf.com>
> > ---
> > drivers/acpi/x86/s2idle.c | 11 +++++++++++
> > include/linux/acpi.h | 1 +
> > include/linux/suspend.h | 1 +
> > kernel/power/suspend.c | 4 ++++
> > 4 files changed, 17 insertions(+)
> >
> > diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
> > index 2963229062f8..d5aff194c736 100644
> > --- a/drivers/acpi/x86/s2idle.c
> > +++ b/drivers/acpi/x86/s2idle.c
> > @@ -520,10 +520,21 @@ void acpi_s2idle_restore_early(void)
> > lps0_dsm_func_mask, lps0_dsm_guid);
> > }
> >
> > +static void acpi_s2idle_notify(void)
> > +{
> > + struct acpi_s2idle_dev_ops *handler;
> > +
> > + list_for_each_entry(handler, &lps0_s2idle_devops_head, list_node) {
> > + if (handler->notify)
> > + handler->notify();
> > + }
> > +}
> > +
> > static const struct platform_s2idle_ops acpi_s2idle_ops_lps0 = {
> > .begin = acpi_s2idle_begin,
> > .prepare = acpi_s2idle_prepare,
> > .prepare_late = acpi_s2idle_prepare_late,
> > + .notify = acpi_s2idle_notify,
> > .wake = acpi_s2idle_wake,
> > .restore_early = acpi_s2idle_restore_early,
> > .restore = acpi_s2idle_restore,
> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> > index 4f82a5bc6d98..b32c4baed99b 100644
> > --- a/include/linux/acpi.h
> > +++ b/include/linux/acpi.h
> > @@ -1068,6 +1068,7 @@ struct acpi_s2idle_dev_ops {
> > struct list_head list_node;
> > void (*prepare)(void);
> > void (*restore)(void);
> > + void (*notify)(void);
> > };
> > int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg);
> > void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg);
> > diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> > index 70f2921e2e70..16ef7f9d9a03 100644
> > --- a/include/linux/suspend.h
> > +++ b/include/linux/suspend.h
> > @@ -191,6 +191,7 @@ struct platform_s2idle_ops {
> > int (*begin)(void);
> > int (*prepare)(void);
> > int (*prepare_late)(void);
> > + void (*notify)(void);
> > bool (*wake)(void);
> > void (*restore_early)(void);
> > void (*restore)(void);
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 827075944d28..6ba211b94ed1 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -100,6 +100,10 @@ static void s2idle_enter(void)
> >
> > /* Push all the CPUs into the idle loop. */
> > wake_up_all_idle_cpus();
> > +
> > + if (s2idle_ops && s2idle_ops->notify)
> > + s2idle_ops->notify();
> > +
> > /* Make the current CPU wait so it can enter the idle loop too. */
> > swait_event_exclusive(s2idle_wait_head,
> > s2idle_state == S2IDLE_STATE_WAKE);
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
Powered by blists - more mailing lists