linux-kernel - Re: [RFC PATCH 1/2] suspend: extend S2Idle ops by new notify handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH76GKMw2rAnQOSFqReG1sEC=sdncWOJHNXn-Rp2Gx1oUZR3ZQ@mail.gmail.com>
Date:   Mon, 22 Aug 2022 11:26:23 +0200
From:   Grzegorz Jaszczyk <jaz@...ihalf.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Dmytro Maluka <dmy@...ihalf.com>,
        Mario Limonciello <mario.limonciello@....com>,
        Sean Christopherson <seanjc@...gle.com>,
        Dominik Behr <dbehr@...gle.com>, upstream@...ihalf.com,
        Zide Chen <zide.chen@...el.corp-partner.google.com>,
        Len Brown <lenb@...nel.org>,
        Hans de Goede <hdegoede@...hat.com>,
        Mark Gross <markgross@...nel.org>, Pavel Machek <pavel@....cz>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Sachi King <nakato@...ato.io>,
        "open list:ACPI" <linux-acpi@...r.kernel.org>,
        "open list:X86 PLATFORM DRIVERS" 
        <platform-driver-x86@...r.kernel.org>,
        "open list:HIBERNATION (aka Software Suspend, aka swsusp)" 
        <linux-pm@...r.kernel.org>
Subject: Re: [RFC PATCH 1/2] suspend: extend S2Idle ops by new notify handler

Hi Rafael,

Could you please kindly comment on the above?

Thank you in advance,
Grzegorz

śr., 20 lip 2022 o 15:15 Grzegorz Jaszczyk <jaz@...ihalf.com> napisał(a):
>
> wt., 19 lip 2022 o 20:09 Rafael J. Wysocki <rafael@...nel.org> napisał(a):
> >
> > On Thu, Jul 7, 2022 at 2:56 PM Grzegorz Jaszczyk <jaz@...ihalf.com> wrote:
> > >
> > > Currently the LPS0 prepare_late callback is aimed to run as the very
> > > last thing before entering the S2Idle state from LPS0 perspective,
> > > nevertheless between this call and the system actually entering the
> > > S2Idle state there are several places where the suspension process could
> > > be canceled.
> >
> > And why is this a problem?
> >
> > The cancellation will occur only if there is a wakeup signal that
> > would otherwise cause one of the CPUs to exit the idle state.  Such a
> > wakeup signal can appear after calling the new notifier as well, so
> > why does it make a difference?
>
> It could also occur due to suspend_test. Additionally with new
> notifier we could get notification when the system wakes up from
> s2idle_loop and immediately goes to sleep again (due to e.g.
> acpi_s2idle_wake condition not being met) - in this case relying on
> prepare_late callback is not possible since it is not called in this
> path.
>
> >
> > > In order to notify VMM about guest entering suspend, extend the S2Idle
> > > ops by new notify callback, which will be really invoked as a very last
> > > thing before guest actually enters S2Idle state.
> >
> > It is not guaranteed that "suspend" (defined as all CPUs entering idle
> > states) will be actually entered even after this "last step".
>
> Since this whole patchset is aimed at notifying the host about a guest
> entering s2idle state, reaching this step can be considered as a
> suspend "entry point" for VM IMO. It is because we are talking about
> the vCPU not the real CPU. Therefore it seems to me, that even if some
> other vCPUs could still get some wakeup signal they will not be able
> to kick (through s2idle_wake->swake_up_one(&s2idle_wait_head);) the
> original vCPU which entered s2idle_loop, triggered the new notifier
> and is halted due to handling vCPU exit (and was about to trigger
> swait_event_exclusive). So it will prevent the VM's resume process
> from being started.
>
> >
> > > Additionally extend the acpi_s2idle_dev_ops by notify() callback so
> > > any driver can hook into it and allow to implement its own notification.
> > >
> > > Taking advantage of e.g. existing acpi_s2idle_dev_ops's prepare/restore
> > > hooks is not an option since it will not allow to prevent race
> > > conditions:
> > > - VM0 enters s2idle
> > > - host notes about VM0 is in s2idle
> > > - host continues with system suspension but in the meantime VM0 exits
> > > s2idle and sends notification but it is already too late (VM could not
> > > even send notification on time).
> >
> > Too late for what?
>
> Too late to cancel the host suspend process, which thinks that the VM
> is in s2idle state while it isn't.
>
> >
> > > Introducing notify() as a very last step before the system enters S2Idle
> > > together with an assumption that the VMM has control over guest
> > > resumption allows preventing mentioned races.
> >
> > How does it do that?
>
> At the moment when VM triggers this new notifier we trap on MMIO
> access and the VMM handles vCPU exit (so the vCPU is "halted").
> Therefore the VMM could control when it finishes such handling and
> releases the vCPU again.
>
> Maybe adding some more context will be helpful. This patchset was
> aimed for two different scenarios actually:
> 1) Host is about to enter the suspend state and needs first to suspend
> VM with all pass-through devices. In this case the host waits for
> s2idle notification from the guest and when it receives it, it
> continues with its own suspend process.
> 2) Guest could be a "privileged" one (in terms of VMM) and when the
> guest enters s2idle state it notifies the host, which in turn triggers
> the suspend process of the host.
>
> >
> > It looks like you want suspend-to-idle to behave like S3 and it won't.
>
> In a way, yes, we compensate for the lack of something like PM1_CNT to
> trap on for detecting that the guest is suspending.
> We could instead force the guest to use S3 but IMO it is undesirable,
> since it generally does make a difference which suspend mode is used
> in the guest, s2idle or S3, e.g some drivers check which suspend type
> is used and based on that behaves differently during suspend. One of
> the example is:
> https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2323
> https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1069
> https://elixir.bootlin.com/linux/v5.18.12/source/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c#L583
>
> Thank you,
> Grzegorz