Message-ID: <CAH76GKNRDXAyGYvs2ji5Phu=5YPW8+SV8-6TLjizBRzTCnEROg@mail.gmail.com>
Date:   Fri, 10 Jun 2022 14:26:27 +0200
From:   Grzegorz Jaszczyk <jaz@...ihalf.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, Dmytro Maluka <dmy@...ihalf.com>,
        Zide Chen <zide.chen@...el.corp-partner.google.com>,
        Peter Fang <peter.fang@...el.corp-partner.google.com>,
        Tomasz Nowicki <tn@...ihalf.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Len Brown <lenb@...nel.org>, Pavel Machek <pavel@....cz>,
        Ashish Kalra <ashish.kalra@....com>,
        Mario Limonciello <mario.limonciello@....com>,
        Hans de Goede <hdegoede@...hat.com>,
        Sachi King <nakato@...ato.io>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        David Dunn <daviddunn@...gle.com>,
        Wei Wang <wei.w.wang@...el.com>,
        Nicholas Piggin <npiggin@...il.com>,
        "open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        "open list:ACPI" <linux-acpi@...r.kernel.org>,
        "open list:HIBERNATION (aka Software Suspend, aka swsusp)" 
        <linux-pm@...r.kernel.org>, dbehr@...gle.com, dtor@...gle.com
Subject: Re: [PATCH 1/2] x86: notify hypervisor about guest entering s2idle state

On Thu, 9 Jun 2022 at 16:55, Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Jun 09, 2022, Grzegorz Jaszczyk wrote:
> > +9. KVM_HC_SYSTEM_S2IDLE
> > +------------------------
> > +
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the hypervisor that the guest is entering s2idle state.
>
> What about exiting s2idle?  E.g.
>
>   1. VM0 enters s2idle
>   2. host notes that VM0 is in s2idle
>   3. VM0 exits s2idle
>   4. host still thinks VM0 is in s2idle
>   5. VM1 enters s2idle
>   6. host thinks all VMs are in s2idle, suspends the system

I don't think this problem can be solved by adding a notification
about exiting s2idle. Please consider the following (even after
simplifying your example to a single VM):
1. VM0 enters s2idle
2. host notes that VM0 is in s2idle
3. host continues with system suspension, but in the meantime VM0 exits
s2idle and sends a notification; by then it is already too late (the VM
might not even manage to send the notification in time).

The above could actually be prevented if the VMM had control over
guest resumption. E.g. after the VMM receives the notification that the
guest is entering s2idle, it would park the vCPU, preventing it from
exiting s2idle without VMM intervention.

>
> > +static void s2idle_hypervisor_notify(void)
> > +{
> > +     if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > +             kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
>
> Checking the HYPERVISOR flag is not remotely sufficient.  The hypervisor may not
> be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> the hypercall.  The latter scenario won't be fatal unless KVM has been modified,
> but blindly doing a hypercall for a different hypervisor could have disastrous
> results, e.g. the registers ABIs are different, so the above will make a random
> request depending on what is in other GPRs.

Good point: we did think about not confusing/breaking VMMs, so I
introduced the KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the second
patch, but not breaking other hypervisors is another story. Would
hiding it behind a new 's2idle_notify_kvm' module parameter work for
upstream?

+static bool s2idle_notify_kvm __read_mostly;
+module_param(s2idle_notify_kvm, bool, 0644);
+MODULE_PARM_DESC(s2idle_notify_kvm, "Notify hypervisor about guest entering s2idle state");
+
..
+static void s2idle_hypervisor_notify(void)
+{
+       if (static_cpu_has(X86_FEATURE_HYPERVISOR) && s2idle_notify_kvm)
+               kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
+}
+
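
For illustration only, the layering of checks under discussion can be
modelled in plain userspace C. This is a sketch under stated assumptions,
not kernel code: struct guest_env and should_notify_s2idle() are
hypothetical names, and the "advertised" field stands in for some way of
discovering hypercall support that does not exist in the patch as posted.

```c
#include <stdbool.h>

/* Hypothetical guest-side view; every field name here is illustrative. */
struct guest_env {
	bool hypervisor_present;   /* models X86_FEATURE_HYPERVISOR */
	bool hypervisor_is_kvm;    /* models "the hypervisor is actually KVM" */
	bool s2idle_hc_advertised; /* models an assumed discovery mechanism */
	bool s2idle_notify_kvm;    /* models the proposed module parameter */
};

/* Return true only when it would be safe and requested to notify. */
static bool should_notify_s2idle(const struct guest_env *env)
{
	if (!env->hypervisor_present)
		return false;	/* bare metal: nothing to notify */
	if (!env->hypervisor_is_kvm)
		return false;	/* different hypercall ABI: do not touch it */
	if (!env->s2idle_hc_advertised)
		return false;	/* older KVM: hypercall unknown */
	return env->s2idle_notify_kvm;
}
```

With only the HYPERVISOR flag and the module parameter checked, the
second and third conditions are left to whoever sets the parameter.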

>
> The bigger question is, why is KVM involved at all?  KVM is just a dumb pipe out
> to userspace, and not a very good one at that.  There are multiple well established
> ways to communicate with the VMM without custom hypercalls.

Could you please advise on the recommended way of communicating with
the VMM, taking into account that we want to send this notification
just before entering the s2idle state (please see also the answer to
the next comment), which is at a very late stage of the suspend process,
with a lot of functionality already suspended?

>
>
> I bet if you're clever this can even be done without any guest changes, e.g. I
> gotta imagine acpi_sleep_run_lps0_dsm() triggers MMIO/PIO with the right ACPI
> configuration.

The problem is that between acpi_sleep_run_lps0_dsm() and the place
where we introduced the hypercall there are several places where the
suspend can still be cancelled, so the suspend state is never entered.
Trapping on the MMIO/PIO triggered by acpi_sleep_run_lps0_dsm() would
therefore be premature.

The other reason for doing it in this place is that s2idle_enter is
called from an infinite loop inside s2idle_loop, which can be
interrupted by e.g. an ACPI EC GPE (one not intended to wake up the
system), in which case s2idle_ops->wake() returns false and
s2idle_enter is called again. In this case we would want to get a
notification about the guest actually entering the s2idle state again,
which wouldn't be possible if we relied on acpi_sleep_run_lps0_dsm.
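
To make the re-entry argument concrete, here is a small userspace model
of the loop behaviour described above. It is only a sketch: struct
s2idle_model and run_s2idle_model() are hypothetical names, and the
model assumes the LPS0 _DSM path runs once before the loop while the
idle entry runs once per iteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct s2idle_model {
	int dsm_notifications;   /* host traps on the LPS0 _DSM path */
	int enter_notifications; /* host is notified at each idle entry */
};

/*
 * wake_is_real[i] says whether the i-th wakeup is a genuine wakeup
 * (true) or a spurious one such as an EC GPE (false). After a spurious
 * wakeup the loop re-enters idle, as in the description above.
 */
static struct s2idle_model run_s2idle_model(const bool *wake_is_real, size_t n)
{
	struct s2idle_model m = { 0, 0 };

	m.dsm_notifications = 1;	/* assumed: runs once, before the loop */

	for (size_t i = 0; i < n; i++) {
		m.enter_notifications++;	/* notify right before idling */
		if (wake_is_real[i])
			break;			/* genuine wakeup: leave the loop */
	}
	return m;
}
```

With two spurious wakeups followed by a real one, the model reports one
_DSM-based notification but three idle-entry notifications, which is the
difference the host would need to observe.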

Best regards,
Grzegorz
