linux-kernel - Re: [RFC PATCH 1/8] kvm: x86: MSR for setting up scheduler info shared memory

Message-ID: <CAO7JXPg9wN3SQOciAjVTn6fdgdpKA0CjaYg5UvXosYHT=1CeuA@mail.gmail.com>
Date: Thu, 14 Dec 2023 14:53:55 -0500
From: Vineeth Remanan Pillai <vineeth@...byteword.org>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Suleiman Souhlal <suleiman@...gle.com>, Masami Hiramatsu <mhiramat@...gle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, x86@...nel.org, 
	Joel Fernandes <joel@...lfernandes.org>, Ben Segall <bsegall@...gle.com>, 
	Borislav Petkov <bp@...en8.de>, Daniel Bristot de Oliveira <bristot@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	"H . Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Mel Gorman <mgorman@...e.de>, Paolo Bonzini <pbonzini@...hat.com>, Andy Lutomirski <luto@...nel.org>, 
	Peter Zijlstra <peterz@...radead.org>, Sean Christopherson <seanjc@...gle.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Valentin Schneider <vschneid@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Wanpeng Li <wanpengli@...cent.com>
Subject: Re: [RFC PATCH 1/8] kvm: x86: MSR for setting up scheduler info
 shared memory

On Thu, Dec 14, 2023 at 5:53 AM Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
>
> "Vineeth Pillai (Google)" <vineeth@...byteword.org> writes:
>
> > Implement a kvm MSR that guest uses to provide the GPA of shared memory
> > for communicating the scheduling information between host and guest.
> >
> > wrmsr(0) disables the feature. wrmsr(valid_gpa) enables the feature and
> > uses the gpa for further communication.
> >
> > Also add a new cpuid feature flag for the host to advertise the feature
> > to the guest.
> >
> > Co-developed-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > Signed-off-by: Vineeth Pillai (Google) <vineeth@...byteword.org>
> > ---
> >  arch/x86/include/asm/kvm_host.h      | 25 ++++++++++++
> >  arch/x86/include/uapi/asm/kvm_para.h | 24 +++++++++++
> >  arch/x86/kvm/Kconfig                 | 12 ++++++
> >  arch/x86/kvm/cpuid.c                 |  2 +
> >  arch/x86/kvm/x86.c                   | 61 ++++++++++++++++++++++++++++
> >  include/linux/kvm_host.h             |  5 +++
> >  6 files changed, 129 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f72b30d2238a..f89ba1f07d88 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -987,6 +987,18 @@ struct kvm_vcpu_arch {
> >       /* Protected Guests */
> >       bool guest_state_protected;
> >
> > +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> > +     /*
> > +      * MSR to setup a shared memory for scheduling
> > +      * information sharing between host and guest.
> > +      */
> > +     struct {
> > +             enum kvm_vcpu_boost_state boost_status;
> > +             u64 msr_val;
> > +             struct gfn_to_hva_cache data;
> > +     } pv_sched;
> > +#endif
> > +
> >       /*
> >        * Set when PDPTS were loaded directly by the userspace without
> >        * reading the guest memory
> > @@ -2217,4 +2229,17 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
> >   */
> >  #define KVM_EXIT_HYPERCALL_MBZ               GENMASK_ULL(31, 1)
> >
> > +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> > +static inline bool kvm_arch_vcpu_pv_sched_enabled(struct kvm_vcpu_arch *arch)
> > +{
> > +     return arch->pv_sched.msr_val;
> > +}
> > +
> > +static inline void kvm_arch_vcpu_set_boost_status(struct kvm_vcpu_arch *arch,
> > +             enum kvm_vcpu_boost_state boost_status)
> > +{
> > +     arch->pv_sched.boost_status = boost_status;
> > +}
> > +#endif
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 6e64b27b2c1e..6b1dea07a563 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -36,6 +36,7 @@
> >  #define KVM_FEATURE_MSI_EXT_DEST_ID  15
> >  #define KVM_FEATURE_HC_MAP_GPA_RANGE 16
> >  #define KVM_FEATURE_MIGRATION_CONTROL        17
> > +#define KVM_FEATURE_PV_SCHED         18
> >
> >  #define KVM_HINTS_REALTIME      0
> >
> > @@ -58,6 +59,7 @@
> >  #define MSR_KVM_ASYNC_PF_INT 0x4b564d06
> >  #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07
> >  #define MSR_KVM_MIGRATION_CONTROL    0x4b564d08
> > +#define MSR_KVM_PV_SCHED     0x4b564da0
> >
> >  struct kvm_steal_time {
> >       __u64 steal;
> > @@ -150,4 +152,26 @@ struct kvm_vcpu_pv_apf_data {
> >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> >  #define KVM_PV_EOI_DISABLED 0x0
> >
> > +/*
> > + * VCPU boost state shared between the host and guest.
> > + */
> > +enum kvm_vcpu_boost_state {
> > +     /* Priority boosting feature disabled in host */
> > +     VCPU_BOOST_DISABLED = 0,
> > +     /*
> > +      * vcpu is not explicitly boosted by the host.
> > +      * (Default priority when the guest started)
> > +      */
> > +     VCPU_BOOST_NORMAL,
> > +     /* vcpu is boosted by the host */
> > +     VCPU_BOOST_BOOSTED
> > +};
> > +
> > +/*
> > + * Structure passed in via MSR_KVM_PV_SCHED
> > + */
> > +struct pv_sched_data {
> > +     __u64 boost_status;
> > +};
> > +
> >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 89ca7f4c1464..dbcba73fb508 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -141,4 +141,16 @@ config KVM_XEN
> >  config KVM_EXTERNAL_WRITE_TRACKING
> >       bool
> >
> > +config PARAVIRT_SCHED_KVM
> > +     bool "Enable paravirt scheduling capability for kvm"
> > +     depends on KVM
> > +     help
> > +       Paravirtualized scheduling facilitates the exchange of scheduling
> > +       related information between the host and guest through shared memory,
> > +       enhancing the efficiency of vCPU thread scheduling by the hypervisor.
> > +       An illustrative use case involves dynamically boosting the priority of
> > +       a vCPU thread when the guest is executing a latency-sensitive workload
> > +       on that specific vCPU.
> > +       This config enables paravirt scheduling in the kvm hypervisor.
> > +
> >  endif # VIRTUALIZATION
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 7bdc66abfc92..960ef6e869f2 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1113,6 +1113,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> >                            (1 << KVM_FEATURE_POLL_CONTROL) |
> >                            (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> >                            (1 << KVM_FEATURE_ASYNC_PF_INT);
> > +             if (IS_ENABLED(CONFIG_PARAVIRT_SCHED_KVM))
> > +                     entry->eax |= (1 << KVM_FEATURE_PV_SCHED);
> >
> >               if (sched_info_on())
> >                       entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7bcf1a76a6ab..0f475b50ac83 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3879,6 +3879,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                       return 1;
> >               break;
> >
> > +#ifdef CONFIG_PARAVIRT_SCHED_KVM
> > +     case MSR_KVM_PV_SCHED:
> > +             if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SCHED))
> > +                     return 1;
> > +
> > +             if (!(data & KVM_MSR_ENABLED))
> > +                     break;
> > +
> > +             if (!(data & ~KVM_MSR_ENABLED)) {
> > +                     /*
> > +                      * Disable the feature
> > +                      */
> > +                     vcpu->arch.pv_sched.msr_val = 0;
> > +                     kvm_set_vcpu_boosted(vcpu, false);
> > +             } else if (!kvm_gfn_to_hva_cache_init(vcpu->kvm,
> > +                             &vcpu->arch.pv_sched.data, data & ~KVM_MSR_ENABLED,
> > +                             sizeof(struct pv_sched_data))) {
> > +                     vcpu->arch.pv_sched.msr_val = data;
> > +                     kvm_set_vcpu_boosted(vcpu, false);
> > +             } else {
> > +                     pr_warn("MSR_KVM_PV_SCHED: kvm:%p, vcpu:%p, "
> > +                             "msr value: %llx, kvm_gfn_to_hva_cache_init failed!\n",
> > +                             vcpu->kvm, vcpu, data & ~KVM_MSR_ENABLED);
>
> As this is triggerable by the guest please drop this print (which is not
> even ratelimited!). I think it would be better to just 'return 1;' in case
> of kvm_gfn_to_hva_cache_init() failure but maybe you also need to
> account for 'msr_info->host_initiated' to not fail setting this MSR from
> the host upon migration.
>
Makes sense, I will remove the pr_warn.
I hadn't thought about migration; thanks for bringing this up. I will
update the MSR handling to account for host-initiated writes during
migration as well.

Thanks,
Vineeth
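
For illustration only, the review suggestion above (drop the print, fail
the write with 'return 1', and special-case host-initiated writes) could
look roughly like the standalone userspace model below. It is a sketch
under stated assumptions, not kernel code and not the follow-up patch:
struct pv_sched_model, model_cache_init(), model_set_pv_sched_msr() and
the mem_size parameter are hypothetical stand-ins invented here. Only
KVM_MSR_ENABLED (value 1) matches the real uapi header. Accepting a
host-initiated write even when the cache init fails is one possible
reading of the migration remark, and treating a write with the enable
bit clear as "disable" follows the commit message rather than the
quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_MSR_ENABLED 1ULL /* same value as in the kvm_para.h uapi header */

/* Hypothetical stand-in for the pv_sched member of struct kvm_vcpu_arch. */
struct pv_sched_model {
	uint64_t msr_val;
};

/*
 * Hypothetical stand-in for kvm_gfn_to_hva_cache_init(): returns 0 on
 * success and nonzero on failure. In this model a GPA is "valid" when it
 * lies below mem_size.
 */
static int model_cache_init(uint64_t gpa, uint64_t mem_size)
{
	return gpa < mem_size ? 0 : -1;
}

/*
 * Models the MSR write path. Returns 0 when the write is accepted and 1
 * when it is rejected, mirroring the 'return 1' convention in
 * kvm_set_msr_common().
 */
int model_set_pv_sched_msr(struct pv_sched_model *ps, uint64_t data,
			   bool host_initiated, uint64_t mem_size)
{
	uint64_t gpa = data & ~KVM_MSR_ENABLED;

	/* Assumption: enable bit clear means "disable the feature". */
	if (!(data & KVM_MSR_ENABLED)) {
		ps->msr_val = 0;
		return 0;
	}

	if (model_cache_init(gpa, mem_size)) {
		/*
		 * Guest-triggerable failure: no print, just reject.
		 * Assumption: a host-initiated write (e.g. state restore
		 * on migration) is still recorded instead of failing.
		 */
		if (!host_initiated)
			return 1;
	}

	ps->msr_val = data;
	return 0;
}
```

With a 64 KiB model memory, a guest write of 0x20000 | KVM_MSR_ENABLED
is rejected and leaves the previous value in place, while the same
write with host_initiated set is recorded.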