linux-kernel - Re: [PATCH] kvm: rename HINTS_DEDICATED to KVM_HINTS_REALTIME

Message-ID: <20180518171311.GB25013@localhost.localdomain>
Date:   Fri, 18 May 2018 14:13:11 -0300
From:   Eduardo Habkost <ehabkost@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     linux-kernel@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        kvm@...r.kernel.org, linux-doc@...r.kernel.org,
        qemu-devel@...gnu.org
Subject: Re: [PATCH] kvm: rename HINTS_DEDICATED to KVM_HINTS_REALTIME

On Fri, May 18, 2018 at 08:01:49PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 01:04:31PM -0300, Eduardo Habkost wrote:
> > CCing qemu-devel, as I'm now discussing userspace.
> > 
> > On Thu, May 17, 2018 at 10:55:33PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, May 17, 2018 at 03:46:58PM -0300, Eduardo Habkost wrote:
> > > > On Thu, May 17, 2018 at 05:54:24PM +0300, Michael S. Tsirkin wrote:
> > > > > HINTS_DEDICATED seems to be somewhat confusing:
> > > > > 
> > > > > Guest doesn't really care whether it's the only task running on a host
> > > > > CPU as long as it's not preempted.
> > > > > 
> > > > > And there are more reasons for Guest to be preempted than host CPU
> > > > > sharing, for example, with memory overcommit it can get preempted on a
> > > > > memory access, post copy migration can cause preemption, etc.
> > > > > 
> > > > > Let's call it KVM_HINTS_REALTIME which seems to better
> > > > > match what guests expect.
> > > > > 
> > > > > > Also, the flag must be set on all vCPUs - current guests assume this.
> > > > > > Note this in the documentation.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> > > > > ---
> > > > >  Documentation/virtual/kvm/cpuid.txt  | 6 +++---
> > > > >  arch/x86/include/uapi/asm/kvm_para.h | 2 +-
> > > > >  arch/x86/kernel/kvm.c                | 8 ++++----
> > > > >  3 files changed, 8 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > > > > index d4f33eb8..ab022dc 100644
> > > > > --- a/Documentation/virtual/kvm/cpuid.txt
> > > > > +++ b/Documentation/virtual/kvm/cpuid.txt
> > > > > @@ -72,8 +72,8 @@ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
> > > > >  
> > > > >  flag                               || value || meaning
> > > > >  ==================================================================================
> > > > > -KVM_HINTS_DEDICATED                ||     0 || guest checks this feature bit to
> > > > > -                                   ||       || determine if there is vCPU pinning
> > > > > -                                   ||       || and there is no vCPU over-commitment,
> > > > > +KVM_HINTS_REALTIME                 ||     0 || guest checks this feature bit to
> > > > > +                                   ||       || determine that vCPUs are never
> > > > > +                                   ||       || preempted for an unlimited time,
> > > > >                                     ||       || allowing optimizations
> > > > 
> > > > My understanding of the original patch is that the intention is
> > > > to tell the guest that it is very unlikely to be preempted, so it
> > > > can choose a more appropriate spinlock implementation.  This
> > > > description implies that the guest will never be preempted, which
> > > > is a much stronger guarantee.
> > > 
> > > Note:
> > > 
> > > ...  for an unlimited time.
> > 
> > Which still sounds like a stronger guarantee than the original
> > description.  But:
> > 
> > > 
> > > > 
> > > > Isn't this new description incompatible with existing usage of
> > > > the hint, which might include people who just use vCPU pinning
> > > > but no mlock?
> > > 
> > > Without mlock you should always use pv spinlocks.
> > > 
> > > Otherwise you risk blocking on a lock taken by
> > > a VCPU that is in turn blocked on IO, where the IO
> > > is not completing because CPU is being used up
> > > spinning.
> > 
> > So the stronger guarantee seems necessary.
> > 
> > Now what should host userspace do if the user is trying to run an
> > existing configuration where the CPUID hint was set but memory is
> > not pinned?
> 
> As much as we'd like to be helpful and validate input, you need a real
> time host too. I'm not sure how we'd find out - I suggest we do not
> bother for now.

I'm worried that people will start enabling the flag in all kinds
of scenarios where the guarantees can't be kept, and make the
meaning of the flag in practice completely different from its
documented meaning.

So I'd like to either detect cases where it's obviously wrong to
enable the flag, or document the requirements very clearly in the
QEMU documentation.
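
To make the first option concrete, here is a minimal sketch of what such
a check could look like.  It is not QEMU code: struct vm_config and
check_realtime_hint() are hypothetical names invented for illustration,
and the two conditions (locked memory, pinned vCPUs) are assumptions
drawn from the mlock and vCPU pinning points raised earlier in this
thread.  As noted above, a real-time host is also required, and nothing
here can verify that.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of a VM configuration; not a QEMU structure. */
struct vm_config {
    bool hint_realtime;  /* KVM_HINTS_REALTIME is exposed to the guest */
    bool memory_locked;  /* guest memory is locked (e.g. via mlock)    */
    bool vcpus_pinned;   /* every vCPU is pinned to a host CPU         */
};

/*
 * Report configurations where the hint is set but the host-side
 * guarantees are obviously missing.  Returns the number of problems
 * found and writes one warning per problem to 'log'.
 */
static int check_realtime_hint(const struct vm_config *cfg, FILE *log)
{
    int problems = 0;

    if (!cfg->hint_realtime) {
        return 0;  /* hint not set: nothing was promised to the guest */
    }
    if (!cfg->memory_locked) {
        fprintf(log, "warning: realtime hint set but guest memory "
                     "is not locked\n");
        problems++;
    }
    if (!cfg->vcpus_pinned) {
        fprintf(log, "warning: realtime hint set but vCPUs "
                     "are not pinned\n");
        problems++;
    }
    return problems;
}
```

A caller could treat a non-zero return value as a warning only, or
refuse to start the guest; which of the two is appropriate is exactly
the policy question being discussed here.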

-- 
Eduardo