Message-ID: <20190514164503.GA1668@linux.intel.com>
Date:   Tue, 14 May 2019 09:45:03 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Liran Alon <liran.alon@...cle.com>
Subject: Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further

On Tue, May 14, 2019 at 06:56:04PM +0800, Wanpeng Li wrote:
> On Tue, 14 May 2019 at 09:45, Wanpeng Li <kernellwp@...il.com> wrote:
> >
> > On Tue, 14 May 2019 at 03:54, Sean Christopherson
> > <sean.j.christopherson@...el.com> wrote:
> > > Rather than reinvent the wheel, can we simply move the call to
> > > wait_lapic_expire() into vmx.c and svm.c?  For VMX we'd probably want to
> > > support the advancement if enable_unrestricted_guest=true so that we avoid
> > > the emulation_required case, but other than that I don't see anything that
> > > requires wait_lapic_expire() to be called where it is.
> >
> > I had also considered moving wait_lapic_expire() into vmx.c and svm.c
> > before. What do you think, Paolo, Radim?
> 
> However, guest_enter_irqoff() also prevents this; otherwise, we would
> account the busy-wait time as guest time. How about sampling several
> times and taking the average, or a conservative minimum, to address
> Sean's concern?

Hmm, looking at the history, wait_lapic_expire() was originally called
immediately before kvm_x86_ops->run()[1].  The call was moved above
guest_enter_irqoff() because of its tracepoint, which violated the RCU
extended quiescent state entered by guest_enter_irqoff()[2][3].  In
other words, I don't think there is a fundamental issue with accounting
the busy wait time to the guest rather than the host.

Assuming the tracepoint was added to help tune the advancement time, I
think we can simply remove the tracepoint, which would allow moving
wait_lapic_expire().  Now that the advancement time is tracked per-vCPU,
realizing a change in the advancement time requires creating a new VM.
For all intents and purposes this makes it impractical to hand tune the
advancement in real time using the tracepoint as the feedback mechanism.

If we want to expose the per-vCPU advancement time to the user, a debugfs
entry is likely sufficient given that the advancement time is
automatically adjusted.

[1] Commit d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")
[2] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[3] https://patchwork.kernel.org/patch/7821111/
