linux-kernel - Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20190514164503.GA1668@linux.intel.com>
Date:   Tue, 14 May 2019 09:45:03 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Wanpeng Li <kernellwp@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Liran Alon <liran.alon@...cle.com>
Subject: Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further

On Tue, May 14, 2019 at 06:56:04PM +0800, Wanpeng Li wrote:
> On Tue, 14 May 2019 at 09:45, Wanpeng Li <kernellwp@...il.com> wrote:
> >
> > On Tue, 14 May 2019 at 03:54, Sean Christopherson
> > <sean.j.christopherson@...el.com> wrote:
> > > Rather than reinvent the wheel, can we simply move the call to
> > > wait_lapic_expire() into vmx.c and svm.c?  For VMX we'd probably want to
> > > support the advancement if enable_unrestricted_guest=true so that we avoid
> > > the emulation_required case, but other than that I don't see anything that
> > > requires wait_lapic_expire() to be called where it is.
> >
> > I also considered to move wait_lapic_expire() into vmx.c and svm.c
> > before, what do you think, Paolo, Radim?
> 
> However, guest_enter_irqoff() also prevents this. Otherwise, we will
> account busy wait time as guest time. How about sampling several times
> and get the average value or conservative min value to handle Sean's
> concern?

Hmm, looking at the history, wait_lapic_expire() was originally called
immediately before kvm_x86_ops->run()[1].  The call was moved above
guest_enter_irqoff() because of its tracepoint, which violated the RCU
extended quiescent state invoked by guest_enter_irqoff()[2][3].  In
other words, I don't think there is a fundamental issue with accounting
the busy wait time to the guest rather than the host.

Assuming the tracepoint was added to help tune the advancement time, I
think we can simply remove the tracepoint, which would allow moving
wait_lapic_expire().  Now that the advancement time is tracked per-vCPU,
realizing a change in the advancement time requires creating a new VM.
For all intents and purposes this makes it impractical to hand tune the
advancement in real time using the tracepoint as the feedback mechanism.

If we want to expose the per-vCPU advancement time to the user, a debugfs
entry is likely sufficient given that the advancement time is
automatically adjusted.

[1] Commit d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")
[2] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[3] https://patchwork.kernel.org/patch/7821111/