Message-ID: <CZAX9ZA1VRUZ.353NNERCBGKUU@amazon.com>
Date: Wed, 21 Feb 2024 17:11:09 +0000
From: Nicolas Saenz Julienne <nsaenz@...zon.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <frederic@...nel.org>, <paulmck@...nel.org>, <jalliste@...zon.co.uk>,
<mhiramat@...nel.org>, <akpm@...ux-foundation.org>, <pmladek@...e.com>,
<rdunlap@...radead.org>, <tsi@...oix.net>, <nphamcs@...il.com>,
<gregkh@...uxfoundation.org>, <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, <pbonzini@...hat.com>
Subject: Re: [RFC] cputime: Introduce option to force full dynticks accounting on
NOHZ & NOHZ_IDLE CPUs
On Wed Feb 21, 2024 at 4:24 PM UTC, Sean Christopherson wrote:
> On Tue, Feb 20, 2024, Nicolas Saenz Julienne wrote:
> > Hi Sean,
> >
> > On Tue Feb 20, 2024 at 4:18 PM UTC, Sean Christopherson wrote:
> > > On Mon, Feb 19, 2024, Nicolas Saenz Julienne wrote:
> > > > Under certain extreme conditions, the tick-based cputime accounting may
> > > > produce inaccurate data. For instance, guest CPU usage is sensitive to
> > > > interrupts firing right before the tick's expiration.
>
> Ah, this confused me. The "right before" is a bit misleading. It's more like
> "shortly before", because if the interrupt that occurs due to the guest's tick
> arrives _right_ before the host tick expires, then commit 160457140187 should
> avoid horrific accounting.
>
> > > > This forces the guest into kernel context, and has that time slice
> > > > wrongly accounted as system time. This issue is exacerbated if the
> > > > interrupt source is in sync with the tick,
>
> It's worth calling out why this can happen, to make it clear that getting into
> such syncopation can happen quite naturally. E.g. something like:
>
> interrupt source is in sync with the tick, e.g. if the guest's tick
> is configured to run at the same frequency as the host tick, and the
> guest tick is ever so slightly ahead of the host tick.
I'll incorporate both comments into the description. :)
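
For illustration only, here is a toy model of that synchronized case. It is not kernel code: the function name, the 250 Hz tick and the nanosecond figures are all made up, and it ignores the deferral done by commit 160457140187. It only shows how tick sampling attributes a whole tick to system time whenever the host tick lands while the vCPU thread is handling the exit.

```python
def sampled_system_share(tick_ns, guest_lead_ns, emul_ns, n_ticks=1000):
    """Fraction of host ticks charged as system time in a toy model.

    Assumptions (hypothetical, for illustration):
    - the guest timer fires every tick_ns, guest_lead_ns before each host tick;
    - each expiration forces an exit that the host handles for emul_ns,
      outside guest mode and with IRQs enabled;
    - a host tick charges the whole tick to whatever context it interrupts.
    """
    system_ticks = 0
    for k in range(1, n_ticks + 1):
        host_tick = k * tick_ns
        exit_start = host_tick - guest_lead_ns
        exit_end = exit_start + emul_ns
        # The tick is charged to system time if it lands inside the handling window.
        if exit_start <= host_tick < exit_end:
            system_ticks += 1
    return system_ticks / n_ticks


if __name__ == "__main__":
    tick = 4_000_000  # 250 Hz host tick, in ns (assumed)
    # Guest timer 1 us ahead of the host tick, 5 us of handling per exit.
    print(sampled_system_share(tick, 1_000, 5_000))
    # Same handling cost, but the guest timer is 10 us ahead: the tick misses it.
    print(sampled_system_share(tick, 10_000, 5_000))
    # Share of wall time actually spent handling exits in both cases.
    print(5_000 / tick)
```

In this model the real handling cost is identical in both runs; only the phase between the two timers decides whether the sampled figure is all system time or none.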
> > > > significantly skewing usage metrics towards system time.
> > >
> > > ...
> > >
> > > > NOTE: This wasn't tested in depth, and it's mostly intended to highlight
> > > > the issue we're trying to solve. Also ccing KVM folks, since it's
> > > > relevant to guest CPU usage accounting.
> > >
> > > How bad is the synchronization issue on upstream kernels? We tried to address
> > > that in commit 160457140187 ("KVM: x86: Defer vtime accounting 'til after IRQ handling").
> > >
> > > I don't expect it to be foolproof, but it'd be good to know if there's a blatant
> > > flaw and/or easily closed hole.
> >
> > The issue is not really about the interrupts themselves, but their side
> > effects.
> >
> > For instance, let's say the guest sets up a Hyper-V stimer that
> > consistently fires 1 us before the preemption tick. The preemption tick
> > will expire while the vCPU thread is running with !PF_VCPU (maybe inside
> > kvm_hv_process_stimers(), for example). As long as they both keep in sync,
> > you'll get a 100% system usage. I was able to reproduce this one through
> > kvm-unit-tests, but the race window is too small to keep the interrupts
> > in sync for long periods of time, yet still capable of producing random
> > system usage bursts (which is unacceptable for some use-cases).
> >
> > Other use-cases have bigger race windows and managed to maintain high
> > system CPU usage over long periods of time. For example, with user-space
> > HPET emulation, or KVM+Xen (don't know the fine details on these, but
> > VIRT_CPU_ACCOUNTING_GEN fixes the mis-accounting). It all comes down to
> > the same situation. Something triggers an exit, and the vCPU thread goes
> > past 'vtime_account_guest_exit()' just in time for the tick interrupt to
> > show up.
>
> I suspect the common "problem" with those flows is that emulating the guest timer
> interrupt is (a) slow, relatively speaking and (b) done with interrupts enabled.
>
> E.g. on VMX, the TSC deadline timer is emulated via VMX preemption timer, and both
> the programming of the guest's TSC deadline timer and the handling of the expiration
> interrupt are done in the VM-Exit fastpath with IRQs disabled. As a result, even
> if the host tick interrupt is a hair behind the guest tick, it doesn't affect
> accounting because the host tick interrupt will never be delivered while KVM is
> emulating the guest's periodic tick.
>
> I'm guessing that if you tested on SVM (or a guest that doesn't use the APIC timer
> in deadline mode), which doesn't utilize the fastpath since KVM needs to bounce
> through hrtimers, then you'd see similar accounting problems even without using
> any of the problematic "slow" timer sources.
That's right, the "problem" will show up when periodically emulating
something with interrupts enabled. The slower the emulation the bigger
the race window. It's just a limitation of tick-based accounting; I have
the feeling there isn't much KVM can do.
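
As a rough sketch of the "slower emulation, bigger window" point, the snippet below sweeps the phase between the emulation start and the host tick and counts how often the tick lands inside the emulation window. Everything in it is hypothetical (function name, tick period, durations); it models a single emulation per tick period and nothing else.

```python
def tick_hit_fraction(tick_ns, emul_ns, steps=10_000):
    """Fraction of evenly spaced phases for which the host tick hits emulation.

    Toy model: within one tick period the emulation starts at 'start' and
    runs for emul_ns with IRQs enabled; the host tick fires at tick_ns.
    """
    hits = 0
    for i in range(steps):
        start = i * tick_ns // steps
        # The tick interrupts the emulation if it has not finished by tick_ns.
        if start + emul_ns >= tick_ns:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    tick = 4_000_000  # 250 Hz host tick, in ns (assumed)
    for emul in (4_000, 40_000, 400_000):
        print(emul, tick_hit_fraction(tick, emul))
```

With phases spread evenly, the hit fraction is simply emul_ns / tick_ns, so sampling is fair on average. The skew only appears when the phase stops being spread out, that is when the interrupt source stays in sync with the tick, and a longer emulation makes it easier to remain inside the window.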
Nicolas