[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <edc8b1ad-dee0-456f-89fb-47bd4709ff0e@paulmck-laptop>
Date: Mon, 8 Apr 2024 15:35:12 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Marcelo Tosatti <mtosatti@...hat.com>,
Leonardo Bras <leobras@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Boqun Feng <boqun.feng@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu
On Mon, Apr 08, 2024 at 02:56:29PM -0700, Sean Christopherson wrote:
> On Mon, Apr 08, 2024, Paul E. McKenney wrote:
> > On Mon, Apr 08, 2024 at 01:06:00PM -0700, Sean Christopherson wrote:
> > > On Mon, Apr 08, 2024, Paul E. McKenney wrote:
> > > > > > > + if (vcpu->wants_to_run)
> > > > > > > + context_tracking_guest_start_run_loop();
> > > > > >
> > > > > > At this point, if this is a nohz_full CPU, it will no longer report
> > > > > > quiescent states until the grace period is at least one second old.
> > > > >
> > > > > I don't think I follow the "will no longer report quiescent states" issue. Are
> > > > > you saying that this would prevent guest_context_enter_irqoff() from reporting
> > > > > that the CPU is entering a quiescent state? If so, that's an issue that would
> > > > > need to be resolved regardless of what heuristic we use to determine whether or
> > > > > not a CPU is likely to enter a KVM guest.
> > > >
> > > > Please allow me to start over. Are interrupts disabled at this point,
> > >
> > > Nope, IRQs are enabled.
> > >
> > > Oof, I'm glad you asked, because I was going to say that there's one exception,
> > > kvm_sched_in(), which is KVM's notifier for when a preempted task/vCPU is scheduled
> > > back in. But I forgot that kvm_sched_{in,out}() don't use vcpu_{load,put}(),
> > > i.e. would need explicit calls to context_tracking_guest_{stop,start}_run_loop().
> > >
> > > > and, if so, will they remain disabled until the transfer of control to
> > > > the guest has become visible to RCU via the context-tracking code?
> > > >
> > > > Or has the context-tracking code already made the transfer of control
> > > > to the guest visible to RCU?
> > >
> > > Nope. The call to __ct_user_enter(CONTEXT_GUEST) or rcu_virt_note_context_switch()
> > > happens later, just before the actual VM-Enter. And that call does happen with
> > > IRQs disabled (and IRQs stay disabled until the CPU enters the guest).
> >
> > OK, then we can have difficulties with long-running interrupts hitting
> > this range of code. It is unfortunately not unheard-of for interrupts
> > plus trailing softirqs to run for tens of seconds, even minutes.
>
> Ah, and if that occurs, *and* KVM is slow to re-enter the guest, then there will
> be a massive lag before the CPU gets back into a quiescent state.
Exactly!
> > One counter-argument is that that softirq would take scheduling-clock
> > interrupts, and would eventually make rcu_core() run.
>
> Considering that this behavior would be unique to nohz_full CPUs, how much
> responsibility does RCU have to ensure a sane setup? E.g. if a softirq runs for
> multiple seconds on a nohz_full CPU whose primary role is to run a KVM vCPU, then
> whatever real-time workaround the vCPU is running is already doomed.
True, but it is always good to be doing one's part.
> > But does a rcu_sched_clock_irq() from a guest OS have its "user"
> > argument set?
>
> No, and it shouldn't, at least not on x86 (I assume other architectures are
> similar, but I don't actually no for sure).
>
> On x86, the IRQ that the kernel sees comes looks like it comes from host kernel
> code. And on AMD (SVM), the IRQ doesn't just "look" like it came from host kernel,
> the IRQ really does get vectored/handled in the host kernel. Intel CPUs have a
> performance optimization where the IRQ gets "eaten" as part of the VM-Exit, and
> so KVM synthesizes a stack frame and does a manual CALL to invoke the IRQ handler.
>
> And that's just for IRQs that actually arrive while the guest is running. IRQs
> arrive while KVM is active, e.g. running its large vcpu_run(), are "pure" host
> IRQs.
OK, then is it possible to get some other indication to the
rcu_sched_clock_irq() function that it has interrupted a guest OS?
Not an emergency, and maybe not even necessary, but it might well be
one hole that would be good to stop up.
Thanx, Paul
Powered by blists - more mailing lists