[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zjubp36yHVf01C16@google.com>
Date: Wed, 8 May 2024 08:35:03 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Leonardo Bras <leobras@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Frederic Weisbecker <frederic@...nel.org>, Neeraj Upadhyay <quic_neeraju@...cinc.com>, 
	Joel Fernandes <joel@...lfernandes.org>, Josh Triplett <josh@...htriplett.org>, 
	Boqun Feng <boqun.feng@...il.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Lai Jiangshan <jiangshanlai@...il.com>, 
	Zqiang <qiang.zhang1211@...il.com>, Marcelo Tosatti <mtosatti@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu
On Tue, May 07, 2024, Paul E. McKenney wrote:
> On Tue, May 07, 2024 at 05:08:54PM -0700, Sean Christopherson wrote:
> > > > This is admittedly a bit indirect, but then again this is Linux-kernel
> > > > RCU that we are talking about.
> > > > 
> > > > > And I'm arguing that, since the @user check isn't bombproof, there's no reason to
> > > > > try to harden against every possible edge case in an equivalent @guest check,
> > > > > because it's unnecessary for kernel safety, thanks to the guardrails.
> > > > 
> > > > And the same argument above would also apply to an equivalent check for
> > > > execution in guest mode at the time of the interrupt.
> > > 
> > > This is partly why I was off in the weeds.  KVM cannot guarantee that the
> > > interrupt that leads to rcu_pending() actually interrupted the guest.  And the
> > > original patch didn't help at all, because a time-based check doesn't come
> > > remotely close to the guarantees that the @user check provides.
> 
> Nothing in the registers from the interrupted context permits that
> determination?
No, because the interrupt/call chain that reaches rcu_pending() actually originates
in KVM host code, not guest code.  I.e. the eventual IRET will return control to
KVM, not to the guest.
On AMD, the interrupt quite literally interrupts the host, not the guest.  AMD
CPUs don't actually acknowledge/consume the physical interrupt when the guest is
running, the CPU simply generates a VM-Exit that says "there's an interrupt pending".
It's up to software, i.e. KVM, to enable IRQs and handle (all!) pending interrupts.
Intel CPUs have a mode where the CPU fully acknowledges the interrupt and reports
the exact vector that caused the VM-Exit, but it's still up to software to invoke
the interrupt handler, i.e. the interrupt trampolines through KVM.
And before handling/forwarding the interrupt, KVM exits its quiescent state,
leaves its no-instrumention region, invokes tracepoitnes, etc.  So even my PF_VCPU
idea is _very_ different than the user/idle scenarios, where the interrupt really
truly does original from an extended quiescent state.
> > > > But if we do need RCU to be more aggressive about treating guest execution as
> > > > an RCU quiescent state within the host, that additional check would be an
> > > > excellent way of making that happen.
> > > 
> > > It's not clear to me that being more agressive is warranted.  If my understanding
> > > of the existing @user check is correct, we _could_ achieve similar functionality
> > > for vCPU tasks by defining a rule that KVM must never enter an RCU critical section
> > > with PF_VCPU set and IRQs enabled, and then rcu_pending() could check PF_VCPU.
> > > On x86, this would be relatively straightforward (hack-a-patch below), but I've
> > > no idea what it would look like on other architectures.
> 
> At first glance, this looks plausible.  I would guess that a real patch
> would have to be architecture dependent, and that could simply involve
> a Kconfig option (perhaps something like CONFIG_RCU_SENSE_GUEST), so
> that the check you add to rcu_pending is conditioned on something like
> IS_ENABLED(CONFIG_RCU_SENSE_GUEST).
> 
> There would also need to be a similar check in rcu_sched_clock_irq(),
> or maybe in rcu_flavor_sched_clock_irq(), to force a call to rcu_qs()
> in this situation.
> 
> > > But the value added isn't entirely clear to me, probably because I'm still missing
> > > something.  KVM will have *very* recently called __ct_user_exit(CONTEXT_GUEST) to
> > > note the transition from guest to host kernel.  Why isn't that a sufficient hook
> > > for RCU to infer grace period completion?
> 
> Agreed, unless we are sure we need the change, we should not make it.
+1.  And your comments about tracepoints, instrumentions, etc. makes me think
that trying to force the issue with PF_VCPU would be a bad idea.
Powered by blists - more mailing lists
 
