Message-ID: <ZhRoDfoz-YqsGhIB@google.com>
Date: Mon, 8 Apr 2024 14:56:29 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Marcelo Tosatti <mtosatti@...hat.com>, Leonardo Bras <leobras@...hat.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, Frederic Weisbecker <frederic@...nel.org>, 
	Neeraj Upadhyay <quic_neeraju@...cinc.com>, Joel Fernandes <joel@...lfernandes.org>, 
	Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/2] Avoid rcu_core() if CPU just left guest vcpu

On Mon, Apr 08, 2024, Paul E. McKenney wrote:
> On Mon, Apr 08, 2024 at 01:06:00PM -0700, Sean Christopherson wrote:
> > On Mon, Apr 08, 2024, Paul E. McKenney wrote:
> > > > > > +	if (vcpu->wants_to_run)
> > > > > > +		context_tracking_guest_start_run_loop();
> > > > > 
> > > > > At this point, if this is a nohz_full CPU, it will no longer report
> > > > > quiescent states until the grace period is at least one second old.
> > > > 
> > > > I don't think I follow the "will no longer report quiescent states" issue.  Are
> > > > you saying that this would prevent guest_context_enter_irqoff() from reporting
> > > > that the CPU is entering a quiescent state?  If so, that's an issue that would
> > > > need to be resolved regardless of what heuristic we use to determine whether or
> > > > not a CPU is likely to enter a KVM guest.
> > > 
> > > Please allow me to start over.  Are interrupts disabled at this point,
> > 
> > Nope, IRQs are enabled.
> > 
> > Oof, I'm glad you asked, because I was going to say that there's one exception,
> > kvm_sched_in(), which is KVM's notifier for when a preempted task/vCPU is scheduled
> > back in.  But I forgot that kvm_sched_{in,out}() don't use vcpu_{load,put}(),
> > i.e. would need explicit calls to context_tracking_guest_{stop,start}_run_loop().
> > 
> > > and, if so, will they remain disabled until the transfer of control to
> > > the guest has become visible to RCU via the context-tracking code?
> > > 
> > > Or has the context-tracking code already made the transfer of control
> > > to the guest visible to RCU?
> > 
> > Nope.  The call to __ct_user_enter(CONTEXT_GUEST) or rcu_virt_note_context_switch()
> > happens later, just before the actual VM-Enter.  And that call does happen with
> > IRQs disabled (and IRQs stay disabled until the CPU enters the guest).
> 
> OK, then we can have difficulties with long-running interrupts hitting
> this range of code.  It is unfortunately not unheard-of for interrupts
> plus trailing softirqs to run for tens of seconds, even minutes.

Ah, and if that occurs, *and* KVM is slow to re-enter the guest, then there will
be a massive lag before the CPU gets back into a quiescent state.
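
To make that window concrete, here's a rough sketch of the ordering (the
wants_to_run check and run-loop hook are taken from this RFC; the rest is
illustrative, not actual KVM code):

	if (vcpu->wants_to_run)
		context_tracking_guest_start_run_loop();	/* from this RFC */

	/*
	 * IRQs are still enabled here, so a long-running IRQ plus trailing
	 * softirqs can land in this window and run for tens of seconds
	 * before RCU sees any quiescent state.
	 */

	local_irq_disable();
	guest_state_enter_irqoff();	/* __ct_user_enter(CONTEXT_GUEST) or
					 * rcu_virt_note_context_switch() */
	/* ... VM-Enter; IRQs stay disabled until the CPU is in the guest */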

> One counter-argument is that that softirq would take scheduling-clock
> interrupts, and would eventually make rcu_core() run.

Considering that this behavior would be unique to nohz_full CPUs, how much
responsibility does RCU have to ensure a sane setup?  E.g. if a softirq runs for
multiple seconds on a nohz_full CPU whose primary role is to run a KVM vCPU, then
whatever real-time workload the vCPU is running is already doomed.

> But does a rcu_sched_clock_irq() from a guest OS have its "user"
> argument set?

No, and it shouldn't, at least not on x86 (I assume other architectures are
similar, but I don't actually know for sure).
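
For reference, a heavily abbreviated sketch of where that argument comes from,
as I understand the tick path (details vary by config; this is a sketch, not
verbatim kernel code):

	tick_sched_handle(ts, regs)
	    update_process_times(user_mode(regs));	/* regs = interrupted ctx */
	        rcu_sched_clock_irq(user_tick);		/* user_tick == 0 when the
							 * tick hit host kernel code */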

On x86, the IRQ that the kernel sees looks like it comes from host kernel
code.  And on AMD (SVM), the IRQ doesn't just "look" like it came from the host
kernel; the IRQ really does get vectored/handled in the host kernel.  Intel CPUs have a
the IRQ really does get vectored/handled in the host kernel.  Intel CPUs have a
performance optimization where the IRQ gets "eaten" as part of the VM-Exit, and
so KVM synthesizes a stack frame and does a manual CALL to invoke the IRQ handler.
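
Very loosely, something like this (illustrative pseudocode only; everything
except the EXIT_REASON/INTR_INFO constants is a made-up name, the real logic
lives in vmx.c):

	/* Intel/VMX only: the VM-Exit consumed the IRQ, so dispatch it
	 * manually. */
	if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) {
		u32 vector = intr_info & INTR_INFO_VECTOR_MASK;

		/* Synthesize an IRQ stack frame and CALL the host handler
		 * (hypothetical helper). */
		call_host_irq_handler(vector);
	}
	/* AMD/SVM: nothing to do; the IRQ is vectored through the host IDT
	 * after #VMEXIT like any other interrupt. */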

And that's just for IRQs that actually arrive while the guest is running.  IRQs
that arrive while KVM is active, e.g. running its large vcpu_run(), are "pure" host
IRQs.
