lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 15 Jul 2021 06:32:41 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>
Cc:     Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection

On Thu, Jul 15, 2021 at 06:09:45PM +0900, Sergey Senozhatsky wrote:
> On (21/05/22 00:56), Sergey Senozhatsky wrote:
> > Soft watchdog timer function checks if a virtual machine
> > was suspended and hence what looks like a lockup in fact
> > is a false positive.
> > 
> > This is what kvm_check_and_clear_guest_paused() does: it
> > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > and if it's set then we need to touch all watchdogs and bail
> > out.
> > 
> > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > check works fine.
> > 
> > There is, however, one more watchdog that runs from IRQ, so
> > watchdog timer fn races with it, and that watchdog is not aware
> > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> > 
> > apic_timer_interrupt()
> >  smp_apic_timer_interrupt()
> >   hrtimer_interrupt()
> >    __hrtimer_run_queues()
> >     tick_sched_timer()
> >      tick_sched_handle()
> >       update_process_times()
> >        rcu_sched_clock_irq()
> > 
> > This triggers RCU stalls on our devices during VM resume.
> > 
> > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > then there is nothing on this VCPU that touches watchdogs and
> > RCU reads stale gp stall timestamp and new jiffies value, which
> > makes it think that RCU has stalled.
> > 
> > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > don't report RCU stalls when we resume the VM.
> 
> Hello Paul,
> 
> I've noticed that this patch set didn't make it to Linus's tree.
> Was it intentional?

This patch (and the 18 preceding it) didn't make the cutoff for the
just-past merge window.  If this patch is urgent, please let me know
and I can push it, with luck by the end of next week.

If that one is urgent, are these two also?

817690fd18af ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
9ed9bf0d17cd ("rcu: Start timing stall repetitions after warning complete")

If so, it is better to handle them as a group than separately.

The cutoff for a given merge window is normally shortly after the close
of the previous merge window.  This time, I am a bit slow creating
branches, but the cutoff for the v5.15 merge window should be by the
end of the week.  This is a bit more lag than most subsystems, but
this is after all RCU.

As always, if a given commit is urgent, please let me know and I
will see what I can do to fast-track it.

For reference:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html

Again, if this one needs to hit mainline before the v5.15 merge
window, please let me know.

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ