Date:	Tue, 7 Apr 2015 19:29:23 -0300
From:	Marcelo Tosatti <mtosatti@...hat.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Luiz Capitulino <lcapitulino@...hat.com>
Subject: Re: kernel/timer: avoid spurious ksoftirqd wakeups

On Tue, Apr 07, 2015 at 10:17:23PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 06, 2015 at 08:51:26PM -0300, Marcelo Tosatti wrote:
> > On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> > > Yeah, it would be nice to make sure that the cause of these softirqs isn't
> > > mistakenly ignored. And I also want to be sure we really understand what we
> > > are doing, which is not the case right now, as we don't know what is causing
> > > this expired timer.
> > 
> > You mean, what is the interrupt that causes tick_nohz_stop_sched_tick
> > to be called?
> > 
> >        <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
> >        <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn<-__run_hrtimer
> >        <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick
> > 
> > Emulation of the guest APIC timer by an hrtimer (apic_timer_fn).
> 
> Nope, I meant what is the root cause of the softirq.
> But let's continue on that below:
> 
> > > Sure, but why is it waking up exactly?
> > 
> > Because there is a bug. The patch tries to fix it by raising the
> > timer softirq only when the timer softirq handler has work to do.
> > 
> > The only timers pending in the timer list are deferred ones
> > from vmstat_update:
> > 
> > ksoftirqd/15-265   [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/15-265   [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
> > ksoftirqd/15-265   [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
> > ksoftirqd/15-265   [015] ....112 25722056715738 (+1704): workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
> > ksoftirqd/15-265   [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
> > ksoftirqd/15-265   [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
> > ksoftirqd/15-265   [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]
> > 
> > These should only be processed once actual add_timer timers fire
> > (there are no such add_timer timers on this CPU).
> > 
> > Does that make sense?
> 
> So the source of these softirqs is those deferred timers? But deferrable timers
> are only deferrable in idle-nohz mode, not full-nohz. They are actually deferred
> in practice in full-nohz, but that's a bug :o)  (which I need to fix).
>
> Still, I don't think this is the source of the softirqs, since your patch fixes
> the issue of non-timers triggering softirqs.
> 
> So here is the issue: something that is not a "struct timer_list" is causing the
> expiry time of the next tick to be in the past or now. In tick_nohz_stop_sched_tick(),
> the softirq is triggered when delta_jiffies < 1

delta_jiffies = NEXT_TIMER_MAX_DELTA.

tick_nohz_stop_sched_tick:     delta_jiffies: 1073741823 rcu_delta_jiffies: 18446744073709551615 tick_stopped: 1

> or when the timer fails to be reprogrammed
> because it has already expired.

Right, missed that. I'll ask Luiz to gather info on why it's
failing.

> 
> What can cause this expiry time to be now or in the past? Well, to answer that we
> need to check everything that is used to evaluate the next tick:
> 
> 1) struct timer_list timers
> 2) low-res hrtimers
> 3) scheduler_tick_max_deferment
> 4) timekeeping_max_deferment
> 5) (rcu|arch|irq_work)_needs_tick()
> 6) maybe something else I'm missing
> 
> Your patch restricts the softirq to being triggered only in case 1), and that works
> for you. This means the spurious softirqs that you saw were caused by 2), 3), 4), 5) or 6).
> I want to know which one and why, because I need to understand exactly which event
> will no longer trigger a softirq after this patch. We want to know that to
> ensure your patch has no side effects.
> 
> Thanks.