[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1998186.Se6Zm9i2s4@aspire.rjw.lan>
Date:   Sun, 11 Mar 2018 11:31:31 +0100
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Thomas Ilsche <thomas.ilsche@...dresden.de>,
        Doug Smythies <dsmythies@...us.net>,
        Rik van Riel <riel@...riel.com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Mike Galbraith <mgalbraith@...e.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC/RFT][PATCH v3 5/6] sched: idle: Select idle state before stopping the tick
On Sunday, March 11, 2018 2:44:37 AM CET Frederic Weisbecker wrote:
> On Fri, Mar 09, 2018 at 10:46:55AM +0100, Rafael J. Wysocki wrote:
> > --- linux-pm.orig/kernel/time/tick-sched.h
> > +++ linux-pm/kernel/time/tick-sched.h
> > @@ -30,6 +30,7 @@ enum tick_nohz_mode {
> >   *			when the CPU returns from nohz sleep.
> >   * @next_tick:		Next tick to be fired when in dynticks mode.
> >   * @tick_stopped:	Indicator that the idle tick has been stopped
> > + * @tick_may_stop:	Indicator that the idle tick may be stopped shortly
> 
> Perhaps we can set timer_expires to 0 instead when we want to invalidate
> the last value?
timer_expires can be 0 for other reasons.
I can use timer_expires_basemono for that (I actually did that in one version
of my patches, but it looked odd and I decided to use a new field for clarity)
if you prefer to avoid adding an extra field.
> >   * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
> >   * @idle_calls:		Total number of idle calls
> >   * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
> > @@ -38,7 +39,6 @@ enum tick_nohz_mode {
> >   * @idle_exittime:	Time when the idle state was left
> >   * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
> >   * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
> > - * @sleep_length:	Duration of the current idle sleep
> >   * @do_timer_lst:	CPU was the last one doing do_timer before going idle
> >   */
> >  struct tick_sched {
> > @@ -49,6 +49,7 @@ struct tick_sched {
> >  	ktime_t				next_tick;
> >  	int				inidle;
> >  	int				tick_stopped;
> > +	int				tick_may_stop;
> >  	unsigned long			idle_jiffies;
> >  	unsigned long			idle_calls;
> >  	unsigned long			idle_sleeps;
> > @@ -58,8 +59,9 @@ struct tick_sched {
> >  	ktime_t				idle_exittime;
> >  	ktime_t				idle_sleeptime;
> >  	ktime_t				iowait_sleeptime;
> > -	ktime_t				sleep_length;
> >  	unsigned long			last_jiffies;
> > +	u64				timer_expires;
> > +	u64				timer_expires_basemono;
> 
> We may need documentation for the above fields too.
OK
> > Index: linux-pm/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-pm.orig/kernel/time/tick-sched.c
> > +++ linux-pm/kernel/time/tick-sched.c
> > @@ -652,13 +652,10 @@ static inline bool local_timer_softirq_p
> >  	return local_softirq_pending() & TIMER_SOFTIRQ;
> >  }
> >  
> > -static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > -					 ktime_t now, int cpu)
> > +static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu)
> 
> Since we don't seem to have a lower level version, can we remove the underscores?
Sure, will do.
> >  {
> > -	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> >  	u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
> >  	unsigned long seq, basejiff;
> > -	ktime_t	tick;
> >  
> >  	/* Read jiffies and the time when jiffies were updated last */
> >  	do {
> [...]
> 
> 
> > +static void __tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> 
> (Same comment here about the underscores).
Yup.
> > +{
> > +	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > +	u64 basemono = ts->timer_expires_basemono;
> > +	u64 expires = ts->timer_expires;
> > +	ktime_t tick = expires;
> > +
> > +	/* Make sure we won't be trying to stop it twice in a row. */
> > +	ts->tick_may_stop = 0;
> > +
> > +	/*
> > +	 * If this CPU is the one which updates jiffies, then give up
> > +	 * the assignment and let it be taken by the CPU which runs
> > +	 * the tick timer next, which might be this CPU as well. If we
> > +	 * don't drop this here the jiffies might be stale and
> > +	 * do_timer() never invoked. Keep track of the fact that it
> > +	 * was the one which had the do_timer() duty last.
> > +	 */
> > +	if (cpu == tick_do_timer_cpu) {
> > +		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> > +		ts->do_timer_last = 1;
> > +	}
> >  
> >  	/* Skip reprogram of event if its not changed */
> >  	if (ts->tick_stopped && (expires == ts->next_tick)) {
> >  		/* Sanity check: make sure clockevent is actually programmed */
> >  		if (tick == KTIME_MAX || ts->next_tick == hrtimer_get_expires(&ts->sched_timer))
> > -			goto out;
> > +			return;
> >  
> >  		WARN_ON_ONCE(1);
> >  		printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
> [...]
> > +void tick_nohz_idle_retain_tick(void)
> > +{
> > +	__this_cpu_write(tick_cpu_sched.tick_may_stop, 0);
> > +}
> > +
> 
> So, I've become overly-paranoid about cached expiration on nohz code, we've run
> into bugs that took months to debug before. It seems that the cached version shouldn't
> leak in any way there, still can we have checks such as this in tick_nohz_idle_enter/exit()?
> 
>      WARN_ON_ONCE(__this_cpu_read(tick_cpu_sched.tick_may_stop));
OK
> Otherwise a leaking cached expiration may mislead further nohz tick stop and
> bypass calls to tick_nohz_next_event().
> 
> Also let's make sure we never call tick_nohz_get_sleep_length() outside idle:
> 
>      WARN_ON_ONCE(!__this_cpu_read(tick_cpu_sched.inidle));
OK
I'll respin the series with the above comments addressed early next week.
Thanks!
Powered by blists - more mailing lists
 
