linux-kernel - Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq


Message-ID: <20160921155826.GB8408@pathway.suse.cz>
Date:   Wed, 21 Sep 2016 17:58:27 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Matt Fleming <matt@...eblueprint.co.uk>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Byungchul Park <byungchul.park@....com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Luca Abeni <luca.abeni@...tn.it>,
        Rik van Riel <riel@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Yuyang Du <yuyang.du@...el.com>, Jan Kara <jack@...e.cz>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        linux-kernel@...r.kernel.org,
        Mel Gorman <mgorman@...hsingularity.net>,
        Mike Galbraith <umgwanakikbuti@...il.com>
Subject: Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing
 update_rq_clock()

On Wed 2016-09-21 14:38:13, Matt Fleming wrote:
> There's no diagnostic checks for figuring out when we've accidentally
> missed update_rq_clock() calls. Let's add some by piggybacking on the
> rq_*pin_lock() wrappers.
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bf48e7975c23..91f4b3d58d56 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> +/*
> + * rq::clock_update_flags bits
> + *
> + * %RQCF_REQ_SKIP - will request skipping of clock update on the next
> + *  call to __schedule(). This is an optimisation to avoid
> + *  neighbouring rq clock updates.
> + *
> + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
> + *  in effect and calls to update_rq_clock() are being ignored.
> + *
> + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
> + *  made to update_rq_clock() since the last time rq::lock was pinned.
> + *
> + * If inside of __schedule(), clock_update_flags will have been
> + * shifted left (a left shift is a cheap operation for the fast path
> + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
> + *
> + *	if (rq->clock_update_flags >= RQCF_UPDATED)
> + *
> + * to check if %RQCF_UPDATED is set. It'll never be shifted more than
> + * one position though, because the next rq_unpin_lock() will shift it
> + * back.
> + */
> +#define RQCF_REQ_SKIP	0x01
> +#define RQCF_ACT_SKIP	0x02
> +#define RQCF_UPDATED	0x04
> +
> +static inline void assert_clock_updated(struct rq *rq)
> +{
> +#ifdef CONFIG_SCHED_DEBUG
> +	/*
> +	 * The only reason for not seeing a clock update since the
> +	 * last rq_pin_lock() is if we're currently skipping updates.
> +	 */
> +	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> +#endif
> +}
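
To make the flag arithmetic in the quoted comment concrete, here is a
minimal userspace sketch. The flag values are copied from the patch;
struct rq_sketch and the sketch_*() helpers are hypothetical names used
only for illustration and are not part of the kernel. It shows why a
shifted value has to be compared with ">=" instead of being tested with
a bit mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the quoted patch. */
#define RQCF_REQ_SKIP	0x01
#define RQCF_ACT_SKIP	0x02
#define RQCF_UPDATED	0x04

/* Hypothetical stand-in for the one field of struct rq that matters here. */
struct rq_sketch {
	unsigned int clock_update_flags;
};

/* Models the left shift inside __schedule(): REQ_SKIP becomes ACT_SKIP. */
static void sketch_enter_schedule(struct rq_sketch *rq)
{
	/* The flags are assumed to be unshifted before the single shift. */
	assert(rq->clock_update_flags <=
	       (RQCF_REQ_SKIP | RQCF_ACT_SKIP | RQCF_UPDATED));
	rq->clock_update_flags <<= 1;
}

/* True if RQCF_UPDATED is set, whether or not the flags were shifted. */
static bool sketch_clock_updated(const struct rq_sketch *rq)
{
	return rq->clock_update_flags >= RQCF_UPDATED;
}

/* Mirrors the condition in assert_clock_updated(): true means "would warn". */
static bool sketch_would_warn(const struct rq_sketch *rq)
{
	return rq->clock_update_flags < RQCF_ACT_SKIP;
}
```

After one shift, RQCF_UPDATED (0x04) becomes 0x08, so a plain
"flags & RQCF_UPDATED" test would miss it, while the ">=" comparison
still succeeds.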

I am afraid that it might eventually create a deadlock.
For example, there is the following call chain:

+ printk()
  + vprintk_func -> vprintk_default()
    + vprintk_emit()
      + console_unlock()
        + up_console_sem()
          + up()                # takes &sem->lock
            + __up()
              + wake_up_process()
                + try_to_wake_up()
                  + ttwu_queue()
                    + ttwu_do_activate()
                      + ttwu_do_wakeup()
                        + rq_clock()
                          + lockdep_assert_held()
                            + WARN_ON_ONCE()
                              + printk()
                                + vprintk_func -> vprintk_default()
                                  + vprintk_emit()
                                    + console_trylock()
                                      + down_trylock_console_sem()
                                        + __down_trylock_console_sem()
                                          + down_trylock()

   DEADLOCK: Unable to take &sem->lock
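
The shape of this problem can be modelled in userspace. The following is
only a sketch under simplifying assumptions: struct sem_sketch and all
sketch_*() functions are hypothetical stand-ins, the lock is a plain
flag, and the point where a real non-recursive spinlock would spin
forever is recorded in would_deadlock instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a semaphore whose internal lock is not recursive. */
struct sem_sketch {
	bool lock_held;		/* models &sem->lock */
	bool would_deadlock;	/* set where a real spinlock would spin forever */
	bool warn_in_wakeup;	/* models the warning firing below rq_clock() */
};

/* Try to take the modelled lock; record the self-deadlock instead of spinning. */
static bool sketch_lock(struct sem_sketch *sem)
{
	if (sem->lock_held) {
		sem->would_deadlock = true;
		return false;
	}
	sem->lock_held = true;
	return true;
}

static void sketch_unlock(struct sem_sketch *sem)
{
	sem->lock_held = false;
}

/* Models down_trylock(): it needs the internal lock even to fail fast. */
static void sketch_down_trylock(struct sem_sketch *sem)
{
	if (sketch_lock(sem))
		sketch_unlock(sem);
}

/* Models printk() reaching the console semaphore trylock. */
static void sketch_printk(struct sem_sketch *sem)
{
	sketch_down_trylock(sem);
}

/* Models the wakeup path in which the warning may fire. */
static void sketch_wake_up(struct sem_sketch *sem)
{
	if (sem->warn_in_wakeup)
		sketch_printk(sem);
}

/* Models up(): the wakeup happens while the internal lock is still held. */
static void sketch_up(struct sem_sketch *sem)
{
	bool locked = sketch_lock(sem);

	assert(locked);		/* the outermost caller must get the lock */
	sketch_wake_up(sem);
	sketch_unlock(sem);
}
```

With warn_in_wakeup clear, sketch_up() completes normally. With it set,
the nested sketch_down_trylock() finds the lock already held by its own
caller, which is the situation marked DEADLOCK above.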


We have recently discussed a similar deadlock, see the thread
around https://lkml.kernel.org/r/20160714221251.GE3057@ubuntu

A temporary solution would be to replace the WARN_ON_ONCE()
with printk_deferred(). Of course, this is far from ideal because
you do not get the stack trace, ...
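
The idea behind a deferred message can be sketched in userspace as
follows. This is not how printk_deferred() is implemented; it is a
simplified model, and struct deferred_log, sketch_log_deferred() and
sketch_flush() are hypothetical names. The message is only stored at the
call site, and the part that needs the console lock runs later from a
context where that lock is not held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_PENDING 8

/* Hypothetical model: messages are stored now and printed later. */
struct deferred_log {
	const char *pending[SKETCH_MAX_PENDING];
	size_t npending;
	size_t nflushed;
	int lock_held;		/* models a lock the console path would need */
};

/* Store the message only; never touches the console path or its lock. */
static int sketch_log_deferred(struct deferred_log *log, const char *msg)
{
	if (log->npending == SKETCH_MAX_PENDING)
		return -1;	/* buffer full: message dropped */
	log->pending[log->npending++] = msg;
	return 0;
}

/* Print everything that is pending; must run with the lock released. */
static size_t sketch_flush(struct deferred_log *log)
{
	size_t i, n = log->npending;

	assert(!log->lock_held);
	for (i = 0; i < n; i++)
		puts(log->pending[i]);
	log->nflushed += n;
	log->npending = 0;
	return n;
}
```

The cost is visible in the model too: only a string survives until the
flush, so the context of the original call site (such as the stack) is
not available when the message is finally printed.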

Sergey is working on WARN_ON_ONCE_DEFERRED() but it is not
an easy task.


>  static inline u64 rq_clock(struct rq *rq)
>  {
>  	lockdep_assert_held(&rq->lock);
> +	assert_clock_updated(rq);
> +
>  	return rq->clock;
>  }
>  

I am not sure how realistic the above call chain is, but adding
WARN_ON() to scheduler paths is risky in general.

Best Regards,
Petr