Message-ID: <20200721142922.qyd4o44rqvy7kbxu@wittgenstein>
Date:   Tue, 21 Jul 2020 16:29:22 +0200
From:   Christian Brauner <christian.brauner@...ntu.com>
To:     peterz@...radead.org
Cc:     Paul Gortmaker <paul.gortmaker@...driver.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Jiri Slaby <jirislaby@...nel.org>, christian@...uner.io,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Linux kernel mailing list <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...e.de>,
        Dave Jones <davej@...emonkey.org.uk>
Subject: Re: [PATCH] sched: Fix race against ptrace_freeze_trace()

On Tue, Jul 21, 2020 at 02:13:08PM +0200, peterz@...radead.org wrote:
> 
> There is apparently one site that violates the rule that only current
> and ttwu() will modify task->state: ptrace_{,un}freeze_traced() will
> change task->state for a remote task.
> 
> Oleg explains:
> 
>   "TASK_TRACED/TASK_STOPPED was always protected by siglock. In
> particular, ttwu(__TASK_TRACED) must be always called with siglock
> held. That is why ptrace_freeze_traced() assumes it can safely do
> s/TASK_TRACED/__TASK_TRACED/ under spin_lock(siglock)."
> 
> This breaks the ordering scheme introduced by commit:
> 
>   dbfb089d360b ("sched: Fix loadavg accounting race")
> 
> Specifically, a reload of prev->state that does not match the earlier
> load no longer implies that we don't have to block.
> 
> Simplify things by noting that what we need is a LOAD->STORE ordering,
> and this can be provided by a control dependency.
> 
> So replace:
> 
> 	prev_state = prev->state;
> 	raw_spin_lock(&rq->lock);
> 	smp_mb__after_spinlock(); /* SMP-MB */
> 	if (... && prev_state && prev_state == prev->state)
> 		deactivate_task();
> 
> with:
> 
> 	prev_state = prev->state;
> 	if (... && prev_state) /* CTRL-DEP */
> 		deactivate_task();
> 
> The control dependency already implies that the 'prev->state' load
> must be complete before the 'prev->on_rq = 0' store can become visible.
> 
> Fixes: dbfb089d360b ("sched: Fix loadavg accounting race")
> Reported-by: Jiri Slaby <jirislaby@...nel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Tested-by: Paul Gortmaker <paul.gortmaker@...driver.com>
> ---
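
As a standalone illustration of the LOAD->STORE argument above: the store
sits on one arm of a branch that depends on the loaded value, and CPUs do
not speculate stores, so the store cannot become visible before the load
resolves. A minimal compilable sketch in C11 (state/on_rq are stand-in
names for the scheduler fields, not the kernel code itself):

	#include <stdatomic.h>

	atomic_int state;	/* stands in for task_struct::state */
	atomic_int on_rq;	/* stands in for task_struct::on_rq  */

	void schedule_side(void)
	{
		/* LOAD: read the state exactly once (kernel: READ_ONCE). */
		int prev_state = atomic_load_explicit(&state, memory_order_relaxed);

		/*
		 * CTRL-DEP: the store only happens if the branch is taken, so
		 * it cannot become visible before the load has completed.
		 * (ISO C11 does not formally promise this for relaxed atomics;
		 * the kernel relies on READ_ONCE plus the hardware property.)
		 */
		if (prev_state)
			atomic_store_explicit(&on_rq, 0, memory_order_relaxed);
	}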

Thank you. I applied this on top of v5.8-rc6 and re-ran the strace test
suite successfully. So at least

Tested-by: Christian Brauner <christian.brauner@...ntu.com>

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4193,9 +4193,6 @@ static void __sched notrace __schedule(b
>  	local_irq_disable();
>  	rcu_note_context_switch(preempt);
>  
> -	/* See deactivate_task() below. */
> -	prev_state = prev->state;
> -
>  	/*
>  	 * Make sure that signal_pending_state()->signal_pending() below
>  	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> @@ -4219,11 +4216,16 @@ static void __sched notrace __schedule(b
>  	update_rq_clock(rq);
>  
>  	switch_count = &prev->nivcsw;
> +
>  	/*
> -	 * We must re-load prev->state in case ttwu_remote() changed it
> -	 * before we acquired rq->lock.
> +	 * We must load prev->state once (task_struct::state is volatile), such
> +	 * that:
> +	 *
> +	 *  - we form a control dependency vs deactivate_task() below.
> +	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
>  	 */
> -	if (!preempt && prev_state && prev_state == prev->state) {
> +	prev_state = prev->state;
> +	if (!preempt && prev_state) {
>  		if (signal_pending_state(prev_state, prev)) {
>  			prev->state = TASK_RUNNING;
>  		} else {
> @@ -4237,10 +4239,12 @@ static void __sched notrace __schedule(b
>  
>  			/*
>  			 * __schedule()			ttwu()
> -			 *   prev_state = prev->state;	  if (READ_ONCE(p->on_rq) && ...)
> -			 *   LOCK rq->lock		    goto out;
> -			 *   smp_mb__after_spinlock();	  smp_acquire__after_ctrl_dep();
> -			 *   p->on_rq = 0;		  p->state = TASK_WAKING;
> +			 *   if (prev_state)		  if (p->on_rq && ...)
> +			 *     p->on_rq = 0;		    goto out;
> +			 *				  smp_acquire__after_ctrl_dep();
> +			 *				  p->state = TASK_WAKING
> +			 *
> +			 * Where __schedule() and ttwu() have matching control dependencies.
>  			 *
>  			 * After this, schedule() must not care about p->state any more.
>  			 */

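To make the pairing in the new comment concrete, here is the matching
ttwu() half of the same standalone sketch (the TASK_WAKING value is taken
from the v5.8 headers; again an illustration, not the kernel
implementation):

	#include <stdatomic.h>

	#define TASK_WAKING	0x0200	/* as in include/linux/sched.h */

	extern atomic_int state;	/* shared with schedule_side() above */
	extern atomic_int on_rq;

	void ttwu_side(void)
	{
		/* CTRL-DEP: ->state is only written once the on_rq check resolves. */
		if (atomic_load_explicit(&on_rq, memory_order_relaxed))
			return;		/* "goto out" in the diagram */

		/*
		 * Stands in for smp_acquire__after_ctrl_dep(): upgrade the
		 * control dependency to an acquire, so the store below is
		 * ordered after everything that preceded the on_rq store
		 * in schedule_side().
		 */
		atomic_thread_fence(memory_order_acquire);

		atomic_store_explicit(&state, TASK_WAKING, memory_order_relaxed);
	}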