linux-kernel - Re: wait_for_completion_timeout() spurious failure under heavy load?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1213961403.3223.142.camel@lappy.programming.kicks-ass.net>
Date:	Fri, 20 Jun 2008 13:30:02 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Jiri Slaby <jirislaby@...il.com>,
	Roland Dreier <rdreier@...co.com>,
	linux-kernel@...r.kernel.org, Eli Cohen <eli@....mellanox.co.il>,
	general@...ts.openfabrics.org, Oleg Nesterov <oleg@...sign.ru>
Subject: Re: wait_for_completion_timeout() spurious failure under heavy
	load?

On Fri, 2008-06-20 at 13:20 +0200, Ingo Molnar wrote:
> * Jiri Slaby <jirislaby@...il.com> wrote:
> 
> >> 		do {
> >> 			timeout = schedule_timeout(timeout);
> >> 	
> >> 			if (!timeout)
> >> 				return timeout;
> >> 	
> >> 		} while (!x->done);
> >> 	
> >> 		return timeout;
> >> 	}
> >>
> >> so if the system is very busy and x->done is not set when
> >> do_wait_for_common() is entered, it is possible that the first call to
> >> schedule_timeout() returns 0 because the task doing wait_for_completion
> >
> > Sorry, but how can schedule_timeout return 0 before the timeout 
> > expiration?
> 
> the point would be that due to high load, the completion wakeup happens 
> first, but then due to scheduling delays the timeout also occurs 
> (later), before the wakeup related to x->done has managed to do its 
> task.
> 
> I.e. due to scheduling delays we report a spurious "timeout" failure, 
> despite the completion occuring before the timeout. The timeout is 
> really intended to be related to the delay of the completion event, not 
> the delay of scheduling after that event happened.
> 
> seems like a well-spotted race to me, i agree it's more robust to ignore 
> the timeout if we can make progress on x->done, and return a 1 jiffy 
> 'timeout remaining' value. Oleg, do you concur?
> 
> but i'd do it not like this:
> 
> >                       if (!timeout) {
> >                               __remove_wait_queue(&x->wait, &wait);
> > -                             return timeout;
> > +                             if (x->done) {
> > +                                     x->done--;
> > +                                     return 1;
> > +                             } else {
> > +                                     return 0;
> > +                             }
> 
> but like in the commit below. Agreed?
> 
> 	Ingo
> 
> -------------------->
> commit bb10ed0994927d433f6dbdf274fdb26cfcf516b7
> Author: Roland Dreier <rdreier@...co.com>
> Date:   Thu Jun 19 15:04:07 2008 -0700
> 
>     sched: fix wait_for_completion_timeout() spurious failure under heavy load
>     
>     It seems that the current implementaton of wait_for_completion_timeout()
>     has a small problem under very high load for the common pattern:
>     
>     	if (!wait_for_completion_timeout(&done, timeout))
>     		/* handle failure */
>     
>     because the implementation very roughly does (lots of code deleted to
>     show the basic flow):
>     
>     	static inline long __sched
>     	do_wait_for_common(struct completion *x, long timeout, int state)
>     	{
>     		if (x->done)
>     			return timeout;
>     
>     		do {
>     			timeout = schedule_timeout(timeout);
>     
>     			if (!timeout)
>     				return timeout;
>     
>     		} while (!x->done);
>     
>     		return timeout;
>     	}
>     
>     so if the system is very busy and x->done is not set when
>     do_wait_for_common() is entered, it is possible that the first call to
>     schedule_timeout() returns 0 because the task doing wait_for_completion
>     doesn't get rescheduled for a long time, even if it is woken up early
>     enough.
>     
>     In this case, wait_for_completion_timeout() returns 0 without even
>     checking x->done again, and the code above falls into its failure case
>     purely for scheduler reasons, even if the hardware event or whatever was
>     being waited for happened early enough.
>     
>     It would make sense to add an extra test to do_wait_for() in the timeout
>     case and return 1 if x->done is actually set.
>     
>     A quick audit (not exhaustive) of wait_for_completion_timeout() callers
>     seems to indicate that no one actually cares about the return value in
>     the success case -- they just test for 0 (timed out) versus non-zero
>     (wait succeeded).
>     
>     Signed-off-by: Ingo Molnar <mingo@...e.hu>

Good catch,

Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>

> diff --git a/kernel/sched.c b/kernel/sched.c
> index 4a3cb06..577f160 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state)
>  			spin_unlock_irq(&x->wait.lock);
>  			timeout = schedule_timeout(timeout);
>  			spin_lock_irq(&x->wait.lock);
> +
> +			/*
> +			 * If the completion has arrived meanwhile
> +			 * then return 1 jiffy time left:
> +			 */
> +			if (x->done && !timeout) {
> +				timeout = 1;
> +				break;
> +			}
> +
>  			if (!timeout) {
>  				__remove_wait_queue(&x->wait, &wait);
>  				return timeout;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/