Date:	Wed, 24 Aug 2011 05:43:30 +0200
From:	Mike Galbraith <efault@....de>
To:	seth bollinger <seth.boll@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Possible scheduler bug

On Tue, 2011-08-23 at 20:58 -0500, seth bollinger wrote:
> Hello All,
> 
> We recently ran into an interesting scheduler problem when testing one
> of our products. It manifested itself as a user space lockup.  When I
> enabled/printed scheduler stats I noticed that the scheduler was
> always picking the same task to run, and no task stats were being
> updated (clock, sum_exec, sum_sleep, etc.). The scheduler would become
> stuck in this state permanently. This problem was ultimately resolved
> by the following patch to sched.c:
> 
> @@ -564,7 +569,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>          * A queue event has occurred, and we're going to schedule.  In
>          * this case, we can save a useless back to back clock update.
>          */
> -       if (test_tsk_need_resched(p))
> +       if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
>                 rq->skip_clock_update = 1;
>  }

Yeah, that's correct, but see commit f26f9aff6aaf67e9a430d16c266f91b13a5bff64.
You'll want the other bits of that commit as well (but not the WARN_ON()).

> I have two questions regarding this patch.
> 
> 1. How was it possible to get the scheduler locked up like that (prior
> to patch application)?

If the clock isn't updated, vruntimes don't advance, so you could end up
selecting the same task repeatedly.
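
As a rough illustration of that mechanism, here is a toy model in plain C.
It is a sketch under stated assumptions, not kernel code: the toy_* names
are hypothetical, the run queue holds just two equal-weight tasks, and the
skip flag is assumed to stay set for the whole run. It only shows that a
frozen clock gives a zero runtime delta, so the chosen task's vruntime
never moves and it keeps winning the "smallest vruntime" comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins, NOT the kernel's structures. */
struct toy_task {
    uint64_t vruntime;   /* virtual runtime charged so far */
    unsigned picks;      /* how many times this task was selected */
};

struct toy_rq {
    uint64_t clock;          /* run queue clock as last refreshed */
    uint64_t exec_start;     /* clock value when accounting last ran */
    int skip_clock_update;   /* when nonzero, clock refreshes are skipped */
};

/* Refresh the run queue clock unless the skip flag is set. */
static void toy_update_rq_clock(struct toy_rq *rq, uint64_t now)
{
    if (!rq->skip_clock_update)
        rq->clock = now;
}

/* Charge the running task for the time seen by the run queue clock. */
static void toy_update_curr(struct toy_rq *rq, struct toy_task *curr)
{
    uint64_t delta = rq->clock - rq->exec_start;

    curr->vruntime += delta;      /* a delta of 0 means no progress */
    rq->exec_start = rq->clock;
}

/*
 * Run `rounds` scheduling decisions over two tasks and return how many
 * times task 0 was picked. Ties go to the lower index.
 */
unsigned toy_simulate(int skip_clock_update, unsigned rounds)
{
    struct toy_task tasks[2] = { { 0, 0 }, { 0, 0 } };
    struct toy_rq rq = { 0, 0, skip_clock_update };
    uint64_t now = 0;             /* simulated wall time */
    unsigned i;

    assert(rounds > 0);

    for (i = 0; i < rounds; i++) {
        /* Pick the task with the smallest vruntime. */
        struct toy_task *curr =
            (tasks[1].vruntime < tasks[0].vruntime) ? &tasks[1] : &tasks[0];

        curr->picks++;

        /* The task "runs" for 1000 time units, then accounting happens. */
        now += 1000;
        toy_update_rq_clock(&rq, now);
        toy_update_curr(&rq, curr);
    }
    return tasks[0].picks;
}
```

In this model, toy_simulate(0, 10) alternates between the two tasks, while
toy_simulate(1, 10) selects task 0 on every round because its vruntime
never leaves zero.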

> 2. After patch, is it possible that the scheduler could spin in this
> loop until a sched_clock() tick (our clock resolution is unfortunately
> 10ms)?

If you take the rest of the fix, that shouldn't happen.

	-Mike
