Date:	Thu, 19 Jun 2008 12:27:14 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Ankita Garg <ankita@...ibm.com>
Cc:	Gregory Haskins <ghaskins@...ell.com>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	rostedt@...dmis.org, suresh.b.siddha@...el.com,
	aneesh.kumar@...ux.vnet.ibm.com, dhaval@...ux.vnet.ibm.com,
	vatsa@...ux.vnet.ibm.com, David Bahi <DBahi@...ell.com>
Subject: Re: [ANNOUNCE] sched: schedtop utility

On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote:

> Thanks Peter for the explanation...
>
> I agree with the above, and that is the reason why I did not see weird
> values with cpu_time. But run_delay would still suffer skew, as the end
> points for the delta could be taken on different cpus due to migration
> (more so on the RT kernel due to the push-pull operations). With the
> below patch, I could not reproduce the issue I had seen earlier. After
> every dequeue, we take the delta and start wait measurements from zero
> when the task is moved to a different rq.

OK, so task delay accounting is broken because it doesn't take
migration into account.

What you've done is make it symmetric wrt enqueue, and account it like

  cpu0      cpu1

enqueue
 <wait-d1>
dequeue
            enqueue
             <wait-d2>
            run

Where you add both d1 and d2 to the run_delay... right?
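
For the archive, a minimal userspace sketch (illustrative only, with
made-up numbers; none of this is kernel code) of why taking the delta
on each cpu separately annuls the rq->clock skew:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical: cpu1's rq->clock reads 1000ns ahead of cpu0's */
		unsigned long long skew = 1000;

		/* wait interval as seen on cpu0's clock */
		unsigned long long enq0 = 100, deq0 = 400;
		/* the same wall instants on cpu1's clock, shifted by the skew */
		unsigned long long enq1 = 400 + skew, run1 = 700 + skew;

		/* old scheme: one delta spanning both cpus; the skew leaks in */
		unsigned long long old_delay = run1 - enq0;	/* 1600 */

		/* patched scheme: d1 taken on cpu0, d2 on cpu1; each delta
		 * is measured against a single clock, so the skew cancels */
		unsigned long long d1 = deq0 - enq0;		/* 300 */
		unsigned long long d2 = run1 - enq1;		/* 300 */

		printf("old run_delay: %llu\n", old_delay);
		printf("new run_delay: %llu\n", d1 + d2);
		return 0;
	}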

This seems like a good fix. However, it looks like the patch will break
compilation when !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, by
failing to provide a stub for sched_info_dequeued() in that case.
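
Something like the below (just a sketch; I'm assuming the do-nothing
defines already in sched_stats.h's #else branch) would keep that
configuration building:

	#else	/* !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT */
	# define sched_info_dequeued(t)	do { } while (0)
	#endif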


> Signed-off-by: Ankita Garg <ankita@...ibm.com> 
> 
> Index: linux-2.6.24.4/kernel/sched.c
> ===================================================================
> --- linux-2.6.24.4.orig/kernel/sched.c	2008-06-03 14:14:07.000000000 +0530
> +++ linux-2.6.24.4/kernel/sched.c	2008-06-04 12:48:34.000000000 +0530
> @@ -948,6 +948,7 @@
>  
>  static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
>  {
> +	sched_info_dequeued(p);
>  	p->sched_class->dequeue_task(rq, p, sleep);
>  	p->se.on_rq = 0;
>  }
> Index: linux-2.6.24.4/kernel/sched_stats.h
> ===================================================================
> --- linux-2.6.24.4.orig/kernel/sched_stats.h	2008-06-03 14:14:28.000000000 +0530
> +++ linux-2.6.24.4/kernel/sched_stats.h	2008-06-05 10:39:39.000000000 +0530
> @@ -113,6 +113,13 @@
>  	if (rq)
>  		rq->rq_sched_info.cpu_time += delta;
>  }
> +
> +static inline void
> +rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
> +{
> +	if (rq)
> +		rq->rq_sched_info.run_delay += delta;
> +}
>  # define schedstat_inc(rq, field)	do { (rq)->field++; } while (0)
>  # define schedstat_add(rq, field, amt)	do { (rq)->field += (amt); } while (0)
>  # define schedstat_set(var, val)	do { var = (val); } while (0)
> @@ -129,6 +136,11 @@
>  #endif
>  
>  #if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> +static inline void sched_info_reset_dequeued(struct task_struct *t)
> +{
> +	t->sched_info.last_queued = 0;
> +}
> +
>  /*
>   * Called when a process is dequeued from the active array and given
>   * the cpu.  We should note that with the exception of interactive
> @@ -138,15 +150,22 @@
>   * active queue, thus delaying tasks in the expired queue from running;
>   * see scheduler_tick()).
>   *
> - * This function is only called from sched_info_arrive(), rather than
> - * dequeue_task(). Even though a task may be queued and dequeued multiple
> - * times as it is shuffled about, we're really interested in knowing how
> - * long it was from the *first* time it was queued to the time that it
> - * finally hit a cpu.
> + * Though we are interested in knowing how long it was from the *first* time a
> + * task was queued to the time that it finally hit a cpu, we call this routine
> + * from dequeue_task() to account for possible rq->clock skew across cpus. The
> + * delta taken on each cpu would annul the skew.
>   */
>  static inline void sched_info_dequeued(struct task_struct *t)
>  {
> -	t->sched_info.last_queued = 0;
> +	unsigned long long now = task_rq(t)->clock, delta = 0;
> +
> +	if (unlikely(sched_info_on()))
> +		if (t->sched_info.last_queued)
> +			delta = now - t->sched_info.last_queued;
> +	sched_info_reset_dequeued(t);
> +	t->sched_info.run_delay += delta;
> +
> +	rq_sched_info_dequeued(task_rq(t), delta);
>  }
>  
>  /*
> @@ -160,7 +179,7 @@
>  
>  	if (t->sched_info.last_queued)
>  		delta = now - t->sched_info.last_queued;
> -	sched_info_dequeued(t);
> +	sched_info_reset_dequeued(t);
>  	t->sched_info.run_delay += delta;
>  	t->sched_info.last_arrival = now;
>  	t->sched_info.pcount++;
> 

