Date:	Mon, 23 Jan 2012 22:03:51 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Arun Sharma <asharma@...com>
Cc:	linux-kernel@...r.kernel.org,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Andrew Vagin <avagin@...nvz.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] trace: reset sleep/block start time on task switch

On Mon, 2012-01-23 at 10:41 -0800, Arun Sharma wrote:
> On 1/23/12 3:34 AM, Peter Zijlstra wrote:

> > This'll fail to compile for !CONFIG_SCHEDSTAT I guess.. I should have
> > paid more attention to the initial patch, that tracepoint having
> > side-effects is a big no-no.
> >
> > Having unconditional writes there is somewhat sad, but I suspect putting
> > a conditional around it isn't going to help much..
> 
> For performance reasons?

Yep.

> 
> > bah can we
> > restructure things so we don't need this?
> >
> 
> We can go back to the old code, where these values were getting reset in 
> {en,de}queue_sleeper(). But we'll have to do it conditionally, so the 
> values are preserved till context switch time when we need it there.
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1003,6 +1003,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
>                  if (unlikely(delta > se->statistics.sleep_max))
>                          se->statistics.sleep_max = delta;
> 
> +               if (!trace_sched_stat_sleeptime_enabled())
> +                       se->statistics.sleep_start = 0;
>                  se->statistics.sum_sleep_runtime += delta;
> 
>                  if (tsk) {
> @@ -1019,6 +1021,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
>                  if (unlikely(delta > se->statistics.block_max))
>                          se->statistics.block_max = delta;
> 
> +               if (!trace_sched_stat_sleeptime_enabled())
> +                       se->statistics.block_start = 0;
>                  se->statistics.sum_sleep_runtime += delta;
> 
>                  if (tsk) {
> 
> This looks pretty ugly too,

Agreed, it still violates the principle that tracepoints shouldn't
affect the code.

>  I don't know how to check for a tracepoint 
> being active (Steven?). The only advantage of this approach is that it's 
> in the sleep/wakeup path, rather than the context switch path.

Right, thus avoiding the stores for preemptions.

> Conceptually, the following seems to be the simplest:
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1939,8 +1939,10 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
>          finish_lock_switch(rq, prev);
> 
>          trace_sched_stat_sleeptime(current, rq->clock);
> +#ifdef CONFIG_SCHEDSTAT
>          current->se.statistics.block_start = 0;
>          current->se.statistics.sleep_start = 0;
> +#endif /* CONFIG_SCHEDSTAT */
> 
> Perhaps we can reorder fields in sched_statistics so we touch one 
> cacheline here instead of two?

That might help some, but no stores for preemptions would still be best.

I was thinking of something along the lines of the below: since enqueue
uses non-zero to mean the value is set, and we need the values to still
be available in schedule, dequeue is the last moment we can clear them.

This would limit the stores to the blocking case; your suggestion of
moving them to the same cacheline would then get us back to where we
started in terms of performance.

Or did I miss something?


---
 kernel/sched/fair.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84adb2d..60f9ab9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1191,6 +1191,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (entity_is_task(se)) {
 			struct task_struct *tsk = task_of(se);
 
+			se->statistics.sleep_start = 0;
+			se->statistics.block_start = 0;
+
 			if (tsk->state & TASK_INTERRUPTIBLE)
 				se->statistics.sleep_start = rq_of(cfs_rq)->clock;
 			if (tsk->state & TASK_UNINTERRUPTIBLE)

--
