Message-ID: <548F0025.4040203@fb.com>
Date: Mon, 15 Dec 2014 10:37:09 -0500
From: Josef Bacik <jbacik@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <bmaurer@...com>, <rkroll@...com>, <kernel-team@...com>,
<mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
<umgwanakikbuti@...il.com>
Subject: Re: [PATCH] sched/fair: change where we report sched stats V2
On 12/15/2014 05:16 AM, Peter Zijlstra wrote:
> On Thu, Dec 11, 2014 at 10:59:55AM -0500, Josef Bacik wrote:
>> The scheduler stats are currently emitted when the entity is enqueued,
>> which means that if we have stack traces enabled we will get the stack
>> trace of the waker, not of the task being woken. This makes the
>> backtrace completely useless when trying to track down latency spikes,
>> since what we want to know is why we were asleep for as long as we were.
>>
>> This patch moves the trace_sched_stat* calls to after the task has been
>> put back on the CPU, so that the backtrace we get comes from the task
>> that actually slept. Mike Galbraith suggested that I leave the actual
>> stat collection where it is so that we don't add the scheduler latency
>> to the sleep stats, so I've added 3 new fields to the scheduler stats
>> that track the last sleep time; if any of these are set we know to
>> trigger the tracepoints. Thanks,
>>
>
> Yeah, so I don't like this, it adds overhead for everyone.
>
That overhead only exists when SCHEDSTATS is enabled, though, and in the
SCHEDSTATS case it's no more overhead than before. Would it be more
acceptable to move the entire callback under CONFIG_SCHEDSTATS?
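Something along these lines, for instance (untested sketch of the
wakeup-side path; ->blocked_last stands in for the new "last sleep
time" fields from the patch, the name is just illustrative):

#ifdef CONFIG_SCHEDSTATS
	/*
	 * Fire the tracepoint from the context of the task that just
	 * woke, so the stack trace shows why we slept rather than who
	 * woke us.  The delay was measured at enqueue time and stashed
	 * in the new field.
	 */
	if (se->statistics.blocked_last) {
		trace_sched_stat_blocked(tsk, se->statistics.blocked_last);
		se->statistics.blocked_last = 0;
	}
#endif

That way there is literally zero added work when SCHEDSTATS is off.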
> Also, I'm very sure this issue has been raised before, and I'm pretty
> sure the answer back then was why don't you look at the
> trace_sched_switch() site matching this wakeup?
>
This is fine for discrete problems, but it's impossible when trying to
find a random latency spike in a production workload. If I run

trace-cmd record -e sched:sched_switch -T sleep 5

on just one of our random web servers I end up with this:

du -h trace.dat
62M trace.dat

That's 62 megs in 5 seconds. By contrast, I ran the following command
for almost 2 hours while searching for a latency spike (the delay
filter is in nanoseconds, so this only records blocks of 100us or
more):

trace-cmd record -B latency -e sched:sched_stat_blocked -f "delay >=
100000" -T -o /root/latency.dat

and got the following .dat file:

du -h latency.dat
48M latency.dat
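For reference, pulling the filtered events back out of that file is
just something like

trace-cmd report -i latency.dat

which is a perfectly manageable amount of data to grep through.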
That right there is a huge reason for this change, so I'd really like to
do whatever it takes to get this fixed. I'd be happy to move the
callback under CONFIG_SCHEDSTATS to reduce any possible overhead you
would be concerned about. Thanks,
Josef