Message-ID: <1324388358.8497.20.camel@marge.simson.net>
Date: Tue, 20 Dec 2011 14:39:18 +0100
From: Mike Galbraith <efault@....de>
To: Артем Анисимов <aanisimov@...ox.ru>
Cc: linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl, mingo@...e.hu
Subject: Re: Commit 1e5a74059f9 broke utime measurement of ptraced() processes
On Mon, 2011-12-19 at 16:59 +0100, Артем Анисимов wrote:
> Hi!
>
> I have tried running a program that measures the utime of another process in
> two different setups: first with the child process executing with no
> intervention, and second with the child ptrace()d and stopped at every
> syscall. It turns out that the overhead imposed by ptrace() is approx. 5% in
> 2.6.32, but 17% in 3.2-rc6. Worse yet, in 2.6.32 the overhead per system call
> is roughly constant, but in 3.2-rc6 it varies wildly depending on the number
> of cycles the child spends in userspace: in some cases increasing the
> "n_cycles_in_userspace / n_syscalls" ratio increases the overhead.
>
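(Something like this minimal harness reproduces the setup described; the
loop bounds, the getpid() syscall and the SIGSTOP handshake are guesses,
since the original test program isn't shown:)

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/resource.h>

static long child_utime_musecs(int traced)
{
	int status;
	struct rusage ru;
	pid_t pid = fork();

	if (pid == 0) {			/* child: userspace cycles + syscalls */
		volatile int spin;
		int i;

		if (traced) {
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);	/* hand control to the tracer */
		}
		for (i = 0; i < 100000; i++) {
			for (spin = 0; spin < 1000; spin++)
				;	/* n_cycles_in_userspace */
			getpid();	/* n_syscalls */
		}
		exit(0);
	}

	/* parent: restart the child at every stop until it exits */
	while (wait4(pid, &status, 0, &ru) == pid && !WIFEXITED(status))
		ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

	/* final rusage of the exited child, user time in microseconds */
	return ru.ru_utime.tv_sec * 1000000L + ru.ru_utime.tv_usec;
}

int main(void)
{
	printf("utime (untraced): %ld musecs\n", child_utime_musecs(0));
	printf("utime (traced): %ld musecs\n", child_utime_musecs(1));
	return 0;
}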
> I have bisected this bug down to commit 1e5a74059f9, which changed the
> behaviour of sched.c::check_preempt_curr(). Before 1e5a74059f9 this function
> decided whether to skip the clock update based on test_tsk_need_resched(p);
> now it checks test_tsk_need_resched(rq->curr). Reverting this change makes
> the utime measurement behave as in 2.6.32.
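(Concretely, the skip-clock-update check at the tail of
check_preempt_curr() changed roughly like this; a sketch pieced together
from the description above and the 3.2-era rq->skip_clock_update flag,
not the verbatim diff:)

	/* before 1e5a74059f9: keyed off the waking task p */
	if (test_tsk_need_resched(p))
		rq->skip_clock_update = 1;

	/* after 1e5a74059f9: keyed off the task currently on the CPU */
	if (test_tsk_need_resched(rq->curr))
		rq->skip_clock_update = 1;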
Reverting the change effectively disables the optimization. What I see
after adding a scheduler feature switch (sketched below) is that without
the optimization there's a ~60%/40% CPU split between time-when-traced
and its traced child, whereas with the optimization it shifts to a
~55%/45% split. What you are seeing is that utilization shift, caused
by the optimization transitioning from busted to working.
Every preemption cuts the waker's CPU usage, lessening the odds of it
being preempted again soon, and in the bargain effectively hands a few
wakeup-expense cycles to the wakee that preempted it.
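(For reference, a feature switch of that sort could look like this
sketch; the name SKIP_CLOCK_UPDATE is made up for illustration:)

/* kernel/sched_features.h: add a default-on toggle */
SCHED_FEAT(SKIP_CLOCK_UPDATE, 1)

/* kernel/sched.c: guard the skip in check_preempt_curr() */
if (sched_feat(SKIP_CLOCK_UPDATE) && test_tsk_need_resched(rq->curr))
	rq->skip_clock_update = 1;

It can then be flipped at runtime by writing SKIP_CLOCK_UPDATE or
NO_SKIP_CLOCK_UPDATE to /sys/kernel/debug/sched_features.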
Without the skip optimization:
marge:~/tmp # time sh -c 'for i in $(seq 1 5); do ./time-when-traced;done'
utime (untraced): 732545 musecs
utime (traced): 717044 musecs
diff: -21
utime (untraced): 697043 musecs
utime (traced): 702043 musecs
diff: 7
utime (untraced): 737545 musecs
utime (traced): 772547 musecs
diff: 47
utime (untraced): 665041 musecs
utime (traced): 742045 musecs
diff: 115
utime (untraced): 654040 musecs
utime (traced): 695043 musecs
diff: 62
real 2m47.552s
user 1m42.682s
sys 1m4.684s
perf report -d [kernel.kallsyms] --sort=symbol |grep clock
3.12% native_sched_clock
2.52% sched_clock_local
0.64% sched_clock_cpu
0.57% update_rq_clock
0.16% jiffies_to_clock_t
0.01% clockevents_program_event
0.01% sched_clock_tick
With the skip optimization:
marge:~/tmp # time sh -c 'for i in $(seq 1 5); do ./time-when-traced;done'
utime (untraced): 698043 musecs
utime (traced): 800049 musecs
diff: 146
utime (untraced): 681042 musecs
utime (traced): 787548 musecs
diff: 156
utime (untraced): 699543 musecs
utime (traced): 865053 musecs
diff: 236
utime (untraced): 694543 musecs
utime (traced): 832551 musecs
diff: 198
utime (untraced): 685042 musecs
utime (traced): 816050 musecs
diff: 191
real 2m45.110s
user 1m43.282s
sys 1m1.656s
perf report -d [kernel.kallsyms] --sort=symbol |grep clock
2.28% native_sched_clock
1.80% sched_clock_local
0.45% update_rq_clock
0.37% sched_clock_cpu
0.19% jiffies_to_clock_t
0.00% sched_clock_tick
0.00% clockevents_program_event