Message-ID: <1324388358.8497.20.camel@marge.simson.net>
Date: Tue, 20 Dec 2011 14:39:18 +0100
From: Mike Galbraith <efault@....de>
To: Артем Анисимов <aanisimov@...ox.ru>
Cc: linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl, mingo@...e.hu
Subject: Re: Commit 1e5a74059f9 broke utime measurement of ptraced() processes
On Mon, 2011-12-19 at 16:59 +0100, Артем Анисимов wrote:
> Hi!
>
> I have tried running a program that measures the utime of another process in
> two different setups: first with the child process executing with no
> intervention, and second with the child ptrace()d and stopped at every
> syscall. It turns out that the overhead imposed by ptrace() is approx. 5% in
> 2.6.32, but 17% in 3.2-rc6. Worse yet, in 2.6.32 the overhead per system call
> is roughly constant, but in 3.2-rc6 it varies wildly depending on the number
> of cycles the child spends in userspace: in some cases increasing the
> "n_cycles_in_userspace / n_syscalls" ratio increases the overhead.
>
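(Something like this minimal harness reproduces the setup described; the
loop bounds, the getpid() syscall and the SIGSTOP handshake are guesses,
since the original test program isn't shown:)

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/resource.h>

static long child_utime_musecs(int traced)
{
	int status;
	struct rusage ru;
	pid_t pid = fork();

	if (pid == 0) {			/* child: userspace cycles + syscalls */
		volatile int spin;
		int i;

		if (traced) {
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);	/* hand control to the tracer */
		}
		for (i = 0; i < 100000; i++) {
			for (spin = 0; spin < 1000; spin++)
				;	/* n_cycles_in_userspace */
			getpid();	/* n_syscalls */
		}
		exit(0);
	}

	/* parent: restart the child at every stop until it exits */
	while (wait4(pid, &status, 0, &ru) == pid && !WIFEXITED(status))
		ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

	/* final rusage of the exited child, user time in microseconds */
	return ru.ru_utime.tv_sec * 1000000L + ru.ru_utime.tv_usec;
}

int main(void)
{
	printf("utime (untraced): %ld musecs\n", child_utime_musecs(0));
	printf("utime (traced): %ld musecs\n", child_utime_musecs(1));
	return 0;
}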
> I have bisected this bug down to commit 1e5a74059f9, which changed the
> behaviour of sched.c::check_preempt_curr(). Before 1e5a74059f9 this function
> decided whether to skip the clock update based on test_tsk_need_resched(p);
> now it checks test_tsk_need_resched(rq->curr). Reverting this change makes
> the utime measurement behave as in 2.6.32.
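(Concretely, the skip-clock-update check at the tail of
check_preempt_curr() changed roughly like this; a sketch pieced together
from the description above and the 3.2-era rq->skip_clock_update flag,
not the verbatim diff:)

	/* before 1e5a74059f9: keyed off the waking task p */
	if (test_tsk_need_resched(p))
		rq->skip_clock_update = 1;

	/* after 1e5a74059f9: keyed off the task currently on the CPU */
	if (test_tsk_need_resched(rq->curr))
		rq->skip_clock_update = 1;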
Reverting the change effectively disables the optimization. What I see
after adding a scheduler feature switch (sketched below) is that without
the optimization there's a ~60%/40% CPU split between time-when-traced
and its traced child, whereas with the optimization it shifts to a
~55%/45% split. What you are seeing is that utilization shift, caused
by the optimization transitioning from busted to working.
Every preemption cuts the waker's CPU usage, lessening the odds of it
being preempted again soon, and in the bargain effectively hands a few
wakeup-expense cycles to the wakee that preempted it.
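(For reference, a feature switch of that sort could look like this
sketch; the name SKIP_CLOCK_UPDATE is made up for illustration:)

/* kernel/sched_features.h: add a default-on toggle */
SCHED_FEAT(SKIP_CLOCK_UPDATE, 1)

/* kernel/sched.c: guard the skip in check_preempt_curr() */
if (sched_feat(SKIP_CLOCK_UPDATE) && test_tsk_need_resched(rq->curr))
	rq->skip_clock_update = 1;

It can then be flipped at runtime by writing SKIP_CLOCK_UPDATE or
NO_SKIP_CLOCK_UPDATE to /sys/kernel/debug/sched_features.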
Without the skip optimization:
marge:~/tmp # time sh -c 'for i in $(seq 1 5); do ./time-when-traced;done'
utime (untraced): 732545 musecs
utime (traced): 717044 musecs
diff: -21
utime (untraced): 697043 musecs
utime (traced): 702043 musecs
diff: 7
utime (untraced): 737545 musecs
utime (traced): 772547 musecs
diff: 47
utime (untraced): 665041 musecs
utime (traced): 742045 musecs
diff: 115
utime (untraced): 654040 musecs
utime (traced): 695043 musecs
diff: 62
real 2m47.552s
user 1m42.682s
sys 1m4.684s
perf report -d [kernel.kallsyms] --sort=symbol |grep clock
3.12% native_sched_clock
2.52% sched_clock_local
0.64% sched_clock_cpu
0.57% update_rq_clock
0.16% jiffies_to_clock_t
0.01% clockevents_program_event
0.01% sched_clock_tick
With the skip optimization:
marge:~/tmp # time sh -c 'for i in $(seq 1 5); do ./time-when-traced;done'
utime (untraced): 698043 musecs
utime (traced): 800049 musecs
diff: 146
utime (untraced): 681042 musecs
utime (traced): 787548 musecs
diff: 156
utime (untraced): 699543 musecs
utime (traced): 865053 musecs
diff: 236
utime (untraced): 694543 musecs
utime (traced): 832551 musecs
diff: 198
utime (untraced): 685042 musecs
utime (traced): 816050 musecs
diff: 191
real 2m45.110s
user 1m43.282s
sys 1m1.656s
perf report -d [kernel.kallsyms] --sort=symbol |grep clock
2.28% native_sched_clock
1.80% sched_clock_local
0.45% update_rq_clock
0.37% sched_clock_cpu
0.19% jiffies_to_clock_t
0.00% sched_clock_tick
0.00% clockevents_program_event