Date:	Wed, 10 Aug 2016 13:26:41 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Giovanni Gherdovich <ggherdovich@...e.cz>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <mgalbraith@...e.de>,
	Stanislaw Gruszka <sgruszka@...hat.com>,
	linux-kernel@...r.kernel.org,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH 1/1] sched/cputime: Mitigate performance regression in
 times()/clock_gettime()


* Giovanni Gherdovich <ggherdovich@...e.cz> wrote:

> Commit 6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime()
> inconsistency") fixed a problem whereby clock_nanosleep() followed by
> clock_gettime() could allow a task to wake early. It addressed the problem
> by calling the scheduling class's update_curr() when the cputimer starts.
> 
> Said change induced a considerable performance regression on the syscalls
> times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID). There are some
> debuggers and applications that monitor their own performance that
> accidentally depend on the performance of these specific calls.
> 
> This patch mitigates the performance loss by prefetching data in the CPU
> cache, as stalls due to cache misses appear to be where most time is spent
> in our benchmarks.
> 
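
As a userspace illustration of the prefetching idea described above (not the
kernel patch itself), the sketch below asks the CPU to start loading the next
record while the current one is still being processed. The struct and function
names are hypothetical; __builtin_prefetch() is the GCC/Clang builtin, assumed
available here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, for illustration only; not a kernel structure. */
struct rec {
	long runtime_ns;
};

/*
 * Sum runtime_ns over an array of pointers to records. The records may be
 * scattered in memory, so each dereference can miss the cache; prefetching
 * the next record overlaps that miss with the work on the current one.
 */
long total_runtime(struct rec *const *recs, size_t n)
{
	long total = 0;

	assert(recs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (i + 1 < n)
			/* 0 = prefetch for read, 3 = high temporal locality */
			__builtin_prefetch(recs[i + 1], 0, 3);
		total += recs[i]->runtime_ns;
	}
	return total;
}
```

A prefetch is only a hint: it does not change the result, and whether it helps
depends on how much work separates the hint from the actual access.
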
> Here is the performance gain of this patch over v4.7-rc7 on a Sandy Bridge
> box with 32 logical cores and 2 NUMA nodes. The test is repeated with a
> variable number of threads, from 2 to 4*num_cpus; the results are in
> seconds and correspond to the average of 10 runs; the percentage gain is
> computed with (before-after)/before so a positive value is an improvement
> (it's faster). The improvement is a few percent for 5-30 threads (with a
> slight slowdown at 12 threads) and more than 10% for 2 or >=48 threads.
> 
> pound_clock_gettime:
> 
>     threads       4.7-rc7     patched 4.7-rc7
>     [num]         [secs]      [secs (percent)]
>       2           3.48        3.06 ( 11.83%)
>       5           3.33        3.25 (  2.40%)
>       8           3.37        3.26 (  3.30%)
>      12           3.32        3.37 ( -1.60%)
>      21           4.01        3.90 (  2.74%)
>      30           3.63        3.36 (  7.41%)
>      48           3.71        3.11 ( 16.27%)
>      79           3.75        3.16 ( 15.74%)
>     110           3.81        3.25 ( 14.80%)
>     128           3.88        3.31 ( 14.76%)
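
The source of the pound_clock_gettime benchmark is not included in this mail.
A minimal sketch of what such a test might look like, assuming it simply has N
threads call clock_gettime(CLOCK_PROCESS_CPUTIME_ID) in a loop and measures
the wall-clock time, is below. The function signature and the iteration count
are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Worker: query the process CPU-time clock 'iters' times. */
static void *pound(void *arg)
{
	long iters = *(const long *)arg;
	struct timespec ts;

	for (long i = 0; i < iters; i++)
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
	return NULL;
}

/* Run 'nthreads' workers and return the elapsed wall-clock time in seconds. */
double pound_clock_gettime(int nthreads, long iters)
{
	struct timespec start, end;
	pthread_t *tids;

	assert(nthreads > 0 && iters >= 0);
	tids = malloc((size_t)nthreads * sizeof(*tids));
	assert(tids != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, pound, &iters);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);

	free(tids);
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Build with something like "cc -O2 -pthread" and call, for example,
pound_clock_gettime(48, 1000000) once per thread count; averaging several runs,
as the table above does, smooths out scheduling noise.
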

Nice detective work! I'm wondering: where do we stand compared with a 
pre-6e998916dfe3 kernel?

I admit this is a difficult question: 6e998916dfe3 does not revert cleanly and I 
suspect v3.17 does not run easily on a recent distro. Could you perhaps attempt to 
revert the bad effects of 6e998916dfe3, just to get numbers? I.e. don't try to 
make the result correct, just see roughly what the performance gap is.

If there's still a significant gap then it might make sense to optimize this some 
more.
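
To express the gap against a pre-6e998916dfe3 baseline, the same
(before-after)/before convention used in the quoted table would apply. A
small helper, with a hypothetical name, could look like this:

```c
#include <assert.h>

/*
 * Percentage gain of 'after' over 'before', both in seconds.
 * Positive means 'after' is faster, negative means it is slower.
 */
double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

Feeding it the rounded 48-thread values from the table, percent_gain(3.71,
3.11), gives roughly 16.2 rather than the 16.27 listed, presumably because the
table was computed from unrounded averages.
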

Thanks,

	Ingo
