Date:	Mon, 15 Aug 2016 09:49:05 +0200
From:	Giovanni Gherdovich <ggherdovich@...e.cz>
To:	Stanislaw Gruszka <sgruszka@...hat.com>,
	Ingo Molnar <mingo@...nel.org>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <mgalbraith@...e.de>,
	linux-kernel@...r.kernel.org,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCH 1/1] sched/cputime: Mitigate performance regression in
 times()/clock_gettime()

Hello Stanislaw,

On Fri, 2016-08-12 at 14:10 +0200, Stanislaw Gruszka wrote:
>
> I measured (partial) revert performance on 4.7 using the mmtests instructions
> from Giovanni, and also tested another possible fix (draft version):
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5..54fdf6d 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -294,6 +294,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	unsigned int seq, nextseq;
>  	unsigned long flags;
>  
> +	(void) task_sched_runtime(tsk);
> +
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
>  	nextseq = 0;
> @@ -308,7 +310,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  			task_cputime(t, &utime, &stime);
>  			times->utime += utime;
>  			times->stime += stime;
> -			times->sum_exec_runtime += task_sched_runtime(t);
> +			times->sum_exec_runtime += t->se.sum_exec_runtime;
>  		}
>  		/* If lockless access failed, take the lock. */
>  		nextseq = 1;
> ---
> mmtests benchmark results are below (full compare-kernels.sh output is in the attachment):
> 
> vanilla-4.7           revert                prefetch              patch
> 4.74 (  0.00%)        3.04 ( 35.93%)        4.09 ( 13.81%)        1.30 ( 72.59%)
> 5.49 (  0.00%)        5.00 (  8.97%)        5.34 (  2.72%)        1.03 ( 81.16%)
> 6.12 (  0.00%)        4.91 ( 19.73%)        5.97 (  2.40%)        0.90 ( 85.27%)
> 6.68 (  0.00%)        4.90 ( 26.66%)        6.02 (  9.75%)        0.88 ( 86.89%)
> 7.21 (  0.00%)        5.13 ( 28.85%)        6.70 (  7.09%)        0.87 ( 87.91%)
> 7.66 (  0.00%)        5.22 ( 31.80%)        7.17 (  6.39%)        0.92 ( 88.01%)
> 7.91 (  0.00%)        5.36 ( 32.22%)        7.30 (  7.72%)        0.95 ( 87.97%)
> 7.95 (  0.00%)        5.35 ( 32.73%)        7.34 (  7.66%)        1.06 ( 86.66%)
> 8.00 (  0.00%)        5.33 ( 33.31%)        7.38 (  7.73%)        1.13 ( 85.82%)
> 5.61 (  0.00%)        3.55 ( 36.76%)        4.53 ( 19.23%)        2.29 ( 59.28%)
> 5.66 (  0.00%)        4.32 ( 23.79%)        4.75 ( 16.18%)        3.65 ( 35.46%)
> 5.98 (  0.00%)        4.97 ( 16.87%)        5.96 (  0.35%)        3.62 ( 39.40%)
> 6.58 (  0.00%)        4.94 ( 24.93%)        6.04 (  8.32%)        3.63 ( 44.89%)
> 7.19 (  0.00%)        5.18 ( 28.01%)        6.68 (  7.13%)        3.65 ( 49.22%)
> 7.67 (  0.00%)        5.27 ( 31.29%)        7.16 (  6.63%)        3.62 ( 52.76%)
> 7.88 (  0.00%)        5.36 ( 31.98%)        7.28 (  7.58%)        3.65 ( 53.71%)
> 7.99 (  0.00%)        5.39 ( 32.52%)        7.40 (  7.42%)        3.65 ( 54.25%)
> 
> The patch works because we update sum_exec_runtime on the current thread,
> which assures we see a proper sum_exec_runtime value on different CPUs. I
> tested it with the reproducers from commits 6e998916dfe32 and d670ec13178d0;
> the patch did not break them. I'm going to run some other tests.
> 
> The patch is a draft version for early review: task_sched_runtime() will be
> simplified (since it's called only on the current thread) and possibly split
> into two functions, one that calls update_curr() and another that returns
> sum_exec_runtime (ensuring it's consistent on 32-bit arches).
> 
> Stanislaw

Thank you for having a look at this.
Your patch performs very well, even better than the pre-6e998916dfe3
numbers I was aiming for. I confirm your results on my test machine
(Sandy Bridge, 32 cores, 2 NUMA nodes).
I didn't apply it on the very latest 4.8-rc but used what I had handy for
comparison (i.e. 4.7-rc7 and the parent of 6e998916dfe3).
As I said, my measurements match yours (my tables follow); it looks like
your diff cures the problem while mine only treats the symptoms.
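
For reference, the figures below come from timing a loop of roughly this
shape: N busy worker threads keep the run queues occupied while the main
thread hammers the process-wide CPU clock. This is only a minimal sketch of
what is being measured, not the actual mmtests harness; thread and iteration
counts are arbitrary (build with gcc -O2 -pthread).

/*
 * Sketch of the measured pattern: worker threads spin, the main thread
 * repeatedly reads CLOCK_PROCESS_CPUTIME_ID and reports the average cost
 * of one clock_gettime() call.  The times() table is the analogous loop
 * with a times() call instead.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 100000

static volatile int stop;

static void *burn(void *arg)
{
	(void)arg;
	while (!stop)
		;	/* keep this thread runnable on some CPU */
	return NULL;
}

int main(int argc, char **argv)
{
	int i, nthreads = argc > 1 ? atoi(argv[1]) : 8;
	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct timespec ts, t0, t1;
	double total_ns;

	for (i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, burn, NULL);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	total_ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%d threads: %.3f us per call\n", nthreads, total_ns / 1e3 / ITERS);

	stop = 1;
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	return 0;
}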

clock_gettime():

threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + Stanislaw
                       (pre-6e998916dfe3)
2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)

times():

threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + Stanislaw
                       (pre-6e998916dfe3)
2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
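
On the planned split of task_sched_runtime(): if I read it correctly, it
would be roughly along these lines. This is only a sketch to check my
understanding; the helper names and the locking details are mine, not taken
from your patch.

/*
 * Sketch (illustrative names, not the actual patch): the caller updates
 * runtime accounting for 'current' once, then reads the other threads'
 * sum_exec_runtime without touching their run queues.
 */

/* Called once on the current task before walking the thread group. */
static void update_current_sched_runtime(struct task_struct *p)
{
	struct rq_flags rf;
	struct rq *rq = task_rq_lock(p, &rf);

	if (task_current(rq, p) && task_on_rq_queued(p)) {
		update_rq_clock(rq);
		p->sched_class->update_curr(rq);
	}
	task_rq_unlock(rq, p, &rf);
}

/* Plain read of sum_exec_runtime; on 32-bit take the rq lock so the
 * 64-bit value cannot be read torn. */
static u64 read_sum_exec_runtime(struct task_struct *t)
{
#ifdef CONFIG_64BIT
	return t->se.sum_exec_runtime;
#else
	struct rq_flags rf;
	struct rq *rq = task_rq_lock(t, &rf);
	u64 ns = t->se.sum_exec_runtime;

	task_rq_unlock(rq, t, &rf);
	return ns;
#endif
}

If that matches what you have in mind, the summation loop in
thread_group_cputime() stays lockless for the non-current threads, which is
presumably where the big win in the tables above comes from.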


Regards,
Giovanni
