Message-ID: <20200616110824.dgkkbyapn3io6wik@e107158-lin>
Date:   Tue, 16 Jun 2020 12:08:26 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Mel Gorman <mgorman@...e.de>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Randy Dunlap <rdunlap@...radead.org>,
        Jonathan Corbet <corbet@....net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, chris.redpath@....com,
        lukasz.luba@....com
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default
 boost value

On 06/11/20 11:58, Qais Yousef wrote:

[...]

> 
>                                     nouclam               nouclamp                  uclam                 uclamp         uclamp.disable                 uclamp                 uclamp                 uclamp
>                                    nouclamp              recompile                 uclamp                uclamp2        uclamp.disabled                    opt                   opt2           opt.disabled
> Hmean     send-64         158.07 (   0.00%)      156.99 *  -0.68%*      163.83 *   3.65%*      160.97 *   1.83%*      163.93 *   3.71%*      159.62 *   0.98%*      161.79 *   2.36%*      161.14 *   1.94%*
> Hmean     send-128        314.86 (   0.00%)      314.41 *  -0.14%*      329.05 *   4.51%*      322.88 *   2.55%*      327.88 *   4.14%*      317.56 *   0.86%*      320.72 *   1.86%*      319.62 *   1.51%*
> Hmean     send-256        629.98 (   0.00%)      625.78 *  -0.67%*      652.67 *   3.60%*      639.98 *   1.59%*      643.99 *   2.22%*      631.96 *   0.31%*      635.75 *   0.92%*      644.10 *   2.24%*
> Hmean     send-1024      2465.04 (   0.00%)     2452.29 *  -0.52%*     2554.66 *   3.64%*     2509.60 *   1.81%*     2540.71 *   3.07%*     2495.82 *   1.25%*     2490.50 *   1.03%*     2509.86 *   1.82%*
> Hmean     send-2048      4717.57 (   0.00%)     4713.17 *  -0.09%*     4923.98 *   4.38%*     4811.01 *   1.98%*     4881.87 *   3.48%*     4793.82 *   1.62%*     4820.28 *   2.18%*     4824.60 *   2.27%*
> Hmean     send-3312      7412.33 (   0.00%)     7433.42 *   0.28%*     7717.76 *   4.12%*     7522.97 *   1.49%*     7620.99 *   2.82%*     7522.89 *   1.49%*     7614.51 *   2.73%*     7568.51 *   2.11%*
> Hmean     send-4096      9021.55 (   0.00%)     8988.71 *  -0.36%*     9337.62 *   3.50%*     9075.49 *   0.60%*     9258.34 *   2.62%*     9117.17 *   1.06%*     9175.85 *   1.71%*     9079.50 *   0.64%*
> Hmean     send-8192     15370.36 (   0.00%)    15467.63 *   0.63%*    15999.52 *   4.09%*    15467.80 *   0.63%*    15978.69 *   3.96%*    15619.84 *   1.62%*    15395.09 *   0.16%*    15779.73 *   2.66%*
> Hmean     send-16384    26512.35 (   0.00%)    26498.18 *  -0.05%*    26931.86 *   1.58%*    26513.18 *   0.00%*    26873.98 *   1.36%*    26456.38 *  -0.21%*    26467.77 *  -0.17%*    26975.04 *   1.75%*

I have attempted a few other things after this.

As pointed out above, with 5.7-rc7 I can't see a regression.

The machine I'm testing on is a 2-socket Xeon E5 with 2x10 cores (40 CPUs).

If I switch to 5.6, I can see a drop (I performed each run twice):

                                   nouclamp              nouclamp2                 uclamp                uclamp2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      157.84 *  -2.82%*      158.11 *  -2.66%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      314.78 *  -3.06%*      314.94 *  -3.01%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      628.67 *  -2.01%*      631.79 *  -1.52%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2448.26 *  -3.05%*     2497.15 *  -1.11%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4712.08 *  -2.57%*     4757.70 *  -1.62%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7425.45 *  -1.53%*     7499.87 *  -0.54%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     8948.82 *  -1.93%*     9087.20 *  -0.41%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    15486.35 *  -0.66%*    15594.53 *   0.03%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    25752.25 *  -2.40%*    26609.64 *   0.85%*
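
As a rough sketch of how the percentage columns relate to the Hmean values,
the snippet below recomputes the relative change of the first uclamp run
against the first nouclamp run for three rows of the 5.6 table above. The
helper name percent_delta is made up for illustration. The table prints
rounded Hmean values, so the recomputed figures can differ from the printed
ones in the last digit.

```python
# Hmean throughput values copied from the 5.6 table above
# (first "nouclamp" column as baseline, first "uclamp" column as candidate).
baseline = {"send-64": 162.43, "send-128": 324.71, "send-16384": 26386.47}
uclamp = {"send-64": 157.84, "send-128": 314.78, "send-16384": 25752.25}


def percent_delta(base, value):
    """Relative change of value against base, in percent (illustrative helper)."""
    return (value - base) / base * 100.0


deltas = {name: percent_delta(baseline[name], uclamp[name]) for name in baseline}

for name, delta in deltas.items():
    # Negative numbers mean lower throughput than the baseline.
    print(f"{name:12s} {delta:+.2f}%")
```

A negative delta here is a throughput loss, which is why the uclamp columns
on 5.6 read as a regression of roughly 2-3%.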

If I apply the 2 patches from my previous email, with uclamp enabled I see

                                   nouclamp              nouclamp2             uclamp-opt            uclamp-opt2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      159.84 *  -1.60%*      160.79 *  -1.01%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      318.44 *  -1.93%*      321.88 *  -0.87%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      633.54 *  -1.25%*      640.43 *  -0.17%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2497.47 *  -1.10%*     2522.00 *  -0.13%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4773.63 *  -1.29%*     4825.31 *  -0.22%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7512.92 *  -0.37%*     7482.66 *  -0.77%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     9076.62 *  -0.52%*     9175.58 *   0.56%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    15466.02 *  -0.79%*    15792.10 *   1.30%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    26234.79 *  -0.57%*    26459.95 *   0.28%*

This shows that, on this machine, the slowdown comes from bad D$ (data cache)
behavior on accesses to rq->uclamp[].bucket[] and p->uclamp{_rq}[].

If I disable uclamp using the static key I get

                                   nouclamp              nouclamp2    uclamp-opt.disabled   uclamp-opt.disabled2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      161.21 *  -0.75%*      161.05 *  -0.85%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      321.09 *  -1.11%*      319.72 *  -1.54%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      637.37 *  -0.65%*      637.82 *  -0.58%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2510.07 *  -0.60%*     2504.99 *  -0.80%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4795.29 *  -0.84%*     4788.99 *  -0.97%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7490.27 *  -0.67%*     7498.56 *  -0.56%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     9108.73 *  -0.17%*     9196.45 *   0.79%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    15649.50 *   0.38%*    16101.68 *   3.28%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    26435.38 *   0.19%*    27199.11 *   3.08%*

After this, I decided to check whether the regression is observed all the way
up to 5.7-rc7.

For 5.7-rc1 I get (comparing against 5.6-nouclamp)

                                   nouclamp              nouclamp2                 uclamp                uclamp2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      155.56 *  -4.23%*      156.72 *  -3.52%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      311.68 *  -4.01%*      312.63 *  -3.72%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      616.03 *  -3.98%*      620.83 *  -3.23%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2441.92 *  -3.30%*     2433.83 *  -3.62%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4698.42 *  -2.85%*     4682.22 *  -3.18%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7379.37 *  -2.14%*     7354.82 *  -2.47%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     8797.21 *  -3.59%*     8815.65 *  -3.39%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    15009.19 *  -3.72%*    15065.16 *  -3.36%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    25829.20 *  -2.11%*    25783.17 *  -2.29%*

For 5.7-rc2, the overhead disappears again (against 5.6-nouclamp)

                                   nouclamp              nouclamp2                 uclamp                uclamp2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      162.97 *   0.34%*      163.31 *   0.54%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      323.94 *  -0.24%*      325.74 *   0.32%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      641.82 *   0.04%*      645.11 *   0.56%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2522.74 *  -0.10%*     2535.63 *   0.41%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4836.74 *   0.01%*     4838.62 *   0.05%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7635.31 *   1.25%*     7613.91 *   0.97%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     9198.58 *   0.81%*     9161.53 *   0.41%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    15804.47 *   1.38%*    15755.91 *   1.07%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    26649.29 *   1.00%*    26677.46 *   1.10%*

I stopped here tbh. I thought maybe NUMA scheduling is making the uclamp
accesses more expensive in certain patterns, so I tried with numactl -N 0
(using 5.7-rc1):

                                   nouclamp              nouclamp2            uclamp-N0-1            uclamp-N0-2
Hmean     send-64         162.43 (   0.00%)      161.46 *  -0.60%*      156.26 *  -3.80%*      156.00 *  -3.96%*
Hmean     send-128        324.71 (   0.00%)      323.88 *  -0.25%*      312.20 *  -3.85%*      312.94 *  -3.63%*
Hmean     send-256        641.55 (   0.00%)      640.22 *  -0.21%*      620.29 *  -3.31%*      619.25 *  -3.48%*
Hmean     send-1024      2525.28 (   0.00%)     2520.31 *  -0.20%*     2437.59 *  -3.47%*     2433.94 *  -3.62%*
Hmean     send-2048      4836.14 (   0.00%)     4827.47 *  -0.18%*     4671.28 *  -3.41%*     4714.49 *  -2.52%*
Hmean     send-3312      7540.83 (   0.00%)     7603.14 *   0.83%*     7355.86 *  -2.45%*     7387.51 *  -2.03%*
Hmean     send-4096      9124.53 (   0.00%)     9224.90 *   1.10%*     8793.02 *  -3.63%*     8883.88 *  -2.64%*
Hmean     send-8192     15589.67 (   0.00%)    15768.82 *   1.15%*    14898.76 *  -4.43%*    14958.19 *  -4.05%*
Hmean     send-16384    26386.47 (   0.00%)    26683.64 *   1.13%*    25745.40 *  -2.43%*    25800.01 *  -2.22%*

It had no effect. Interestingly, Lukasz can see an improvement if he tries
something similar on his machine.

Do we have any history of code/data layout affecting the performance of the
hot path? On the Juno board (an octa-core big.LITTLE Arm platform), I could
make the overhead disappear with a simple code shuffle (for perf bench sched
pipe).

I have tried putting the rq->uclamp[].bucket[] structures into their own PERCPU
variable, since the rq is read by many CPUs and I thought that might lead to
bad cache patterns given that the uclamp data is mostly read by the owning
CPU, but no luck with this approach.

I am working on a proper static key patch now that disables uclamp by default
and only enables it if userspace attempts to modify any of the knobs it
provides; once that happens, we switch it on and keep it on. I'm testing it at
the moment.
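
The intended behavior (off by default, switched on by the first knob write,
never switched off again) can be modeled as a one-way latch. The sketch below
is only a model of that logic, not the patch itself: the class and method
names are hypothetical, and a plain flag check stands in for the static key.
A real static key patches the branch in the kernel text, so the disabled path
costs close to nothing, which a Python flag does not capture.

```python
class UclampGate:
    """Hypothetical model of a default-off switch that latches on."""

    def __init__(self):
        self.enabled = False       # stands in for the static key; off by default
        self.accounting_calls = 0  # counts how often the bookkeeping ran

    def user_sets_knob(self):
        # Any userspace attempt to modify a uclamp knob turns the feature on.
        # There is deliberately no method that turns it back off.
        self.enabled = True

    def enqueue(self):
        # Hot path: skip all uclamp bookkeeping while the gate is off.
        if not self.enabled:
            return
        self.accounting_calls += 1


gate = UclampGate()
gate.enqueue()         # skipped, nothing has touched the knobs yet
gate.user_sets_knob()  # first knob write enables uclamp
gate.enqueue()         # accounted
gate.enqueue()         # accounted
print(gate.enabled, gate.accounting_calls)
```

Systems that never touch the knobs would stay on the skipped path, which is
the case the disabled-static-key numbers above are meant to approximate.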

Thanks

--
Qais Yousef
