lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtDgVT8mGhDbh9Z40769Ju1DMFpL+zu+rEqnYyJRYetmfg@mail.gmail.com>
Date:   Mon, 28 Nov 2022 18:19:32 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     K Prateek Nayak <kprateek.nayak@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, parth@...ux.ibm.com,
        qyousef@...alina.io, chris.hyser@...cle.com,
        patrick.bellasi@...bug.net, David.Laight@...lab.com,
        pjt@...gle.com, pavel@....cz, tj@...nel.org, qperret@...gle.com,
        tim.c.chen@...ux.intel.com, joshdon@...gle.com, timj@....org,
        yu.c.chen@...el.com, youssefesmat@...omium.org,
        joel@...lfernandes.org
Subject: Re: [PATCH v9 0/9] Add latency priority for CFS class

Hi Prateek,

On Mon, 28 Nov 2022 at 12:52, K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Vincent,
>
> Following are the test results on dual socket Zen3 machine (2 x 64C/128T)
>
> tl;dr
>
> o All benchmarks with DEFAULT_LATENCY_NICE value are comparable to tip.
>   There is, however, a noticeable dip for unixbench-spawn test case.
>
> o With the 2 rbtree approach, I do not see much difference in the
>   hackbench results with varying latency nice value. Tests on v5 did
>   yield noticeable improvements for hackbench.
>   (https://lore.kernel.org/lkml/cd48ebbb-9724-985f-28e3-e558dea07827@amd.com/)

The 2 rbtree approach is the one that was already used in v5. I just
rerun hackbench tests with latest tip and v6.2-rc7 and I can see large
performance improvement for pipe tests on my system (8 cores system).
Could you try witha larger number of group ? like 64, 128 and 256
groups

>
> o For hackbench + cyclictest and hackbench + schbench, I see the
>   expected behavior with different latency nice values.
>
> o There are a few cases with hackbench and hackbench + cyclictest where
>   the results are non-monotonic with different latency nice values.
>   (Marked with "^").
>
> I'll leave the detailed results below:
>
> On 11/15/2022 10:48 PM, Vincent Guittot wrote:
> > This patchset restarts the work about adding a latency priority to describe
> > the latency tolerance of cfs tasks.
> >
> > Patch [1] is a new one that has been added with v6. It fixes an
> > unfairness for low prio tasks because of wakeup_gran() being bigger
> > than the maximum vruntime credit that a waking task can keep after
> > sleeping.
> >
> > The patches [2-4] have been done by Parth:
> > https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com/
> >
> > I have just rebased and moved the set of latency priority outside the
> > priority update. I have removed the reviewed tag because the patches
> > are 2 years old.
> >
> > This aims to be a generic interface and the following patches is one use
> > of it to improve the scheduling latency of cfs tasks.
> >
> > Patch [5] uses latency nice priority to define a latency offset
> > and then decide if a cfs task can or should preempt the current
> > running task. The patch gives some tests results with cyclictests and
> > hackbench to highlight the benefit of latency priority for short
> > interactive task or long intensive tasks.
> >
> > Patch [6] adds the support of latency nice priority to task group by
> > adding a cpu.latency.nice field. The range is [-20:19] as for setting task
> > latency priority.
> >
> > Patch [7] makes sched_core taking into account the latency offset.
> >
> > Patch [8] adds a rb tree to cover some corner cases where the latency
> > sensitive task (priority < 0) is preempted by high priority task (RT/DL)
> > or fails to preempt them. This patch ensures that tasks will have at least
> > a slice of sched_min_granularity in priority at wakeup.
> >
> > Patch [9] removes useless check after adding a latency rb tree.
> >
> > I have also backported the patchset on a dragonboard RB3 with an android
> > mainline kernel based on v5.18 for a quick test. I have used the
> > TouchLatency app which is part of AOSP and described to be a very good
> > test to highlight jitter and jank frame sources of a system [1].
> > In addition to the app, I have added some short running tasks waking-up
> > regularly (to use the 8 cpus for 4 ms every 37777us) to stress the system
> > without overloading it (and disabling EAS). The 1st results shows that the
> > patchset helps to reduce the missed deadline frames from 5% to less than
> > 0.1% when the cpu.latency.nice of task group are set. I haven't rerun the
> > test with latest version.
> >
> > I have also tested the patchset with the modified version of the alsa
> > latency test that has been shared by Tim. The test quickly xruns with
> > default latency nice priority 0 but is able to run without underuns with
> > a latency -20 and hackbench running simultaneously.
> >
> > While preparing the version 8, I have evaluated the benefit of using an
> > augmented rbtree instead of adding a rbtree for latency sensitive entities,
> > which was a relevant suggestion done by PeterZ. Although the augmented
> > rbtree enables to sort additional information in the tree with a limited
> > overhead, it has more impact on legacy use cases (latency_nice >= 0)
> > because the augmented callbacks are always called to maintain this
> > additional information even when there is no sensitive tasks. In such
> > cases, the dedicated rbtree remains empty and the overhead is reduced to
> > loading a cached null node pointer. Nevertheless, we might want to
> > reconsider the augmented rbtree once the use of negative latency_nice will
> > be more widlely deployed. At now, the different tests that I have done,
> > have not shown improvements with augmented rbtree.
> >
> > Below are some hackbench results:
> >         2 rbtrees               augmented rbtree        augmented rbtree
> >                                 sorted by vruntime      sorted by wakeup_vruntime
> > sched pipe
> > avg     26311,000               25976,667               25839,556
> > stdev   0,15 %                  0,28 %                  0,24 %
> > vs tip  0,50 %                  -0,78 %                 -1,31 %
> > hackbench     1 group
> > avg     1,315                   1,344                   1,359
> > stdev   0,88 %                  1,55 %                  1,82 %
> > vs tip  -0,47 %                 -2,68 %                 -3,87 %
> > hackbench     4 groups
> > avg     1,339                   1,365                   1,367
> > stdev   2,39 %                  2,26 %                  3,58 %
> > vs tip  -0,08 %                 -2,01 %                 -2,22 %
> > hackbench     8 groups
> > avg     1,233                   1,286                   1,301
> > stdev   0,74 %                  1,09 %                  1,52 %
> > vs tip  0,29 %                  -4,05 %                 -5,27 %
> > hackbench     16 groups
> > avg     1,268                   1,313                   1,319
> > stdev   0,85 %                  1,60 %                  0,68 %
> > vs tip  -0,02 %                 -3,56 %                 -4,01 %
>
> Following are the results from running standard benchmarks on a
> dual socket Zen3 (2 x 64C/128T) machine configured in different
> NPS modes.
>
> NPS Modes are used to logically divide single socket into
> multiple NUMA region.
> Following is the NUMA configuration for each NPS mode on the system:
>
> NPS1: Each socket is a NUMA node.
>     Total 2 NUMA nodes in the dual socket machine.
>
>     Node 0: 0-63,   128-191
>     Node 1: 64-127, 192-255
>
> NPS2: Each socket is further logically divided into 2 NUMA regions.
>     Total 4 NUMA nodes exist over 2 socket.
>
>     Node 0: 0-31,   128-159
>     Node 1: 32-63,  160-191
>     Node 2: 64-95,  192-223
>     Node 3: 96-127, 223-255
>
> NPS4: Each socket is logically divided into 4 NUMA regions.
>     Total 8 NUMA nodes exist over 2 socket.
>
>     Node 0: 0-15,    128-143
>     Node 1: 16-31,   144-159
>     Node 2: 32-47,   160-175
>     Node 3: 48-63,   176-191
>     Node 4: 64-79,   192-207
>     Node 5: 80-95,   208-223
>     Node 6: 96-111,  223-231
>     Node 7: 112-127, 232-255
>
> Benchmark Results:
>
> Kernel versions:
> - tip:          6.1.0 tip sched/core
> - latency_nice: 6.1.0 tip sched/core + this series
>
> When we started testing, the tip was at:
> commit d6962c4fe8f9 "sched: Clear ttwu_pending after enqueue_task()"
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ hackbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> Test:                   tip                     latency_nice
>  1-groups:         4.25 (0.00 pct)         4.14 (2.58 pct)
>  2-groups:         4.95 (0.00 pct)         4.92 (0.60 pct)
>  4-groups:         5.19 (0.00 pct)         5.18 (0.19 pct)
>  8-groups:         5.45 (0.00 pct)         5.44 (0.18 pct)
> 16-groups:         7.33 (0.00 pct)         7.32 (0.13 pct)
>
> NPS2
>
> Test:                   tip                     latency_nice
>  1-groups:         4.09 (0.00 pct)         4.08 (0.24 pct)
>  2-groups:         4.68 (0.00 pct)         4.72 (-0.85 pct)
>  4-groups:         5.05 (0.00 pct)         4.97 (1.58 pct)
>  8-groups:         5.37 (0.00 pct)         5.34 (0.55 pct)
> 16-groups:         6.69 (0.00 pct)         6.74 (-0.74 pct)
>
> NPS4
>
> Test:                   tip                     latency_nice
>  1-groups:         4.28 (0.00 pct)         4.35 (-1.63 pct)
>  2-groups:         4.78 (0.00 pct)         4.76 (0.41 pct)
>  4-groups:         5.11 (0.00 pct)         5.06 (0.97 pct)
>  8-groups:         5.48 (0.00 pct)         5.40 (1.45 pct)
> 16-groups:         7.07 (0.00 pct)         6.70 (5.23 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ schbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> #workers:       tip                     latency_nice
>   1:      31.00 (0.00 pct)        32.00 (-3.22 pct)
>   2:      33.00 (0.00 pct)        34.00 (-3.03 pct)
>   4:      39.00 (0.00 pct)        38.00 (2.56 pct)
>   8:      45.00 (0.00 pct)        46.00 (-2.22 pct)
>  16:      61.00 (0.00 pct)        66.00 (-8.19 pct)
>  32:     108.00 (0.00 pct)       110.00 (-1.85 pct)
>  64:     212.00 (0.00 pct)       216.00 (-1.88 pct)
> 128:     475.00 (0.00 pct)       701.00 (-47.57 pct)    *
> 128:     429.00 (0.00 pct)       441.00 (-2.79 pct)      [Verification Run]
> 256:     44736.00 (0.00 pct)     45632.00 (-2.00 pct)
> 512:     77184.00 (0.00 pct)     78720.00 (-1.99 pct)
>
> NPS2
>
> #workers:       tip                     latency_nice
>   1:      28.00 (0.00 pct)        33.00 (-17.85 pct)
>   2:      34.00 (0.00 pct)        31.00 (8.82 pct)
>   4:      36.00 (0.00 pct)        36.00 (0.00 pct)
>   8:      51.00 (0.00 pct)        49.00 (3.92 pct)
>  16:      68.00 (0.00 pct)        64.00 (5.88 pct)
>  32:     113.00 (0.00 pct)       115.00 (-1.76 pct)
>  64:     221.00 (0.00 pct)       219.00 (0.90 pct)
> 128:     553.00 (0.00 pct)       531.00 (3.97 pct)
> 256:     43840.00 (0.00 pct)     48192.00 (-9.92 pct)   *
> 256:     50427.00 (0.00 pct)     48351.00 (4.11 pct)    [Verification Run]
> 512:     76672.00 (0.00 pct)     81024.00 (-5.67 pct)
>
> NPS4
>
> #workers:       tip                     latency_nice
>   1:      33.00 (0.00 pct)        28.00 (15.15 pct)
>   2:      29.00 (0.00 pct)        34.00 (-17.24 pct)
>   4:      39.00 (0.00 pct)        36.00 (7.69 pct)
>   8:      58.00 (0.00 pct)        55.00 (5.17 pct)
>  16:      66.00 (0.00 pct)        67.00 (-1.51 pct)
>  32:     112.00 (0.00 pct)       116.00 (-3.57 pct)
>  64:     215.00 (0.00 pct)       213.00 (0.93 pct)
> 128:     689.00 (0.00 pct)       571.00 (17.12 pct)
> 256:     45120.00 (0.00 pct)     46400.00 (-2.83 pct)
> 512:     77440.00 (0.00 pct)     76160.00 (1.65 pct)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ tbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> Clients:        tip                     latency_nice
>     1    581.75 (0.00 pct)       586.52 (0.81 pct)
>     2    1145.75 (0.00 pct)      1160.69 (1.30 pct)
>     4    2127.94 (0.00 pct)      2141.49 (0.63 pct)
>     8    3838.27 (0.00 pct)      3721.10 (-3.05 pct)
>    16    6272.71 (0.00 pct)      6539.82 (4.25 pct)
>    32    11400.12 (0.00 pct)     12079.49 (5.95 pct)
>    64    21605.96 (0.00 pct)     22908.83 (6.03 pct)
>   128    30715.43 (0.00 pct)     31736.95 (3.32 pct)
>   256    55580.78 (0.00 pct)     54786.29 (-1.42 pct)
>   512    56528.79 (0.00 pct)     56453.54 (-0.13 pct)
>  1024    56520.40 (0.00 pct)     56369.93 (-0.26 pct)
>
> NPS2
>
> Clients:        tip                     latency_nice
>     1    584.13 (0.00 pct)       582.53 (-0.27 pct)
>     2    1153.63 (0.00 pct)      1140.27 (-1.15 pct)
>     4    2212.89 (0.00 pct)      2159.49 (-2.41 pct)
>     8    3871.35 (0.00 pct)      3840.77 (-0.78 pct)
>    16    6216.72 (0.00 pct)      6437.98 (3.55 pct)
>    32    11766.98 (0.00 pct)     11663.53 (-0.87 pct)
>    64    22000.93 (0.00 pct)     21882.88 (-0.53 pct)
>   128    31520.53 (0.00 pct)     31147.05 (-1.18 pct)
>   256    51420.11 (0.00 pct)     55216.39 (7.38 pct)
>   512    53935.90 (0.00 pct)     55407.60 (2.72 pct)
>  1024    55239.73 (0.00 pct)     55997.25 (1.37 pct)
>
> NPS4
>
> Clients:        tip                     latency_nice
>     1    585.83 (0.00 pct)       578.17 (-1.30 pct)
>     2    1141.59 (0.00 pct)      1131.14 (-0.91 pct)
>     4    2174.79 (0.00 pct)      2086.52 (-4.05 pct)
>     8    3887.56 (0.00 pct)      3778.47 (-2.80 pct)
>    16    6441.59 (0.00 pct)      6364.30 (-1.19 pct)
>    32    12133.60 (0.00 pct)     11465.26 (-5.50 pct)   *
>    32    11677.16 (0.00 pct)     12662.09 (8.43 pct)    [Verification Run]
>    64    21769.15 (0.00 pct)     19488.45 (-10.47 pct)  *
>    64    20305.64 (0.00 pct)     21002.90 (3.43 pct)    [Verification Run]
>   128    31396.31 (0.00 pct)     31177.37 (-0.69 pct)
>   256    52792.39 (0.00 pct)     52890.41 (0.18 pct)
>   512    55315.44 (0.00 pct)     53572.65 (-3.15 pct)
>  1024    52150.27 (0.00 pct)     54079.48 (3.69 pct)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ stream - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NPS1
>
> 10 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   307827.79 (0.00 pct)    330524.48 (7.37 pct)
> Scale:   208872.28 (0.00 pct)    215002.06 (2.93 pct)
>   Add:   239404.64 (0.00 pct)    230334.74 (-3.78 pct)
> Triad:   247258.30 (0.00 pct)    238505.06 (-3.54 pct)
>
> 100 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   317217.55 (0.00 pct)    314467.62 (-0.86 pct)
> Scale:   208740.82 (0.00 pct)    210452.00 (0.81 pct)
>   Add:   240550.63 (0.00 pct)    232376.03 (-3.39 pct)
> Triad:   249594.21 (0.00 pct)    242460.83 (-2.85 pct)
>
> NPS2
>
> 10 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   340877.18 (0.00 pct)    339441.26 (-0.42 pct)
> Scale:   217318.16 (0.00 pct)    216905.49 (-0.18 pct)
>   Add:   259078.93 (0.00 pct)    261686.67 (1.00 pct)
> Triad:   274500.78 (0.00 pct)    271699.83 (-1.02 pct)
>
> 100 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   341860.73 (0.00 pct)    335826.36 (-1.76 pct)
> Scale:   218043.00 (0.00 pct)    216451.84 (-0.72 pct)
>   Add:   253698.22 (0.00 pct)    257317.72 (1.42 pct)
> Triad:   265011.84 (0.00 pct)    267769.93 (1.04 pct)
>
> NPS4
>
> 10 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   340877.18 (0.00 pct)    365921.51 (7.34 pct)
> Scale:   217318.16 (0.00 pct)    239408.65 (10.16 pct)
>   Add:   259078.93 (0.00 pct)    264859.31 (2.23 pct)
> Triad:   274500.78 (0.00 pct)    281543.65 (2.56 pct)
>
> 100 Runs:
>
> Test:           tip                     latency_nice
>  Copy:   341860.73 (0.00 pct)    359255.16 (5.08 pct)
> Scale:   218043.00 (0.00 pct)    238154.15 (9.22 pct)
>   Add:   253698.22 (0.00 pct)    269223.49 (6.11 pct)
> Triad:   265011.84 (0.00 pct)    278473.85 (5.07 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ ycsb-mongodb - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o NPS1
>
> tip:                    131244.00 (var: 2.67%)
> latency_nice:           132118.00 (var: 3.62%) (+0.66%)
>
> o NPS2
>
> tip:                    127663.33 (var: 2.08%)
> latency_nice:           129148.00 (var: 4.29%) (+1.16%)
>
> o NPS4
>
> tip:                    133295.00 (var: 1.58%)
> latency_nice:           129975.33 (var: 1.10%) (-2.49%)
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Unixbench - DEFAULT_LATENCY_NICE ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o NPS1
>
> Test                    Metric    Parallelism                   tip                   latency_nice
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-1      48929419.48 (   0.00%)    49137039.06 (   0.42%)
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-512  6275526953.25 (   0.00%)  6265580479.15 (  -0.16%)
> unixbench-syscall       Amean     unixbench-syscall-1        2994319.73 (   0.00%)     3008596.83 *  -0.48%*
> unixbench-syscall       Amean     unixbench-syscall-512      7349715.87 (   0.00%)     7420994.50 *  -0.97%*
> unixbench-pipe          Hmean     unixbench-pipe-1           2830206.03 (   0.00%)     2854405.99 *   0.86%*
> unixbench-pipe          Hmean     unixbench-pipe-512       326207828.01 (   0.00%)   328997804.52 *   0.86%*
> unixbench-spawn         Hmean     unixbench-spawn-1             6394.21 (   0.00%)        6367.75 (  -0.41%)
> unixbench-spawn         Hmean     unixbench-spawn-512          72700.64 (   0.00%)       71454.19 *  -1.71%*
> unixbench-execl         Hmean     unixbench-execl-1             4723.61 (   0.00%)        4750.59 (   0.57%)
> unixbench-execl         Hmean     unixbench-execl-512          11212.05 (   0.00%)       11262.13 (   0.45%)
>
> o NPS2
>
> Test                    Metric    Parallelism                   tip                   latency_nice
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-1      49271512.85 (   0.00%)    49245260.43 (  -0.05%)
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-512  6267992483.03 (   0.00%)  6264951100.67 (  -0.05%)
> unixbench-syscall       Amean     unixbench-syscall-1        2995885.93 (   0.00%)     3005975.10 *  -0.34%*
> unixbench-syscall       Amean     unixbench-syscall-512      7388865.77 (   0.00%)     7276275.63 *   1.52%*
> unixbench-pipe          Hmean     unixbench-pipe-1           2828971.95 (   0.00%)     2856578.72 *   0.98%*
> unixbench-pipe          Hmean     unixbench-pipe-512       326225385.37 (   0.00%)   328941270.81 *   0.83%*
> unixbench-spawn         Hmean     unixbench-spawn-1             6958.71 (   0.00%)        6954.21 (  -0.06%)
> unixbench-spawn         Hmean     unixbench-spawn-512          85443.56 (   0.00%)       70536.42 * -17.45%* (0.67% vs 0.93% - CoEff var)

I don't expect any perf improvement or regression when the latency
nice is not changed

> unixbench-execl         Hmean     unixbench-execl-1             4767.99 (   0.00%)        4752.63 *  -0.32%*
> unixbench-execl         Hmean     unixbench-execl-512          11250.72 (   0.00%)       11320.97 (   0.62%)
>
> o NPS4
>
> Test                    Metric    Parallelism                   tip                   latency_nice
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-1      49041932.68 (   0.00%)    49156671.05 (   0.23%)
> unixbench-dhry2reg      Hmean     unixbench-dhry2reg-512  6286981589.85 (   0.00%)  6285248711.40 (  -0.03%)
> unixbench-syscall       Amean     unixbench-syscall-1        2992405.60 (   0.00%)     3008933.03 *  -0.55%*
> unixbench-syscall       Amean     unixbench-syscall-512      7971789.70 (   0.00%)     7814622.23 *   1.97%*
> unixbench-pipe          Hmean     unixbench-pipe-1           2822892.54 (   0.00%)     2852615.11 *   1.05%*
> unixbench-pipe          Hmean     unixbench-pipe-512       326408309.83 (   0.00%)   329617202.56 *   0.98%*
> unixbench-spawn         Hmean     unixbench-spawn-1             7685.31 (   0.00%)        7243.54 (  -5.75%)
> unixbench-spawn         Hmean     unixbench-spawn-512          72245.56 (   0.00%)       77000.81 *   6.58%*
> unixbench-execl         Hmean     unixbench-execl-1             4761.42 (   0.00%)        4733.12 *  -0.59%*
> unixbench-execl         Hmean     unixbench-execl-512          11533.53 (   0.00%)       11660.17 (   1.10%)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> o 100000 loops
>
> - pipe (process)
>
> Test:                   LN: 0                   LN: 19                  LN: -20
>  1-groups:         3.91 (0.00 pct)         3.91 (0.00 pct)         3.81 (2.55 pct)
>  2-groups:         4.48 (0.00 pct)         4.52 (-0.89 pct)        4.53 (-1.11 pct)
>  4-groups:         4.83 (0.00 pct)         4.83 (0.00 pct)         4.87 (-0.82 pct)
>  8-groups:         5.09 (0.00 pct)         5.00 (1.76 pct)         5.07 (0.39 pct)
> 16-groups:         6.92 (0.00 pct)         6.79 (1.87 pct)         6.96 (-0.57 pct)
>
> - pipe (thread)
>
>  1-groups:         4.13 (0.00 pct)         4.08 (1.21 pct)         4.11 (0.48 pct)
>  2-groups:         4.78 (0.00 pct)         4.90 (-2.51 pct)        4.79 (-0.20 pct)
>  4-groups:         5.12 (0.00 pct)         5.08 (0.78 pct)         5.16 (-0.78 pct)
>  8-groups:         5.31 (0.00 pct)         5.28 (0.56 pct)         5.33 (-0.37 pct)
> 16-groups:         7.34 (0.00 pct)         7.27 (0.95 pct)         7.33 (0.13 pct)
>
> - socket (process)
>
> Test:                   LN: 0                   LN: 19                  LN: -20
>  1-groups:         6.61 (0.00 pct)         6.38 (3.47 pct)         6.54 (1.05 pct)
>  2-groups:         6.59 (0.00 pct)         6.67 (-1.21 pct)        6.11 (7.28 pct)
>  4-groups:         6.77 (0.00 pct)         6.78 (-0.14 pct)        6.79 (-0.29 pct)
>  8-groups:         8.29 (0.00 pct)         8.39 (-1.20 pct)        8.36 (-0.84 pct)
> 16-groups:        12.21 (0.00 pct)        12.03 (1.47 pct)        12.35 (-1.14 pct)
>
> - socket (thread)
>
> Test:                   LN: 0                   LN: 19                  LN: -20
>  1-groups:         6.50 (0.00 pct)         5.99 (7.84 pct)         6.02 (7.38 pct)      ^
>  2-groups:         6.07 (0.00 pct)         6.20 (-2.14 pct)        6.23 (-2.63 pct)
>  4-groups:         6.61 (0.00 pct)         6.64 (-0.45 pct)        6.63 (-0.30 pct)
>  8-groups:         8.87 (0.00 pct)         8.67 (2.25 pct)         8.78 (1.01 pct)
> 16-groups:        12.63 (0.00 pct)        12.54 (0.71 pct)        12.59 (0.31 pct)
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench + Cyclictest - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> - Hackbench: 32 Groups
>
> perf bench sched messaging -p -l 100000 -g 32&
> cyclictest --policy other -D 5 -q -n -h 2000
>
> o NPS1
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench   |      Cyclictest LN = 19      |      Cyclictest LN = 0        |    Cyclictest LN = -20    |
> | LN          |------------------------------|-------------------------------|---------------------------|
> |             |   Min  |   Avg   |  Max      |   Min  |   Avg   |   Max      |   Min  |  Avg  |   Max    |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19          | 52.00  | 71.00   | 5191.00   | 29.00  | 68.00   |  4477.00   | 53.00  | 60.00 |  753.00  |
> | 0           | 53.00  | 150.00  | 7300.00   | 53.00  | 105.00  |  7730.00   | 53.00  | 64.00 |  2067.00 |
> | -20         | 33.00  | 159.00  | 98492.00  | 53.00  | 149.00  |  9608.00   | 53.00  | 91.00 |  5349.00 |
> ----------------------------------------------------------------------------------------------------------
>
> o NPS4
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench   |      Cyclictest LN = 19      |      Cyclictest LN = 0        |    Cyclictest LN = -20    |
> | LN          |------------------------------|-------------------------------|---------------------------|
> |             |   Min  |   Avg   |  Max      |   Min  |   Avg   |   Max      |   Min  |  Avg  |   Max    |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19          | 53.00  |  84.00  |  4790.00  | 53.00  |  72.00  |  3456.00   | 53.00  | 58.00 |  1271.00 |
> | 0           | 53.00  |  99.00  |  5494.00  | 52.00  |  74.00  |  5813.00   | 53.00  | 59.00 |  1004.00 |
> | -20         | 45.00  |  84.00  |  3592.00  | 53.00  |  91.00  |  15222.00  | 53.00  | 74.00 |  5232.00 |      ^
> ----------------------------------------------------------------------------------------------------------
>
> - Hackbench: 128 Groups
>
> perf bench sched messaging -p -l 500000 -g 128&
> cyclictest --policy other -D 5 -q -n -h 2000
>
> o NPS1
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench   |      Cyclictest LN = 19      |      Cyclictest LN = 0        |    Cyclictest LN = -20    |
> | LN          |------------------------------|-------------------------------|---------------------------|
> |             |   Min  |   Avg   |  Max      |   Min  |   Avg   |   Max      |   Min  |  Avg  |   Max    |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19          | 53.00  | 274.00  | 11294.00  | 33.00  | 130.00  |  20071.00  | 53.00  | 56.00 |  244.00  |      ^
> | 0           | 53.00  | 125.00  | 10014.00  | 53.00  | 113.00  |  15857.00  | 53.00  | 57.00 |  250.00  |
> | -20         | 53.00  | 187.00  | 49565.00  | 53.00  | 230.00  |  73353.00  | 53.00  | 118.00|  8816.00 |
> ----------------------------------------------------------------------------------------------------------
>
> o NPS4
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench   |      Cyclictest LN = 19      |      Cyclictest LN = 0        |    Cyclictest LN = -20    |
> | LN          |------------------------------|-------------------------------|---------------------------|
> |             |   Min  |   Avg   |  Max      |   Min  |   Avg   |   Max      |   Min  |  Avg  |   Max    |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 19          | 53.00  | 271.00  | 11411.00  | 53.00  | 82.00   |  5486.00   | 25.00  | 57.00 | 1256.00  |
> | 0           | 53.00  | 148.00  | 8374.00   | 52.00  | 109.00  |  11074.00  | 52.00  | 59.00 | 1068.00  |
> | -20         | 53.00  | 202.00  | 52537.00  | 53.00  | 205.00  |  22265.00  | 52.00  | 87.00 | 14151.00 |
> ----------------------------------------------------------------------------------------------------------
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Hackbench + schbench - Various Latency Nice Values ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> perf bench sched messaging -p -l 400000 -g 128
> schbench -m 2 -t 1 -s 30
>
> o NPS1
>
> -------------------------------------------------------------------------------------------------
> | Hackbench |     schbench LN = 19       |      schbench LN = 0      |     schbench LN = -20    |
> | LN        |----------------------------|---------------------------|--------------------------|
> |           |  90th  |  95th  |  99th    |  90th  |  95th  |  99th   |  90th  |  95th  |  99th  |
> |-----------|--------|--------|----------|--------|--------|---------|--------|--------|--------|
> | 19        |   38   |   131  |   1458   |   46   |   151  |  2636   |   11   |   19   |  410   |       ^
> | 0         |   45   |   98   |   1758   |   25   |   50   |  1670   |   16   |   30   |  1042  |
> | -20       |   47   |   348  |   29280  |   40   |   109  |  16144  |   35   |   63   |  9104  |
> -------------------------------------------------------------------------------------------------
>
> o NPS4
>
> -------------------------------------------------------------------------------------------------
> | Hackbench |     schbench LN = 19       |      schbench LN = 0      |     schbench LN = -20    |
> | LN        |----------------------------|---------------------------|--------------------------|
> |           |  90th  |  95th  |  99th    |  90th  |  95th  |  99th   |  90th  |  95th  |  99th  |
> |-----------|--------|--------|----------|--------|--------|---------|--------|--------|--------|
> | 19        |   19   |  60    |  1886    |   17   |  29    |  621    |   10   |   18   |  227   |
> | 0         |   51   |  141   |  8120    |   37   |  78    |  8880   |   33   |   55   |  474   |       ^
> | -20       |   48   |  1494  |  27296   |   51   |  469   |  40384  |   31   |   64   |  4092  |       ^
> -------------------------------------------------------------------------------------------------
>
> ^ Note: There are cases where the Max, 99th percentile latency is
> non-monotonic but I've also seen a good amount of run to run variation
> there with a single bad sample polluting the results. In such cases,
> the averages are more representative.
>
> >
> > [1] https://source.android.com/docs/core/debug/eval_perf#touchlatency
> >
> > [..snip..]
> >
>
> Apart from couple of anomalies, latency nice reduces wait time, especially
> when the system is heavily loaded. If there is any data, or any specific
> workload you would like me to run on the test system, please do let me know.
> Meanwhile, I'll try to get some numbers for larger workloads like SpecJBB
> that did see improvements with latency nice on v5.

Thanks for your tests

Vincent

> --
> Thanks and Regards,
> Prateek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ