linux-kernel - Re: [PATCH v5 0/7] Add latency priority for CFS class

Message-ID: <CAKfTPtCFxM8a+9XMCRMdBE0QLr2trHUN+im58Kz+74g9bQH0Lw@mail.gmail.com>
Date:   Thu, 27 Oct 2022 18:34:47 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     K Prateek Nayak <kprateek.nayak@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, parth@...ux.ibm.com,
        qais.yousef@....com, chris.hyser@...cle.com,
        valentin.schneider@....com, patrick.bellasi@...bug.net,
        David.Laight@...lab.com, pjt@...gle.com, pavel@....cz,
        tj@...nel.org, qperret@...gle.com, tim.c.chen@...ux.intel.com,
        joshdon@...gle.com, timj@....org,
        Gautham Shenoy <gautham.shenoy@....com>
Subject: Re: [PATCH v5 0/7] Add latency priority for CFS class

Hi Prateek,

On Tue, 25 Oct 2022 at 08:36, K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Vincent,
>
> I've rerun some tests with a different configuration with more
> contention for CPU and I can see a linear behavior. Sharing the
> results below.
>
> On 10/13/2022 8:54 PM, Vincent Guittot wrote:
> >
> > [..snip..]
> >>
> >> o Hackbench and Cyclictest in NPS1 configuration
> >>
> >> perf bench sched messaging -p -t -l 100000 -g 16&
> >> cyclictest --policy other -D 5 -q -n -H 20000
> >>
> >> -----------------------------------------------------------------------------------------------------------------
> >> |Hackbench     |      Cyclictest LN = 19        |         Cyclictest LN = 0       |      Cyclictest LN = -20    |
> >> |LN            |--------------------------------|---------------------------------|-----------------------------|
> >> |v             |   Min  |   Avg   |  Max        |     Min  |   Avg   |  Max       |     Min  |   Avg   |  Max   |
> >> |--------------|--------|---------|-------------|----------|---------|------------|----------|---------|--------|
> >> |0             |  54.00 |  117.00 | 3021.67     |    53.67 |  65.33  | 133.00     |    53.67 |  65.00  | 201.33 |  ^
> >> |19            |  50.00 |  100.67 | 3099.33     |    41.00 |  64.33  | 1014.33    |    54.00 |  63.67  | 213.33 |
> >> |-20           |  53.00 |  169.00 | 11661.67    |    53.67 |  217.33 | 14313.67   |    46.00 |  61.33  | 236.00 |  ^
> >> -----------------------------------------------------------------------------------------------------------------
> >
> > The latency results look good with Cyclictest LN:0 and hackbench LN:0.
> > 133us max latency. This suggests that your system is not overloaded
> > and cyclictest doesn't really compete with others to run.
>
> Following is the result of running cyclictest alongside hackbench with 32 groups:
>
> perf bench sched messaging -p -l 100000 -g 32&
> cyclictest --policy other -D 5 -q -n -H 20000
>
> ----------------------------------------------------------------------------------------------------------
> | Hackbench   |      Cyclictest LN = 19      |      Cyclictest LN = 0        |    Cyclictest LN = -20    |
> | LN          |------------------------------|-------------------------------|---------------------------|
> |             |   Min  |   Avg   |  Max      |   Min  |   Avg   |   Max      |   Min  |  Avg  |   Max    |
> |-------------|--------|---------|-----------|--------|---------|------------|--------|-------|----------|
> | 0           |  54.00 |  165.00 | 6899.00   |  22.00 |  85.00  |  3294.00   |  23.00 | 64.00 |  276.00  |
> | 19          |  53.00 |  173.00 | 3275.00   |  40.00 |  60.00  |  2276.00   |  13.00 | 59.00 |  94.00   |
> | -20         |  52.00 |  293.00 | 19980.00  |  52.00 |  280.00 |  14305.00  |  53.00 | 95.00 |  5713.00 |
> ----------------------------------------------------------------------------------------------------------
>
> I see a spike for Max in (0, 0) configuration and the latency decreases
> monotonically with lower latency nice value.

Your results look good.
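
As a rough illustration of the trend described above, the Max columns of the 32-group table can be checked for strict monotonic decrease as the cyclictest latency nice value goes 19 -> 0 -> -20. This is only a sketch: the values are copied from the table, and the names (max_latency_us, strictly_decreasing) are hypothetical helpers, not part of any tool used in the tests.

```python
# Max cyclictest latency (us) from the 32-group table above.
# Key: hackbench latency nice value.
# List order: cyclictest latency nice = 19, 0, -20.
max_latency_us = {
    0: [6899.00, 3294.00, 276.00],
    19: [3275.00, 2276.00, 94.00],
    -20: [19980.00, 14305.00, 5713.00],
}


def strictly_decreasing(values):
    """Return True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# One boolean per hackbench latency nice value.
monotonic = {ln: strictly_decreasing(v) for ln, v in max_latency_us.items()}

for ln, ok in monotonic.items():
    print(f"hackbench LN={ln}: Max decreases monotonically: {ok}")
```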

>
> >
> >>
> >> o Hackbench and schbench in NPS1 configuration
> >>
> >> perf bench sched messaging -p -t -l 1000000 -g 16&
> >> schbench -m 1 -t 64 -s 30s
> >>
> >> ------------------------------------------------------------------------------------------------------------
> >> |Hackbench     |   schbench LN = 19         |        schbench LN = 0         |       schbench LN = -20     |
> >> |LN            |----------------------------|--------------------------------|-----------------------------|
> >> |v             |  90th  |  95th  |  99th    |   90th  |  95th   |  99th      |   90th  |   95th   | 99th   |
> >> |--------------|--------|--------|----------|---------|---------|------------|---------|----------|--------|
> >> |0             |  4264  |  6744  |  15664   |   17952 |  32672  |  55488     |   15088 |   25312  | 50112  |
> >> |19            |  288   |  613   |  2332    |   274   |  1015   |  3628      |   374   |   1394   | 4424   |
> >> |-20           |  35904 |  47680 |  79744   |   87168 |  113536 |  176896    |   13008 |   21216  | 42560  |   ^
> >> ------------------------------------------------------------------------------------------------------------
> >
> > For the schbench, your test is 30 seconds long which is longer than
> > the duration of perf bench sched messaging -p -t -l 1000000 -g 16&
> >
> > The duration of the latter varies depending on the latency nice value, so
> > schbench is disturbed for a longer time in some cases.
>
> I've rerun this with hackbench running 128 groups alongside schbench
> with 2 messengers and 1 worker each. With a larger worker count, I still
> see non-monotonic behavior in the 99th percentile latency of schbench.
> I also see the number of latency samples collected by schbench vary
> over the 30 second run for different latency nice values, which could
> also play a part in the unexpected behavior. For lower worker
> count, I see the number of samples collected is similar. Following
> is the configuration and the latency reported by schbench:
>
> perf bench sched messaging -p -t -l 150000 -g 128&
> schbench -m 2 -t 1 -s 30s
>
> Note: In all cases, hackbench runs longer than schbench.
>
> -------------------------------------------------------------------------------------------------
> | Hackbench |     schbench LN = 19       |      schbench LN = 0      |     schbench LN = -20    |
> | LN        |----------------------------|---------------------------|--------------------------|
> |           |  90th  |  95th  |  99th    |  90th  |  95th  |  99th   |  90th  |  95th  |  99th  |
> |-----------|--------|--------|----------|--------|--------|---------|--------|--------|--------|
> | 0         |  42    |  92    |  2972    |  26    |  49    |  2356   |  9     |  11    |  20    |
> | 19        |  35    |  424   |  4984    |  13    |  390   |  5096   |  8     |  10    |  14    | ^
> | -19       |  144   |  3516  |  110208  |  61    |  807   |  34880  |  25    |  39    |  295   |
> -------------------------------------------------------------------------------------------------
>
> I see 90th and 95th percentile latency decrease monotonically with
> latency nice value of schbench (for a fixed latency nice value of
> hackbench) but there are cases where 99th percentile latency
> reported by schbench may not strictly decrease with lower latency
> nice value (Marked with ^)
>
> Note: Only a small number of bad samples can affect the 99th
> percentile latency for the above configuration. The monotonic
> behavior in 90th and 95th percentile latency is a good data point
> to show latency nice is indeed working as expected.

Yes, I think you are right that the 99th percentile is not stable
enough because it can be impacted by a small number of bad samples.
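
To make that sensitivity concrete, here is a minimal sketch with synthetic numbers (not measured data). It uses a simple nearest-rank percentile, which is an assumption and may differ from how schbench computes its percentiles. With 1000 samples, 15 slow wakeups are enough to move the 99th percentile while the 90th and 95th stay unchanged.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(samples)
    # rank = ceil(pct * n / 100), computed with integer arithmetic
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic latencies in microseconds (made-up values for illustration).
clean = [10] * 1000               # every wakeup takes 10 us
noisy = [10] * 985 + [5000] * 15  # 1.5% of wakeups take 5000 us

for name, data in (("clean", clean), ("noisy", noisy)):
    print(name, [percentile(data, p) for p in (90, 95, 99)])
```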

>
> If there is any specific workload you would like me to run on the
> test system, or any additional data you would like for above
> workloads, please do let me know.

Thanks a lot for your tests.
I'm about to send v6

>
> --
> Thanks and Regards,
> Prateek