Message-ID: <20230425123205.GB414327@lorien.usersys.redhat.com>
Date:   Tue, 25 Apr 2023 08:32:05 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, vincent.guittot@...aro.org,
        linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, corbet@....net,
        qyousef@...alina.io, chris.hyser@...cle.com,
        patrick.bellasi@...bug.net, pjt@...gle.com, pavel@....cz,
        qperret@...gle.com, tim.c.chen@...ux.intel.com, joshdon@...gle.com,
        timj@....org, kprateek.nayak@....com, yu.c.chen@...el.com,
        youssefesmat@...omium.org, joel@...lfernandes.org, efault@....de,
        jhladky@...hat.com
Subject: Re: [PATCH 00/17] sched: EEVDF using latency-nice


Hi Peter,

On Tue, Mar 28, 2023 at 11:26:22AM +0200 Peter Zijlstra wrote:
> Hi!
> 
> Latest version of the EEVDF [1] patches.
> 
> Many changes since last time; most notably it now fully replaces CFS and uses
> lag based placement for migrations. Smaller changes include:
> 
>  - uses scale_load_down() for avg_vruntime; I measured the max delta to be ~44
>    bits on a system/cgroup based kernel build.
>  - fixed a bunch of reweight / cgroup placement issues
>  - adaptive placement strategy for smaller slices
>  - rename se->lag to se->vlag
> 
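
For readers following along, my rough mental model of the lag-based placement
mentioned above is the sketch below. It is only an illustration of the idea,
not the patch code; the helper and field names (avg_vruntime(), se->vlag) are
taken from the cover letter rather than checked against the tree:

    /*
     * Illustrative sketch, not the actual patches.  avg_vruntime() is the
     * load-weighted average of the queue's vruntimes (the "zero lag" point
     * V); se->vlag is the lag the entity had when it was dequeued.
     */
    static void place_entity_sketch(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            u64 vavg = avg_vruntime(cfs_rq);

            /*
             * An entity that was owed service (positive lag) lands to the
             * left of V and gets picked sooner; one that was ahead lands to
             * the right.  The same rule applied on migration is what lets
             * the entity carry its lag to the new runqueue.
             */
            se->vruntime = vavg - se->vlag;
    }
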
> There's a bunch of RFC patches at the end and one DEBUG patch. Of those, the
> PLACE_BONUS patch is a mixed bag of pain. A number of benchmarks regress
> because EEVDF is actually fair and gives a 100% parent vs a 50% child a 67%/33%
> split (stress-futex, stress-nanosleep, starve, etc.) instead of a 50%/50%
> split that sleeper bonus achieves. Mostly I think these benchmarks are somewhat
> artificial/daft but who knows.
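
(A back-of-the-envelope way to get the 67%/33% figure, which is just my
reading and not something from the series: take the 50% task as
run-one-unit / sleep-one-unit. Under strictly fair scheduling it gets half
the CPU while both are runnable, so its one unit of work takes two units of
wall time, then it sleeps for one; over that three-unit cycle the
always-runnable task gets two units of CPU and the intermittent one gets
one, i.e. 67%/33%. The CFS sleeper bonus credits the sleeper on wakeup,
which is what pulls these benchmarks back toward 50%/50%.)
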
> 
> The PLACE_BONUS thing horribly messes up things like hackbench and latency-nice
> because it places things too far to the left in the tree. Basically it messes
> with the whole 'when': by placing a task back in history you're putting a
> burden on the now to accommodate catching up. More tinkering required.
> 
> But over-all the thing seems to be fairly usable and could do with more
> extensive testing.

I had Jirka run his suite of perf workloads on this. These are macro benchmarks
on bare metal (NAS, SPECjbb, etc.). I can't share specific results because they
come out as HTML reports on an internal website. There was no noticeable
performance change, which is a good thing: overall performance was comparable
to CFS.

There was a win in stability, though: a number of the error boxes across the
board were smaller, i.e. less run-to-run variance.

These are mostly performance/throughput tests. We're going to run some more
latency-sensitive tests now.

So, fwiw, EEVDF is performing well on macro workloads here.



Cheers,
Phil

> 
> [1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
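
(For context, since the acronym isn't expanded in the thread: EEVDF is
"Earliest Eligible Virtual Deadline First". As I read the paper and the
series, each entity gets, in virtual time, an eligible time ve and a virtual
deadline vd = ve + request/weight; only entities that are currently eligible
(ve <= V, with V the queue's avg_vruntime) are considered, and among those
the one with the earliest virtual deadline runs. Latency-nice then maps to
the request/slice size: a shorter slice means an earlier deadline and hence
better latency at the same long-run share, which is what the cyclictest
numbers below show.)
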
> 
> Results:
> 
>   hackbench -g $nr_cpu + cyclictest --policy other results:
> 
> 			EEVDF			 CFS
> 
> 		# Min Latencies: 00054
>   LNICE(19)	# Avg Latencies: 00660
> 		# Max Latencies: 23103
> 
> 		# Min Latencies: 00052		00053
>   LNICE(0)	# Avg Latencies: 00318		00687
> 		# Max Latencies: 08593		13913
> 
> 		# Min Latencies: 00054
>   LNICE(-19)	# Avg Latencies: 00055
> 		# Max Latencies: 00061
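
In case anyone wants to reproduce the LNICE(x) rows: the per-task knob is
presumably set through the extended sched_setattr() interface the series
adds. The sketch below shows the shape of it, but the field and flag names
(sched_latency_nice, SCHED_FLAG_LATENCY_NICE) and the flag value are my
assumptions from the patch titles; check include/uapi/linux/sched/types.h in
the actual tree before relying on any of it:

    /* Hedged sketch: set latency-nice for the calling task.  The latency
     * field and flag are assumed; the rest is the existing sched_setattr(2)
     * calling convention. */
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    struct sched_attr_sketch {
            uint32_t size;
            uint32_t sched_policy;
            uint64_t sched_flags;
            int32_t  sched_nice;
            uint32_t sched_priority;
            uint64_t sched_runtime, sched_deadline, sched_period;
            uint32_t sched_util_min, sched_util_max;
            int32_t  sched_latency_nice;                    /* assumed new field */
    };

    #define SCHED_FLAG_LATENCY_NICE_SKETCH  (1ULL << 7)     /* assumed value */

    static int set_latency_nice(int ln)
    {
            struct sched_attr_sketch attr;

            memset(&attr, 0, sizeof(attr));
            attr.size         = sizeof(attr);
            attr.sched_policy = 0;                          /* SCHED_OTHER */
            attr.sched_flags  = SCHED_FLAG_LATENCY_NICE_SKETCH;
            attr.sched_latency_nice = ln;                   /* -20 .. 19, like nice */

            return syscall(SYS_sched_setattr, 0 /* self */, &attr, 0 /* flags */);
    }
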
> 
> 
> Some preliminary results from Chen Yu on a slightly older version:
> 
>   schbench  (95% tail latency, lower is better)
>   =================================================================================
>   case                    nr_instance            baseline (std%)    compare% ( std%)
>   normal                   25%                     1.00  (2.49%)    -81.2%   (4.27%)
>   normal                   50%                     1.00  (2.47%)    -84.5%   (0.47%)
>   normal                   75%                     1.00  (2.5%)     -81.3%   (1.27%)
>   normal                  100%                     1.00  (3.14%)    -79.2%   (0.72%)
>   normal                  125%                     1.00  (3.07%)    -77.5%   (0.85%)
>   normal                  150%                     1.00  (3.35%)    -76.4%   (0.10%)
>   normal                  175%                     1.00  (3.06%)    -76.2%   (0.56%)
>   normal                  200%                     1.00  (3.11%)    -76.3%   (0.39%)
>   ==================================================================================
> 
>   hackbench (throughput, higher is better)
>   ==============================================================================
>   case                    nr_instance            baseline(std%)  compare%( std%)
>   threads-pipe              25%                      1.00 (<2%)    -17.5 (<2%)
>   threads-socket            25%                      1.00 (<2%)    -1.9 (<2%)
>   threads-pipe              50%                      1.00 (<2%)     +6.7 (<2%)
>   threads-socket            50%                      1.00 (<2%)    -6.3  (<2%)
>   threads-pipe              100%                     1.00 (3%)     +110.1 (3%)
>   threads-socket            100%                     1.00 (<2%)    -40.2 (<2%)
>   threads-pipe              150%                     1.00 (<2%)    +125.4 (<2%)
>   threads-socket            150%                     1.00 (<2%)    -24.7 (<2%)
>   threads-pipe              200%                     1.00 (<2%)    -89.5 (<2%)
>   threads-socket            200%                     1.00 (<2%)    -27.4 (<2%)
>   process-pipe              25%                      1.00 (<2%)    -15.0 (<2%)
>   process-socket            25%                      1.00 (<2%)    -3.9 (<2%)
>   process-pipe              50%                      1.00 (<2%)    -0.4  (<2%)
>   process-socket            50%                      1.00 (<2%)    -5.3  (<2%)
>   process-pipe              100%                     1.00 (<2%)    +62.0 (<2%)
>   process-socket            100%                     1.00 (<2%)    -39.5  (<2%)
>   process-pipe              150%                     1.00 (<2%)    +70.0 (<2%)
>   process-socket            150%                     1.00 (<2%)    -20.3 (<2%)
>   process-pipe              200%                     1.00 (<2%)    +79.2 (<2%)
>   process-socket            200%                     1.00 (<2%)    -22.4  (<2%)
>   ==============================================================================
> 
>   stress-ng (throughput, higher is better)
>   ==============================================================================
>   case                    nr_instance            baseline(std%)  compare%( std%)
>   switch                  25%                      1.00 (<2%)    -6.5 (<2%)
>   switch                  50%                      1.00 (<2%)    -9.2 (<2%)
>   switch                  75%                      1.00 (<2%)    -1.2 (<2%)
>   switch                  100%                     1.00 (<2%)    +11.1 (<2%)
>   switch                  125%                     1.00 (<2%)    -16.7 (9%)
>   switch                  150%                     1.00 (<2%)    -13.6 (<2%)
>   switch                  175%                     1.00 (<2%)    -16.2 (<2%)
>   switch                  200%                     1.00 (<2%)    -19.4 (<2%)
>   fork                    50%                      1.00 (<2%)    -0.1 (<2%)
>   fork                    75%                      1.00 (<2%)    -0.3 (<2%)
>   fork                    100%                     1.00 (<2%)    -0.1 (<2%)
>   fork                    125%                     1.00 (<2%)    -6.9 (<2%)
>   fork                    150%                     1.00 (<2%)    -8.8 (<2%)
>   fork                    200%                     1.00 (<2%)    -3.3 (<2%)
>   futex                   25%                      1.00 (<2%)    -3.2 (<2%)
>   futex                   50%                      1.00 (3%)     -19.9 (5%)
>   futex                   75%                      1.00 (6%)     -19.1 (2%)
>   futex                   100%                     1.00 (16%)    -30.5 (10%)
>   futex                   125%                     1.00 (25%)    -39.3 (11%)
>   futex                   150%                     1.00 (20%)    -27.2 (17%)
>   futex                   175%                     1.00 (<2%)    -18.6 (<2%)
>   futex                   200%                     1.00 (<2%)    -47.5 (<2%)
>   nanosleep               25%                      1.00 (<2%)    -0.1 (<2%)
>   nanosleep               50%                      1.00 (<2%)    -0.0 (<2%)
>   nanosleep               75%                      1.00 (<2%)    +15.2 (<2%)
>   nanosleep               100%                     1.00 (<2%)    -26.4 (<2%)
>   nanosleep               125%                     1.00 (<2%)    -1.3 (<2%)
>   nanosleep               150%                     1.00 (<2%)    +2.1  (<2%)
>   nanosleep               175%                     1.00 (<2%)    +8.3 (<2%)
>   nanosleep               200%                     1.00 (<2%)    +2.0 (<2%)
>   ===============================================================================
> 
>   unixbench (throughput, higher is better)
>   ==============================================================================
>   case                    nr_instance            baseline(std%)  compare%( std%)
>   spawn                   125%                      1.00 (<2%)    +8.1 (<2%)
>   context1                100%                      1.00 (6%)     +17.4 (6%)
>   context1                75%                       1.00 (13%)    +18.8 (8%)
>   =================================================================================
> 
>   netperf  (throughput, higher is better)
>   ===========================================================================
>   case                    nr_instance          baseline(std%)  compare%( std%)
>   UDP_RR                  25%                   1.00    (<2%)    -1.5%  (<2%)
>   UDP_RR                  50%                   1.00    (<2%)    -0.3%  (<2%)
>   UDP_RR                  75%                   1.00    (<2%)    +12.5% (<2%)
>   UDP_RR                 100%                   1.00    (<2%)    -4.3%  (<2%)
>   UDP_RR                 125%                   1.00    (<2%)    -4.9%  (<2%)
>   UDP_RR                 150%                   1.00    (<2%)    -4.7%  (<2%)
>   UDP_RR                 175%                   1.00    (<2%)    -6.1%  (<2%)
>   UDP_RR                 200%                   1.00    (<2%)    -6.6%  (<2%)
>   TCP_RR                  25%                   1.00    (<2%)    -1.4%  (<2%)
>   TCP_RR                  50%                   1.00    (<2%)    -0.2%  (<2%)
>   TCP_RR                  75%                   1.00    (<2%)    -3.9%  (<2%)
>   TCP_RR                 100%                   1.00    (2%)     +3.6%  (5%)
>   TCP_RR                 125%                   1.00    (<2%)    -4.2%  (<2%)
>   TCP_RR                 150%                   1.00    (<2%)    -6.0%  (<2%)
>   TCP_RR                 175%                   1.00    (<2%)    -7.4%  (<2%)
>   TCP_RR                 200%                   1.00    (<2%)    -8.4%  (<2%)
>   ==========================================================================
> 
> 
> ---
> Also available at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/eevdf
> 
> ---
> Parth Shah (1):
>       sched: Introduce latency-nice as a per-task attribute
> 
> Peter Zijlstra (14):
>       sched/fair: Add avg_vruntime
>       sched/fair: Remove START_DEBIT
>       sched/fair: Add lag based placement
>       rbtree: Add rb_add_augmented_cached() helper
>       sched/fair: Implement an EEVDF like policy
>       sched: Commit to lag based placement
>       sched/smp: Use lag to simplify cross-runqueue placement
>       sched: Commit to EEVDF
>       sched/debug: Rename min_granularity to base_slice
>       sched: Merge latency_offset into slice
>       sched/eevdf: Better handle mixed slice length
>       sched/eevdf: Sleeper bonus
>       sched/eevdf: Minimal vavg option
>       sched/eevdf: Debug / validation crud
> 
> Vincent Guittot (2):
>       sched/fair: Add latency_offset
>       sched/fair: Add sched group latency support
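
The cgroup side of this (judging from the patch title and the cgroup-v2.rst
hunk in the diffstat below) presumably shows up as a per-group latency-nice
file in cgroup v2. A sketch, with the interface file name (cpu.latency.nice)
being my guess rather than something I've checked:

    /* Sketch: set the (assumed) cpu.latency.nice attribute of a cgroup-v2
     * group.  The file name is a guess based on the patch title. */
    #include <stdio.h>

    static int set_group_latency_nice(const char *cgroup_path, int ln)
    {
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path), "%s/cpu.latency.nice", cgroup_path);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%d\n", ln);
            return fclose(f);
    }

    /* e.g. set_group_latency_nice("/sys/fs/cgroup/background", 19); */
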
> 
>  Documentation/admin-guide/cgroup-v2.rst |   10 +
>  include/linux/rbtree_augmented.h        |   26 +
>  include/linux/sched.h                   |    6 +
>  include/uapi/linux/sched.h              |    4 +-
>  include/uapi/linux/sched/types.h        |   19 +
>  init/init_task.c                        |    3 +-
>  kernel/sched/core.c                     |   65 +-
>  kernel/sched/debug.c                    |   49 +-
>  kernel/sched/fair.c                     | 1199 ++++++++++++++++---------------
>  kernel/sched/features.h                 |   29 +-
>  kernel/sched/sched.h                    |   23 +-
>  tools/include/uapi/linux/sched.h        |    4 +-
>  12 files changed, 794 insertions(+), 643 deletions(-)
> 

-- 
