Message-Id: <20250222030221.63120-1-15645113830zzh@gmail.com>
Date: Sat, 22 Feb 2025 11:02:22 +0800
From: zihan zhou <15645113830zzh@...il.com>
To: kprateek.nayak@....com
Cc: 15645113830zzh@...il.com,
	bsegall@...gle.com,
	dietmar.eggemann@....com,
	gautham.shenoy@....com,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	mgorman@...e.de,
	mingo@...hat.com,
	peterz@...radead.org,
	rostedt@...dmis.org,
	vincent.guittot@...aro.org,
	vschneid@...hat.com
Subject: Re: [PATCH V3 1/2] sched: Reduce the default slice to avoid tasks getting an extra tick

Thank you for your reply, and thank you for providing such a detailed set of
tests; it taught me a lot.

> Hello Zhou,
> 
> I'll leave some testing data below, but overall, in my testing with
> CONFIG_HZ=250 and CONFIG_HZ=10000, I cannot see any major regressions
> (at least not for any stable data point). There are a few small regressions,
> probably as a result of greater opportunity for wakeup preemption since
> RUN_TO_PARITY will work for a slightly shorter duration now, but I
> haven't dug deeper to confirm whether they are run-to-run variation or a
> result of the larger number of wakeup preemptions.
> 
> Since most servers run with CONFIG_HZ=250, where the tick is anyway 4ms
> and the default base slice is currently 3ms, I don't think there will
> be any discernible difference in most workloads (fingers crossed).
> 
> Please find full data below.


Should this be CONFIG_HZ=250 and CONFIG_HZ=1000, or is the 10000 intended?

It seems that the absence of a performance difference is good news: this change
should not affect performance. The problem was first found in the openEuler 6.6
kernel. If one task runs continuously while another runs for 3ms and then
sleeps for 1us, the ratio of the two tasks' running time becomes 4:3, whereas
it is 1:1 on the original CFS. The problem has disappeared in the mainline
kernel.
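
For reference, a rough userspace sketch of that scenario could look like the
program below: two threads pinned to one CPU, one spinning all the time, the
other running about 3ms and then sleeping 1us. This is only my illustration of
the setup described above, not a reproducer from the original report; the names
and durations are made up.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile int stop;
static double spin_cpu, burst_cpu;

/* per-thread CPU time in seconds */
static double thread_cputime(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* task A: runs all the time */
static void *spinner(void *arg)
{
	pin_to_cpu0();
	while (!stop)
		;
	spin_cpu = thread_cputime();
	return NULL;
}

/* task B: runs ~3ms, then sleeps 1us, repeatedly */
static void *burster(void *arg)
{
	pin_to_cpu0();
	while (!stop) {
		double start = thread_cputime();

		while (!stop && thread_cputime() - start < 0.003)
			;
		usleep(1);
	}
	burst_cpu = thread_cputime();
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, spinner, NULL);
	pthread_create(&b, NULL, burster, NULL);
	sleep(10);	/* let both compete on the same CPU for 10s of wall time */
	stop = 1;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("spinner %.2fs CPU, burster %.2fs CPU, ratio %.2f\n",
	       spin_cpu, burst_cpu, spin_cpu / burst_cpu);
	return 0;
}

Built with something like gcc -O2 -pthread, on an affected kernel I would
expect the printed CPU-time ratio to drift toward roughly 4:3 instead of
staying near 1:1.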

> o Benchmark results (CONFIG_HZ=1000)
> 
> ==================================================================
> Test          : hackbench
> Units         : Normalized time in seconds
> Interpretation: Lower is better
> Statistic     : AMean
> ==================================================================
> Case:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>   1-groups     1.00 [ -0.00]( 8.66)     1.05 [ -5.30](16.73)
>   2-groups     1.00 [ -0.00]( 5.02)     1.07 [ -6.54]( 7.29)
>   4-groups     1.00 [ -0.00]( 1.27)     1.02 [ -1.67]( 3.74)
>   8-groups     1.00 [ -0.00]( 2.75)     0.99 [  0.78]( 2.61)
>  16-groups     1.00 [ -0.00]( 2.02)     0.97 [  2.97]( 1.19)
> 
> 
> ==================================================================
> Test          : tbench
> Units         : Normalized throughput
> Interpretation: Higher is better
> Statistic     : AMean
> ==================================================================
> Clients:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>      1     1.00 [  0.00]( 0.40)     1.00 [ -0.44]( 0.47)
>      2     1.00 [  0.00]( 0.49)     0.99 [ -0.65]( 1.39)
>      4     1.00 [  0.00]( 0.94)     1.00 [ -0.34]( 0.09)
>      8     1.00 [  0.00]( 0.64)     0.99 [ -0.77]( 1.57)
>     16     1.00 [  0.00]( 1.04)     0.98 [ -2.00]( 0.98)
>     32     1.00 [  0.00]( 1.13)     1.00 [  0.34]( 1.31)
>     64     1.00 [  0.00]( 0.58)     1.00 [ -0.28]( 0.80)
>    128     1.00 [  0.00]( 1.40)     0.99 [ -0.91]( 0.51)
>    256     1.00 [  0.00]( 1.14)     0.99 [ -1.48]( 1.17)
>    512     1.00 [  0.00]( 0.51)     1.00 [ -0.25]( 0.66)
>   1024     1.00 [  0.00]( 0.62)     0.99 [ -0.79]( 0.40)
> 
> 
> ==================================================================
> Test          : stream-10
> Units         : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic     : HMean
> ==================================================================
> Test:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>   Copy     1.00 [  0.00](16.03)     0.98 [ -2.33](17.69)
>  Scale     1.00 [  0.00]( 6.26)     0.99 [ -0.60]( 7.94)
>    Add     1.00 [  0.00]( 8.35)     1.01 [  0.50](11.49)
>  Triad     1.00 [  0.00]( 9.56)     1.01 [  0.66]( 9.25)
> 
> 
> ==================================================================
> Test          : stream-100
> Units         : Normalized Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic     : HMean
> ==================================================================
> Test:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>   Copy     1.00 [  0.00]( 6.03)     1.02 [  1.58]( 2.27)
>  Scale     1.00 [  0.00]( 5.78)     1.02 [  1.64]( 4.50)
>    Add     1.00 [  0.00]( 5.25)     1.01 [  1.37]( 4.17)
>  Triad     1.00 [  0.00]( 5.25)     1.03 [  3.35]( 1.18)
> 
> 
> ==================================================================
> Test          : netperf
> Units         : Normalized Throughput
> Interpretation: Higher is better
> Statistic     : AMean
> ==================================================================
> Clients:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>   1-clients     1.00 [  0.00]( 0.06)     1.01 [  0.66]( 0.75)
>   2-clients     1.00 [  0.00]( 0.80)     1.01 [  0.79]( 0.31)
>   4-clients     1.00 [  0.00]( 0.65)     1.01 [  0.56]( 0.73)
>   8-clients     1.00 [  0.00]( 0.82)     1.01 [  0.70]( 0.59)
>  16-clients     1.00 [  0.00]( 0.68)     1.01 [  0.63]( 0.77)
>  32-clients     1.00 [  0.00]( 0.95)     1.01 [  0.87]( 1.06)
>  64-clients     1.00 [  0.00]( 1.55)     1.01 [  0.66]( 1.60)
> 128-clients     1.00 [  0.00]( 1.23)     1.00 [ -0.28]( 1.58)
> 256-clients     1.00 [  0.00]( 4.92)     1.00 [  0.25]( 4.47)
> 512-clients     1.00 [  0.00](57.12)     1.00 [  0.24](62.52)
> 
> 
> ==================================================================
> Test          : schbench
> Units         : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic     : Median
> ==================================================================
> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>    1     1.00 [ -0.00](27.55)     0.81 [ 19.35](31.80)
>    2     1.00 [ -0.00](19.98)     0.87 [ 12.82]( 9.17)
>    4     1.00 [ -0.00](10.66)     1.09 [ -9.09]( 6.45)
>    8     1.00 [ -0.00]( 4.06)     0.90 [  9.62]( 6.38)
>   16     1.00 [ -0.00]( 5.33)     0.98 [  1.69]( 1.97)
>   32     1.00 [ -0.00]( 8.92)     0.97 [  3.16]( 1.09)
>   64     1.00 [ -0.00]( 6.06)     0.97 [  3.30]( 2.97)
>  128     1.00 [ -0.00](10.15)     1.05 [ -5.47]( 4.75)
>  256     1.00 [ -0.00](27.12)     1.00 [ -0.20](13.52)
>  512     1.00 [ -0.00]( 2.54)     0.80 [ 19.75]( 0.40)
> 
> 
> ==================================================================
> Test          : new-schbench-requests-per-second
> Units         : Normalized Requests per second
> Interpretation: Higher is better
> Statistic     : Median
> ==================================================================
> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>    1     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.46)
>    2     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)
>    4     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)
>    8     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)
>   16     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
>   32     1.00 [  0.00]( 0.43)     1.01 [  0.63]( 0.28)
>   64     1.00 [  0.00]( 1.17)     1.00 [  0.00]( 0.20)
>  128     1.00 [  0.00]( 0.20)     1.00 [  0.00]( 0.20)
>  256     1.00 [  0.00]( 0.27)     1.00 [  0.00]( 1.69)
>  512     1.00 [  0.00]( 0.21)     0.95 [ -4.70]( 0.34)
> 
> 
> ==================================================================
> Test          : new-schbench-wakeup-latency
> Units         : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic     : Median
> ==================================================================
> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>    1     1.00 [ -0.00](11.08)     1.33 [-33.33](15.78)
>    2     1.00 [ -0.00]( 4.08)     1.08 [ -7.69](10.00)
>    4     1.00 [ -0.00]( 6.39)     1.21 [-21.43](22.13)
>    8     1.00 [ -0.00]( 6.88)     1.15 [-15.38](11.93)
>   16     1.00 [ -0.00](13.62)     1.08 [ -7.69](10.33)
>   32     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 3.87)
>   64     1.00 [ -0.00]( 8.13)     1.00 [ -0.00]( 2.38)
>  128     1.00 [ -0.00]( 5.26)     0.98 [  2.11]( 1.92)
>  256     1.00 [ -0.00]( 1.00)     0.78 [ 22.36](14.65)
>  512     1.00 [ -0.00]( 0.48)     0.73 [ 27.15]( 6.75)
> 
> 
> ==================================================================
> Test          : new-schbench-request-latency
> Units         : Normalized 99th percentile latency in us
> Interpretation: Lower is better
> Statistic     : Median
> ==================================================================
> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>    1     1.00 [ -0.00]( 1.53)     1.00 [ -0.00]( 1.77)
>    2     1.00 [ -0.00]( 0.50)     1.01 [ -1.35]( 1.19)
>    4     1.00 [ -0.00]( 0.14)     1.00 [ -0.00]( 0.42)
>    8     1.00 [ -0.00]( 0.24)     1.00 [ -0.27]( 1.37)
>   16     1.00 [ -0.00]( 0.00)     1.00 [  0.27]( 0.14)
>   32     1.00 [ -0.00]( 0.66)     1.01 [ -1.48]( 2.65)
>   64     1.00 [ -0.00]( 5.72)     0.96 [  4.32]( 5.64)
>  128     1.00 [ -0.00]( 0.10)     1.00 [ -0.20]( 0.18)
>  256     1.00 [ -0.00]( 2.52)     0.96 [  4.04]( 9.70)
>  512     1.00 [ -0.00]( 0.68)     1.06 [ -5.52]( 0.36)
> 
> 
> ==================================================================
> Test          : longer running benchmarks
> Units         : Normalized throughput
> Interpretation: Higher is better
> Statistic     : Median
> ==================================================================
> Benchmark		pct imp
> ycsb-cassandra          -0.64%
> ycsb-mongodb             0.56%
> deathstarbench-1x        0.30%
> deathstarbench-2x        3.21%
> deathstarbench-3x        2.18%
> deathstarbench-6x       -0.40%
> mysql-hammerdb-64VU     -0.63%
> ---

It seems that new_base_slice improves things somewhat under high load (and in
the latency-heavy cases) while regressing a bit under low load.

It also seems that the slice should depend not only on the number of CPUs, but
on the relationship between the overall load and the number of CPUs: when the
load is relatively heavy the slice should be smaller, and when the load is
relatively light the slice should be larger. Fixing it to a single value may
not be the optimal solution.
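
Just to make that direction concrete, here is a minimal hypothetical sketch in
plain C; the function name adaptive_slice_ns and the 3ms/0.7ms clamp values are
invented for illustration and are not taken from the patch. The slice shrinks
as the number of runnable tasks per CPU grows.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical: shrink the slice as runnable tasks per CPU grow,
 * clamped between a heavy-load minimum and a light-load maximum.
 * The clamp values below are placeholders, not tuned numbers. */
static uint64_t adaptive_slice_ns(unsigned int nr_running, unsigned int nr_cpus)
{
	const uint64_t max_slice = 3000000;	/* 3 ms: light load */
	const uint64_t min_slice =  700000;	/* 0.7 ms: heavy load */
	uint64_t slice;

	if (nr_running <= nr_cpus)
		return max_slice;	/* no contention, keep the long slice */

	/* more runnable tasks per CPU -> proportionally shorter slice */
	slice = max_slice * nr_cpus / nr_running;
	return slice > min_slice ? slice : min_slice;
}

int main(void)
{
	printf("8 CPUs,  4 runnable: %llu ns\n",
	       (unsigned long long)adaptive_slice_ns(4, 8));
	printf("8 CPUs, 64 runnable: %llu ns\n",
	       (unsigned long long)adaptive_slice_ns(64, 8));
	return 0;
}

This is only meant to show the shape of the idea; where the value would be
updated and how the bounds are chosen would need real evaluation.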

> With that overwhelming amount of data out of the way, please feel free
> to add:
> 
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>

I think your Tested-by is well deserved, but it may be a bit late: I have
already received the tip-bot2 email, so I am not sure whether the tag can still
be added.

Your email also made me realize that I should set up a systematic testing
method. Could you recommend some useful projects for that?

Thanks!
