Message-ID: <fad35024-32c6-21bb-17b2-9fd7e6c781f3@amd.com>
Date: Tue, 22 Feb 2022 15:09:27 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: peterz@...radead.org, aubrey.li@...ux.intel.com, efault@....de,
gautham.shenoy@....com, linux-kernel@...r.kernel.org,
mingo@...nel.org, song.bao.hua@...ilicon.com,
srikar@...ux.vnet.ibm.com, valentin.schneider@....com,
vincent.guittot@...aro.org
Subject: Re: [PATCH v4] sched/fair: Consider cpu affinity when allowing NUMA
imbalance in find_idlest_group
Hello Mel,
On 2/22/2022 2:15 PM, Mel Gorman wrote:
> [..snip..]
>> Following are the results from testing:
>>
>> - Un-affined runs:
>> Command: stress-ng -t 30s --exec <Worker>
>>
>> Kernel versions:
>> - balance-wake - This patch
>> - branch - This patch + Mel's suggested branch
>> - branch-unlikely - This patch + Mel's suggested branch + unlikely
>>
>> Result format: Amean in ns [Co-eff of Var] (% Improvement)
>>
>> Workers balance-wake branch branch-unlikely
>> 1 18613.20 [0.01] (0.00 pct) 18348.00 [0.04] (1.42 pct) 18299.20 [0.02] (1.69 pct)
>> 2 18634.40 [0.03] (0.00 pct) 18163.80 [0.04] (2.53 pct) 19037.80 [0.05] (-2.16 pct)
>> 4 20997.40 [0.02] (0.00 pct) 20980.80 [0.02] (0.08 pct) 21527.40 [0.02] (-2.52 pct)
>> 8 20890.20 [0.01] (0.00 pct) 19714.60 [0.07] (5.63 pct) 20021.40 [0.05] (4.16 pct)
>> 16 21200.20 [0.02] (0.00 pct) 20564.40 [0.00] (3.00 pct) 20676.00 [0.01] (2.47 pct)
>> 32 21301.80 [0.02] (0.00 pct) 20767.40 [0.02] (2.51 pct) 20945.00 [0.01] (1.67 pct)
>> 64 22772.40 [0.01] (0.00 pct) 22505.00 [0.01] (1.17 pct) 22629.40 [0.00] (0.63 pct)
>> 128 25843.00 [0.01] (0.00 pct) 25124.80 [0.00] (2.78 pct) 25377.40 [0.00] (1.80 pct)
>> 256 18691.00 [0.02] (0.00 pct) 19086.40 [0.05] (-2.12 pct) 18013.00 [0.04] (3.63 pct)
>> 512 19658.40 [0.03] (0.00 pct) 19568.80 [0.01] (0.46 pct) 18972.00 [0.02] (3.49 pct)
>> 1024 19126.80 [0.04] (0.00 pct) 18762.80 [0.02] (1.90 pct) 18878.20 [0.04] (1.30 pct)
>>
> Co-eff of variance looks low but for the lower counts before the machine
> is saturated (>=256?) it does not look like it helps and if anything,
> it hurts. A branch mispredict profile might reveal more but I doubt
> it's worth the effort at this point.
A positive percentage here represents an improvement, i.e., the time
between the sched_process_fork and sched_wakeup_new events has come
down in most cases after adding the branch.
The same applies to the results below.
>> - Affined runs:
>> Command: taskset -c 0-254 stress-ng -t 30s --exec <Worker>
>>
>> Kernel versions:
>> - balance-wake-affine - This patch + affined run
>> - branch-affine - This patch + Mel's suggested branch + affined run
>> - branch-unlikely-affine - This patch + Mel's suggested branch + unlikely + affined run
>>
>> Result format: Amean in ns [Co-eff of Var] (% Improvement)
>>
>> Workers balance-wake-affine branch-affine branch-unlikely-affine
>> 1 18515.00 [0.01] (0.00 pct) 18538.00 [0.02] (-0.12 pct) 18568.40 [0.01] (-0.29 pct)
>> 2 17882.80 [0.01] (0.00 pct) 19627.80 [0.09] (-9.76 pct) 18790.40 [0.01] (-5.08 pct)
>> 4 21204.20 [0.01] (0.00 pct) 21410.60 [0.04] (-0.97 pct) 21715.20 [0.03] (-2.41 pct)
>> 8 20840.20 [0.01] (0.00 pct) 19684.60 [0.07] (5.55 pct) 21074.20 [0.02] (-1.12 pct)
>> 16 21115.20 [0.02] (0.00 pct) 20823.00 [0.01] (1.38 pct) 20719.80 [0.00] (1.87 pct)
>> 32 21159.00 [0.02] (0.00 pct) 21371.20 [0.01] (-1.00 pct) 21253.20 [0.01] (-0.45 pct)
>> 64 22768.20 [0.01] (0.00 pct) 22816.80 [0.00] (-0.21 pct) 22662.00 [0.00] (0.47 pct)
>> 128 25671.80 [0.00] (0.00 pct) 25528.20 [0.00] (0.56 pct) 25404.00 [0.00] (1.04 pct)
>> 256 27209.00 [0.01] (0.00 pct) 26751.00 [0.01] (1.68 pct) 26733.20 [0.00] (1.75 pct)
>> 512 20241.00 [0.03] (0.00 pct) 19378.60 [0.03] (4.26 pct) 19671.40 [0.00] (2.81 pct)
>> 1024 19380.80 [0.05] (0.00 pct) 18940.40 [0.02] (2.27 pct) 19071.80 [0.00] (1.59 pct)
> Same here, the cpumask check obviously hurts but it does not look like
> the unlikely helps.
I agree. The unlikely hint doesn't show consistent results.
>> With or without the unlikely, adding the check before doing the
>> cpumask operation benefits most cases of un-affined tasks.
>>
> I think repost the patch with the num_online_cpus check added in. Yes,
> it hurts a bit for the pure fork case when the cpus_ptr is constrained by
> a scheduler policy but at least it makes sense.
I'll post the V5 soon with the check as you suggested.
--
Thanks and Regards,
Prateek