Message-ID: <f2eafde3-b9e8-afc9-8934-ca8e597c33e6@amd.com>
Date: Tue, 14 May 2024 20:53:16 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com, linux-kernel@...r.kernel.org,
wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
tim.c.chen@...el.com, yu.c.chen.y@...il.com
Subject: Re: [RFC][PATCH 10/10] sched/eevdf: Use sched_attr::sched_runtime to
set request/slice suggestion
Hello Chenyu,
On 5/14/2024 2:48 PM, Chen Yu wrote:
>>> [..snip..]
>>> /*
>>> * Scan the LLC domain for idle CPUs; this is dynamically regulated by
>>> * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>>> @@ -7384,10 +7402,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>> if (sched_feat(SIS_UTIL)) {
>>> sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
>>> if (sd_share) {
>>> -		/* because !--nr is the condition to stop scan */
>>> -		nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
>>> + nr = adjust_idle_scan(p, READ_ONCE(sd_share->nr_idle_scan));
>>> /* overloaded LLC is unlikely to have idle cpu/core */
>>> - if (nr == 1)
>>> + if (nr <= 0)
>>
>> I was wondering if this would preserve the current behavior with
>> SIS_FAST toggled off? Since the implementation below still does a
>> "--nr <= 0", wouldn't it effectively visit one fewer CPU overall now?
>>
>> Have you tried something similar to the below hunk?
>>
>> /* because !--nr is the condition to stop scan */
>> nr = adjust_idle_scan(p, READ_ONCE(sd_share->nr_idle_scan)) + 1;
>> if (nr == 1)
>> return -1;
>>
>
> Yeah, right, to keep the scan depth consistent, the "+1" should be kept.
>
>> I agree with Mike that looking at slice to limit scan depth seems odd.
>> My experience with netperf is that the workload cares more about the
>> server and client being co-located on the closest cache domain, and by
>> limiting scan depth using slice this is indirectly achieved, since all
>> the wakeups carry the WF_SYNC flag.
>>
>
> Exactly. This is the original motivation.
>
>> P.S. have you tried using the slice in __select_idle_cpu()? Similar to
>> the sched_idle_cpu() check, perhaps an additional sched_preempt_short_cpu()
>> which compares rq->curr->se.slice with the waking task's slice and
>> returns that cpu if SIS_SHORT can help run the workload quicker?
>
> This is a good idea; it seems it could benefit PREEMPT_SHORT. If the
> customized task slice is introduced, we can leverage this hint for
> latency-related optimizations. Task wakeup is one thing, and I can also
> think of other aspects, like idle load balance, etc. I'm not sure what
> the proper usage of the task slice is though, which is why I sent this RFC.
>
>> Note:
>> This will not work if the SIS scan itself is the largest overhead in the
>> wakeup cycle and not the task placement itself. Previously during
>> SIS_UTIL testing, to measure the overheads of scan vs placement, we
>> would do a full scan but return the result that SIS_UTIL would have
>> returned to determine the overhead of the search itself.
>>
>
> Regarding the task placement, do you mean the time between when a task is
> enqueued and when it is picked up? Do you have any recommendation on which
> workload can expose the scan overhead the most?
Sorry for not being clear here. From what I've observed in the past,
there are two dimensions to select_idle_sibling():

i) Placement: the final CPU that select_idle_sibling() returns
ii) Search: whether we find an idle core/CPU in select_idle_sibling()

In the case of netperf, I've observed that i) is more important than ii),
wherein placing the client on the same core/thread as the server results
in better performance vs. finding an idle CPU on a remote LLC.

For hackbench/tbench, when runqueues are under high utilization (~75%),
reducing the search time ii) seems to be more beneficial.
There was also a wakeup-with-IPI vs. wakeup-without-IPI angle that I never
quite got to the bottom of, which Mathieu highlighted last year. I'll go
get some more data on that front and give your patch a try. Expect
results in a couple of days.
Thank you.
>
> thanks,
> Chenyu
>
>>> return -1;
>>> }
>>> }
>>> [..snip..]
--
Thanks and Regards,
Prateek