Message-ID: <xhsmh8rgyre8i.mognet@vschneid.remote.csb>
Date: Wed, 15 Feb 2023 18:10:53 +0000
From: Valentin Schneider <vschneid@...hat.com>
To: Sun Shouxin <sunshouxin@...natelecom.cn>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com
Cc: linux-kernel@...r.kernel.org, huyd12@...natelecom.cn,
sunshouxin@...natelecom.cn
Subject: Re: [RESEND PATCH] sched: sd_llc_id initialized
On 14/02/23 17:54, Sun Shouxin wrote:
> In my test, I use isolcpus to isolate specific CPUs,
> and then I noticed two different outcomes when binding cores.
>
> For example, the NUMA topology is as follows,
> NUMA node0 CPU(s): 0-15,32-47
> NUMA node1 CPU(s): 16-31,48-63
>
> and the 'isolcpus' is as follows,
> isolcpus=14,15,30,31,46,47,62,63
>
> A task initially running on a non-isolated core belonging to NUMA node0
> was bound to an isolated core on NUMA node1, and then its cpu affinity
> was changed back to all cores; I noticed the task can be scheduled back
> to a non-isolated core on NUMA node0.
>
> 1.taskset -pc 0-13 3512 (task running on core 1)
> 2.taskset -pc 63 3512 (task running on isolated core 63)
> 3.taskset -pc 0-63 3512 (task running on core 1)
>
This is working as intended, no?
> In another case, a task initially running on a non-isolated core
> belonging to NUMA node1 was bound to an isolated core on NUMA node1,
> and then its cpu affinity was changed back to all cores;
> the task is not scheduled away and keeps running on the isolated core.
>
> 1.taskset -pc 16-29 3512 (task running on core 17)
> 2.taskset -pc 63 3512 (task running on isolated core 63)
> 3.taskset -pc 0-63 3512 (task still running on core 63
> and not scheduled out)
>
And this is also not wrong, since CPU63 is in the task's affinity mask.
That said, I can see that in this case we'd want the task to use other CPUs
if it makes sense wrt load balance.
However, since CPU63 is attached to a NULL sched_domain, AFAIA your
solution is at the mercy of the @prev and @target CPUs passed to
select_idle_sibling(). So this might only work if the waker is on a
non-isolated CPU.
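FWIW, one way to see that the isolated CPUs really do end up with a NULL
sched_domain is the sched debug interface (a sketch assuming
CONFIG_SCHED_DEBUG and a mounted debugfs; paths and output below are from
recent kernels and purely illustrative):

  # Non-isolated CPU: the usual domain hierarchy is present
  $ ls /sys/kernel/debug/sched/domains/cpu0/
  domain0  domain1  domain2

  # Isolated CPU (isolcpus=...,63): no domains at all
  $ ls /sys/kernel/debug/sched/domains/cpu63/
  $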
I don't think your patch is wrong, but I don't think it entirely fixes the
issue either. Unfortunately, due to isolated CPUs being attached to NULL
sched_domains, there isn't a magic solution as the majority of scheduler
decisions are based on these.
A safe bet would be to exclude isolated CPUs from the affinity of your
non-critical tasks. Things like TuneD [1] and/or cpusets could help.
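For the cpuset route, a rough sketch with cgroup v2 (assuming the cpuset
controller is available, the v2 hierarchy is mounted at /sys/fs/cgroup,
and using a made-up "housekeeping" group name; <pid> is a placeholder):

  # enable the cpuset controller for child groups
  echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control

  # confine non-critical tasks to the non-isolated CPUs
  mkdir /sys/fs/cgroup/housekeeping
  echo 0-13,16-29,32-45,48-61 > /sys/fs/cgroup/housekeeping/cpuset.cpus
  echo <pid> > /sys/fs/cgroup/housekeeping/cgroup.procs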
[1]: https://github.com/redhat-performance/tuned