Message-ID: <21a18b1c-b5ae-410c-8d1f-3b63358b0e61@efficios.com>
Date: Mon, 2 Dec 2024 10:01:57 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Gabriele Monaco <gmonaco@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] sched: Optimise task_mm_cid_work duration
On 2024-12-02 09:56, Gabriele Monaco wrote:
> Hi Mathieu,
>
> thanks for the quick reply.
>
>> Thanks for looking into this. I understand that you are after
>> minimizing the latency introduced by task_mm_cid_work on isolated
>> cores. I think we'll need to think a bit harder, because the
>> proposed solution does not work:
>>
>> * for_each_cpu_from - iterate over CPUs present in @mask, from @cpu
>>   to the end of @mask.
>>
>> cpu is uninitialized. So this is completely broken.
>
> My bad, wrong macro.. Should be for_each_cpu
>
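For reference, a minimal sketch of the difference between the two
iterators (do_scan() is only a placeholder here, not kernel code):

	int cpu;

	/* for_each_cpu_from() resumes from the current value of @cpu,
	 * so @cpu must be initialized before entering the loop: */
	cpu = cpumask_first(mask);
	for_each_cpu_from(cpu, mask)
		do_scan(cpu);

	/* for_each_cpu() needs no prior initialization of @cpu: */
	for_each_cpu(cpu, mask)
		do_scan(cpu);
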
>> Was this tested against a workload that actually uses concurrency
>> IDs to ensure it does not break the whole thing ? Did you run the
>> rseq selftests ?
>>
>
> I did run the stress-ng --rseq command for a while and didn't see any
> error reported, but it's probably not bulletproof. I'll use the
> selftests for the next iterations.
>
>> Also, the mm_cidmask is a mask of concurrency IDs, not a mask of
>> CPUs. So using it to iterate on CPUs is wrong.
>>
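(For context, a rough sketch of how that mask is used, not taken from
the patch: bits in mm_cidmask(mm) are indexed by concurrency ID, so a
set bit means that CID is allocated, regardless of CPU numbering.)

	/* rough sketch, not kernel code */
	unsigned int cid;

	cid = cpumask_first_zero(mm_cidmask(mm));	/* first free CID */
	if (cid < num_possible_cpus())
		cpumask_set_cpu(cid, mm_cidmask(mm));	/* mark CID in use */
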
>
> Mmh, I get it. During my tests I was definitely getting better results
> than when using the mm_cpus_allowed mask, but I guess that was a broken
> test so it just doesn't count.
> Do you think using mm_cpus_allowed would make more sense, with the
> /risk/ of being a bit over-cautious?

mm_cpus_allowed can be updated dynamically by setting the cpu affinity
and by changing cpusets. If we change the iteration from all possible
CPUs to the allowed CPUs, then we also need to adapt the allowed-cpus
update paths to perform the associated mm_cid updates. This adds
complexity.
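To make the trade-off concrete, here is a rough sketch of the direction
discussed above (not actual kernel code; mm_cpus_allowed refers to the
mask discussed above, and sched_mm_cid_remote_clear_old() is the
existing per-CPU clearing helper):

	int cpu;

	/* Limit the scan in task_mm_cid_work() to the CPUs the mm has
	 * been allowed to run on, rather than all possible CPUs: */
	for_each_cpu(cpu, mm_cpus_allowed(mm))
		sched_mm_cid_remote_clear_old(mm, cpu);

	/* Caveat: mm_cpus_allowed can grow concurrently through
	 * sched_setaffinity()/cpusets, so those update paths would also
	 * have to fix up the per-CPU mm_cid state, which is the added
	 * complexity mentioned above. */
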
I understand that you wish to offload this task_work to a non-isolated
CPU (non-RT). If you do so, do you really care about the duration of
task_mm_cid_work enough to justify the added complexity to the
cpu affinity/cpusets updates ?
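For instance, a minimal sketch of such an offload, assuming a
hypothetical mm_cid_scan_work work item (it does not exist in the
current tree):

	/* Queue the scan on a housekeeping CPU instead of running it as
	 * task_work on the isolated CPU. mm_cid_scan_work is purely
	 * hypothetical here. */
	int cpu = housekeeping_any_cpu(HK_TYPE_DOMAIN);

	schedule_work_on(cpu, &mm->mm_cid_scan_work);
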
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com