Message-ID: <e8cafcdf-a1aa-4ff1-b614-7c6fd4fa9716@linux.ibm.com>
Date: Mon, 15 Apr 2024 16:06:35 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Ze Gao <zegao2021@...il.com>
Cc: linux-kernel@...r.kernel.org, Ze Gao <zegao@...cent.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>, Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH] sched: Improve rq selection for a blocked task when its
 affinity changes

On 15/04/24 11:10, Ze Gao wrote:
> On Sat, Apr 13, 2024 at 12:59 AM Madadi Vineeth Reddy
> <vineethr@...ux.ibm.com> wrote:
>>
>> Hi Ze Gao,
>>
>> On 13/03/24 14:28, Ze Gao wrote:
>>> We observed that select_idle_sibling() is likely to return the *target*
>>> cpu early, which is often the previous cpu the task was running on, even
>>> when that cpu is not in the newly set affinity list. After that, we can
>>> only rely on select_fallback_rq() to choose a cpu for us at its will
>>> (mostly the first valid one, for now).
>>>
>>> However, the cpu chosen by select_fallback_rq() is quite likely not a
>>> good enough candidate; sometimes the task has to rely on the load
>>> balancer to kick in and move it to a better cpu, which no doubt adds one
>>> or more unnecessary migrations. For example, this is what I get when I
>>> move task 3964 to cpus 23-24, where cpu 23 already has a cpu-bound task
>>> pinned:
>>>
>>>         swapper       0 [013]   959.791829: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=13 dest_cpu=23
>>> kworker/24:2-mm    1014 [024]   959.806148: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=23 dest_cpu=24
>>>
>>
>> I am able to reproduce this scenario of an extra migration through load balancing:
>> swapper       0 [031] 398764.057232: sched:sched_migrate_task: comm=loop pid=178687 prio=120 orig_cpu=31 dest_cpu=33
>> ksoftirqd/0  13 [000] 398764.356138: sched:sched_migrate_task: comm=loop pid=178687 prio=120 orig_cpu=33 dest_cpu=34
>>
>> I wrote a simple C program that blocks for a few seconds; while it is
>> blocked, I taskset it to CPUs 33,34 while a busy task is already running
>> on CPU 33.
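
For reference, a minimal sketch of such a reproducer (the file name and
sleep interval are illustrative; the affinity change is done externally,
e.g. "taskset -pc 33,34 <pid>", while another busy task is already pinned
to CPU 33):

        /* loop.c: stay blocked in sleep() so the task is off the
         * runqueue when its affinity is changed from another shell. */
        #include <unistd.h>

        int main(void)
        {
                for (;;)
                        sleep(5);       /* blocked most of the time */
                return 0;
        }
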
>>
>>> The thing is, we can actually do better if we do the checks early and
>>> take more advantage of *target* in select_idle_sibling(). That is, we
>>> continue the idle cpu selection if *target* fails the
>>> cpumask_test_cpu(*target*, p->cpus_ptr) check. By doing so, we are
>>> likely to pick a good candidate, especially when the newly allowed cpu
>>> set shares some cpu resources with *target*.
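
For illustration, a minimal userspace sketch of that idea (not the actual
kernel patch): CPU_ISSET() stands in for cpumask_test_cpu(), a plain linear
scan stands in for the remaining idle-cpu search, and the cpu numbers are
made up.

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <sched.h>

        #define NR_CPUS_SCAN 64

        static int pick_cpu(int target, cpu_set_t *allowed)
        {
                /* Proposed early check: trust *target* only if it is
                 * still in the task's allowed mask. */
                if (CPU_ISSET(target, allowed))
                        return target;

                /* Otherwise continue the selection rather than handing
                 * the decision to a blind first-valid fallback. */
                for (int cpu = 0; cpu < NR_CPUS_SCAN; cpu++)
                        if (CPU_ISSET(cpu, allowed))
                                return cpu;

                return -1;      /* nothing allowed: fallback territory */
        }

        int main(void)
        {
                cpu_set_t allowed;

                CPU_ZERO(&allowed);
                CPU_SET(25, &allowed);
                CPU_SET(26, &allowed);

                /* prev cpu 27 is no longer allowed, so it must lose */
                printf("picked cpu %d\n", pick_cpu(27, &allowed));
                return 0;
        }
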
>>>
>>> And with this change, we clearly see the improvement when I move task
>>> 3964 to cpus 25-26, where cpu 25 already has a cpu-bound task pinned:
>>>
>>>         swapper       0 [027]  4249.204658: sched:sched_migrate_task: comm=stress-ng-cpu pid=3964 prio=120 orig_cpu=27 dest_cpu=26
>>
>> But after applying this patch, the extra migration still happens, as
>> CPU 33 is still chosen by try_to_wake_up().
>>
>> After placing some perf probes and testing:
>>     migration/57     304 [057] 12216.988491:       sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=57 dest_cpu=4
>>          swapper       0 [004] 12226.989065: probe:select_idle_sibling_L124: (c0000000001bafc0) i=-1 recent_used_cpu=-1 prev_aff=-1
>>          swapper       0 [004] 12226.989071:       probe:select_fallback_rq: (c0000000001a2e38) cpu=4
>>          swapper       0 [004] 12226.989074:       sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=4 dest_cpu=33
>>          swapper       0 [000] 12227.007768:       sched:sched_migrate_task: comm=loop pid=11172 prio=120 orig_cpu=33 dest_cpu=34
>>
>> It is observed that select_fallback_rq() is still taken in this
>> scenario, as the default target (CPU 4 here) is returned at the end of
>> select_idle_sibling().
> 
> On second thought, it indeed could happen if CPU 4 shares nothing with
> CPU 33 (or 34), for example when they are in different NUMA nodes.
> 
> IOW, it cannot benefit from select_idle_sibling() and has to rely on
> select_fallback_rq() as the last resort. As I said in the changelog,
> this patch aims to improve rq selection for cases where the newly
> allowed cpu set shares some cpu resources with the old cpu set.

Right. On Power10 (where I tested), the LLC is smaller, being at the SMT4
small-core level. So I think there is less chance of the affined CPUs
sharing resources with the old CPU set.

I also ran schbench and hackbench to make sure that there is no regression
on powerpc, and luckily there is none.

Tested-by: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>

Thanks and Regards
Madadi Vineeth Reddy

> 
> Sorry for not being able to recall all the details immediately, since
> this thread has been inactive for a long while and has received no
> feedback from the sched folks.
> 
> Best,
> Ze
> 
>> In most of my testing, the default target is returned at the end of the
>> function due to the initial checks. There may be cases where we can find
>> an optimal CPU before reaching the end of select_idle_sibling(), but it
>> would be interesting to know whether the extra time spent finding an
>> optimal cpu has an impact, compared to returning earlier, given that most
>> of the time we return the default target at the end anyway.
>>
>> Thanks and Regards
>> Madadi Vineeth Reddy
>>
>>>
>>> Note that we do the same check for *prev* in select_idle_sibling() as well.
>>>
>>> Signed-off-by: Ze Gao <zegao@...cent.com>
>>> ---
>>

