Date: Thu, 28 Mar 2024 11:18:22 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Vitalii Bursov <vitaly@...sov.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] sched/fair: allow disabling newidle_balance with
 sched_relax_domain_level



On 3/28/24 6:17 AM, Vitalii Bursov wrote:
> Hi,
> 
> During the upgrade from Linux 5.4 we found a small (around 3%) 
> performance regression which was tracked to commit 

Do you see the regression because it is doing more newidle balancing?

> c5b0a7eefc70150caf23e37bc9d639c68c87a097
> 
>     sched/fair: Remove sysctl_sched_migration_cost condition
> 
>     With a default value of 500us, sysctl_sched_migration_cost is
>     significanlty higher than the cost of load_balance. Remove the
>     condition and rely on the sd->max_newidle_lb_cost to abort
>     newidle_balance.
> 
> 
> Looks like "newidle" balancing is beneficial for a lot of workloads, 
> just not for this specific one. The workload is video encoding, there 
> are 100s-1000s of threads, some are synchonized with mutexes and

s/synchonized/synchronized/
 
> conditional variables. The process aims to have a portion of CPU idle, 
> so no CPU cores are 100% busy. Perhaps, the performance impact we see 
> comes from additional processing in the scheduler and additional cost 
> like more cache misses, and not from an incorrect balancing. See
> perf output below.
> 
> My understanding is that "sched_relax_domain_level" cgroup parameter 
> should control if newidle_balance() is called and what's the scope

s/newidle_balance()/sched_balance_newidle()/ in all places, since the
function was renamed recently.

> of the balancing is, but it doesn't fully work for this case.
> 
> cpusets.rst documentation:
>> The 'cpuset.sched_relax_domain_level' file allows you to request changing
>> this searching range as you like.  This file takes int value which
>> indicates size of searching range in levels ideally as follows,
>> otherwise initial value -1 that indicates the cpuset has no request.
>>  
>> ====== ===========================================================
>>   -1   no request. use system default or follow request of others.
>>    0   no search.
>>    1   search siblings (hyperthreads in a core).
>>    2   search cores in a package.
>>    3   search cpus in a node [= system wide on non-NUMA system]
>>    4   search nodes in a chunk of node [on NUMA system]
>>    5   search system wide [on NUMA system]
>> ====== ===========================================================
> 

I think this document needs to be updated. Levels need not be in serial order
due to sched domain degeneration. It should have a paragraph that tells the user
to look at /sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
details.
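
For illustration, a minimal sketch of how a user could dump those per-domain
files. It assumes debugfs is mounted at /sys/kernel/debug and that each
domain directory has "name" and "flags" files; a "level" file is read only if
it happens to exist, since I am not sure every kernel exposes one. The
function name read_sched_domains is made up for this example.

```python
from pathlib import Path


def read_sched_domains(root="/sys/kernel/debug/sched/domains"):
    """Collect per-CPU sched domain info from a debugfs-style directory tree.

    Returns {cpu_dir_name: [(domain_dir_name, {file_name: contents}), ...]}.
    Files that are missing are skipped rather than treated as errors.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    result = {}
    for cpu_dir in sorted(base.glob("cpu*")):
        domains = []
        for dom_dir in sorted(cpu_dir.glob("domain*")):
            info = {}
            # "level" is an assumption: read it only when present.
            for key in ("name", "flags", "level"):
                f = dom_dir / key
                if f.is_file():
                    info[key] = f.read_text().strip()
            domains.append((dom_dir.name, info))
        result[cpu_dir.name] = domains
    return result


if __name__ == "__main__":
    # Reading debugfs normally requires root.
    for cpu, domains in read_sched_domains().items():
        for dom, info in domains:
            print(cpu, dom, info)
```

Note that the directory names are sorted as plain strings here, so cpu10 is
listed before cpu2.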

> Setting cpuset.sched_relax_domain_level to 0 works as 1.
> 
> On a dual-CPU server, domains and levels are as follows:
>   domain 0: level 0, SMT
>   domain 1: level 2, MC
>   domain 2: level 5, NUMA
> 
> So, to support "0 no search", the value in 
> cpuset.sched_relax_domain_level should disable SD_BALANCE_NEWIDLE for a 
> specified level and keep it enabled for prior levels. For example, SMT 
> level is 0, so sched_relax_domain_level=0 should exclude levels >=0.
> 
> Instead, cpuset.sched_relax_domain_level enables the specified level,
> which effectively removes "no search" option. See below for domain
> flags for all cpuset.sched_relax_domain_level values.
> 
> Proposed patch allows clearing SD_BALANCE_NEWIDLE flags when 
> cpuset.sched_relax_domain_level is set to 0 and extends max
> value validation range beyond sched_domain_level_max. This allows
> setting SD_BALANCE_NEWIDLE on all levels and override platform
> default if it does not include all levels.
> 
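
To check my reading of the description above, here is a toy model of the two
behaviours. It is not kernel code: newidle_levels is a made-up helper, it
assumes every level has SD_BALANCE_NEWIDLE set by default, and the "proposed"
branch is only my interpretation of "exclude levels >= the requested value".

```python
def newidle_levels(domain_levels, request, proposed=False):
    """Return the domain levels that keep SD_BALANCE_NEWIDLE in this toy model.

    domain_levels: levels present on the system, e.g. [0, 2, 5].
    request:       value written to cpuset.sched_relax_domain_level.
    proposed:      False models the behaviour described as current
                   (levels <= request stay enabled); True models the
                   proposed one (only levels < request stay enabled).
    """
    if request < 0:
        # -1 means "no request"; assumed here to leave every level enabled.
        return list(domain_levels)
    if proposed:
        return [lvl for lvl in domain_levels if lvl < request]
    return [lvl for lvl in domain_levels if lvl <= request]


if __name__ == "__main__":
    levels = [0, 2, 5]  # SMT, MC, NUMA from the dual-CPU example
    for req in range(-1, 7):
        print(req, newidle_levels(levels, req), newidle_levels(levels, req, True))
```

In this model, the current behaviour gives the same result for 0 and 1 (SMT
stays enabled), while the proposed one clears all levels for 0 and needs a
value above the highest level (6 here) to enable NUMA.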
