linux-kernel - Re: [PATCH v3 0/5] Rework system pressure interface to the scheduler


Message-ID: <CAKfTPtCK-YeM4cJehSb8G0aj40rjGgq2kG-ddgKxdAMAvkbZQg@mail.gmail.com>
Date: Fri, 19 Jan 2024 18:57:41 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: linux@...linux.org.uk, catalin.marinas@....com, will@...nel.org, 
	sudeep.holla@....com, rafael@...nel.org, viresh.kumar@...aro.org, 
	agross@...nel.org, andersson@...nel.org, konrad.dybcio@...aro.org, 
	mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com, 
	vschneid@...hat.com, lukasz.luba@....com, rui.zhang@...el.com, 
	mhiramat@...nel.org, daniel.lezcano@...aro.org, amit.kachhap@...il.com, 
	corbet@....net, gregkh@...uxfoundation.org, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org, linux-arm-msm@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, linux-doc@...r.kernel.org, 
	qyousef@...alina.io
Subject: Re: [PATCH v3 0/5] Rework system pressure interface to the scheduler

On Wed, 10 Jan 2024 at 19:10, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 09/01/2024 14:29, Vincent Guittot wrote:
> > On Tue, 9 Jan 2024 at 12:34, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> >>
> >> On 08/01/2024 14:48, Vincent Guittot wrote:
> >>> Following the consolidation and cleanup of CPU capacity in [1], this series
> >>> reworks how the scheduler gets the pressures on CPUs. We need to take into
> >>> account all pressures applied by cpufreq on the compute capacity of a CPU
> >>> for dozens of ms or more and not only cpufreq cooling device or HW
> >>> mitigations. We split the pressure applied on CPU's capacity in 2 parts:
> >>> - one from cpufreq and freq_qos
> >>> - one from HW high freq mitigation.
> >>>
> >>> The next step will be to add a dedicated interface for long standing
> >>> capping of the CPU capacity (i.e. for seconds or more) like the
> >>> scaling_max_freq of cpufreq sysfs. The latter is already taken into
> >>> account by this series but as a temporary pressure which is not always the
> >>> best choice when we know that it will happen for seconds or more.
> >>
> >> I guess this is related to the 'user space system pressure' (*) slide of
> >> your OSPM '23 talk.
> >
> > yes
> >
> >>
> >> Where do you draw the line when it comes to time between (*) and the
> >> 'medium pace system pressure' (e.g. thermal and FREQ_QOS).
> >
> > My goal is to consider the /sys/../scaling_max_freq as the 'user space
> > system pressure'
> >
> >>
> >> IIRC, with (*) you want to rebuild the sched domains etc.
> >
> > The easiest way would be to rebuild the sched_domain but the cost is
> > not small so I would prefer to skip the rebuild and add a new signal
> > that keeps track of this capped capacity
>
> Are you saying that you don't need to rebuild sched domains since
> cpu_capacity information of the sched domain hierarchy is
> independently updated via:
>
> update_sd_lb_stats() {
>
>   update_group_capacity() {
>
>     if (!child)
>       update_cpu_capacity(sd, cpu) {
>
>         capacity = scale_rt_capacity(cpu) {
>
>           max = get_actual_cpu_capacity(cpu) <- (*)
>         }
>
>         sdg->sgc->capacity = capacity;
>         sdg->sgc->min_capacity = capacity;
>         sdg->sgc->max_capacity = capacity;
>       }
>
>   }
>
> }
>
> (*) influence of temporary and permanent (to be added) frequency
> pressure on cpu_capacity (per-cpu and in sd data)


I'm more concerned about rd->max_cpu_capacity, which remains at the
original capacity and triggers spurious LB if we take into account the
userspace max freq instead of the original max compute capacity of a
CPU. There is also the question of how to manage this in RT and DL.
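
To make the rd->max_cpu_capacity concern concrete, here is a minimal
user-space sketch, not kernel code. It assumes the capped capacity of a
CPU is its original capacity minus the larger of a cpufreq pressure and
a HW pressure, and that the root-domain maximum is computed once from
the original capacities and never refreshed. Every name in it (toy_cpu,
toy_actual_capacity, toy_rd_max_capacity, toy_rd_max_is_stale) is
hypothetical and exists only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for illustration; not a kernel structure. */
struct toy_cpu {
	unsigned long orig_capacity;    /* original max compute capacity */
	unsigned long cpufreq_pressure; /* capacity lost to a cpufreq/freq_qos cap */
	unsigned long hw_pressure;      /* capacity lost to HW high freq mitigation */
};

/* Assumption: the effective cap is the larger of the two pressures. */
static unsigned long toy_actual_capacity(const struct toy_cpu *c)
{
	unsigned long pressure = c->cpufreq_pressure > c->hw_pressure ?
				 c->cpufreq_pressure : c->hw_pressure;

	return pressure >= c->orig_capacity ? 0 : c->orig_capacity - pressure;
}

/* Max of the original capacities, standing in for rd->max_cpu_capacity. */
static unsigned long toy_rd_max_capacity(const struct toy_cpu *cpus, size_t n)
{
	unsigned long max = 0;

	for (size_t i = 0; i < n; i++)
		if (cpus[i].orig_capacity > max)
			max = cpus[i].orig_capacity;
	return max;
}

/* True when no CPU can reach the root-domain maximum any more. */
static bool toy_rd_max_is_stale(const struct toy_cpu *cpus, size_t n)
{
	unsigned long rd_max = toy_rd_max_capacity(cpus, n);

	for (size_t i = 0; i < n; i++)
		if (toy_actual_capacity(&cpus[i]) >= rd_max)
			return false;
	return n > 0;
}
```

With a little CPU at 512 and a big CPU at 1024 capped by a pressure of
256, toy_rd_max_is_stale() reports true: the stored maximum still says
1024 while the best reachable capacity is 768, which is the kind of
mismatch that could make a capped CPU look undersized.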

>
>
> example: hackbench on h960 with IPA:
>                                                                                   cap  min  max
> ...
> hackbench-2284 [007] .Ns..  2170.796726: update_group_capacity: sdg !child cpu=7 1017 1017 1017
> hackbench-2456 [007] ..s..  2170.920729: update_group_capacity: sdg !child cpu=7 1018 1018 1018
>     <...>-2314 [007] ..s1.  2171.044724: update_group_capacity: sdg !child cpu=7 1011 1011 1011
> hackbench-2541 [007] ..s..  2171.168734: update_group_capacity: sdg !child cpu=7  918  918  918
> hackbench-2558 [007] .Ns..  2171.228716: update_group_capacity: sdg !child cpu=7  912  912  912
>     <...>-2321 [007] ..s..  2171.352718: update_group_capacity: sdg !child cpu=7  812  812  812
> hackbench-2553 [007] ..s..  2171.476721: update_group_capacity: sdg !child cpu=7  640  640  640
>     <...>-2446 [007] ..s2.  2171.600743: update_group_capacity: sdg !child cpu=7  610  610  610
> hackbench-2347 [007] ..s..  2171.724738: update_group_capacity: sdg !child cpu=7  406  406  406
> hackbench-2331 [007] .Ns1.  2171.848768: update_group_capacity: sdg !child cpu=7  390  390  390
> hackbench-2421 [007] ..s..  2171.972733: update_group_capacity: sdg !child cpu=7  388  388  388
> ...
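
For anyone who wants to post-process a trace like the one quoted above,
a small parser is enough. This is a sketch that assumes the line layout
shown in the quoted output ("update_group_capacity: sdg !child cpu=N
cap min max"); the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one trace sample. */
struct toy_sample {
	int cpu;
	unsigned long cap, min, max;
};

/*
 * Parse one trace line. Returns 0 on success, -1 if the line does not
 * contain an update_group_capacity record in the assumed layout.
 */
static int toy_parse_line(const char *line, struct toy_sample *s)
{
	const char *p = strstr(line, "update_group_capacity:");

	if (!p)
		return -1;
	p = strstr(p, "cpu=");
	if (!p)
		return -1;
	if (sscanf(p, "cpu=%d %lu %lu %lu",
		   &s->cpu, &s->cap, &s->min, &s->max) != 4)
		return -1;
	return 0;
}
```

Feeding each line through toy_parse_line() and keeping the cap field
gives the capacity series for cpu=7 directly; lines such as "..." are
rejected with -1.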