Message-ID: <CAB8ipk_0YxWnS-k+HLPnL7DRR1MM+WH-xQfna7jD_+TQ0vKi8Q@mail.gmail.com>
Date: Mon, 8 Nov 2021 16:49:39 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Zhaoyang Huang <huangzhaoyang@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Zhaoyang Huang <zhaoyang.huang@...soc.com>,
"open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
xuewen.yan@...soc.com, Ke Wang <Ke.Wang@...soc.com>
Subject: Re: [Resend PATCH] psi : calc cfs task memstall time more precisely
Hi Dietmar
On Sat, Nov 6, 2021 at 1:20 AM Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
>
> On 05/11/2021 06:58, Zhaoyang Huang wrote:
> >> I don't understand the EAS (probably asymmetric CPU capacity is meant
> >> here) angle of the story. Pressure on CPU capacity which is usable for
> >> CFS happens on SMP as well?
> > Mentioning EAS here is mainly about RT tasks preempting small CFS tasks
> > (big CFS tasks would be scheduled on big cores), which introduces a
> > larger proportion of preempted time within PSI_MEM_STALL than SMP does.
>
> What's your CPU layout? Do you have the little before the big CPUs? Like
> Hikey 960?
>
> root@...aro-developer:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 462
> 462
> 462
> 462
> 1024
> 1024
> 1024
> 1024
>
> And I guess rt class prefers lower CPU numbers hence you see this?
>
Our CPU layout is:
xuewen.yan:/ # cat /sys/devices/system/cpu/cpu*/cpu_capacity
544
544
544
544
544
544
1024
1024
On our platform, the kernel runs on Android mobile phones. Since we
care about power, we prefer to run the RT class on the little cores.
> >>
> >> This will let the idle task (swapper) pass. Is this intended? Or do you
> >> want to only let CFS tasks (including SCHED_IDLE) pass?
> > idle tasks will NOT call psi_memstall_xxx. We just want to scale the
> > stall time of CFS tasks.
>
> Not sure I get this.
>
> __schedule() -> psi_sched_switch() -> psi_task_change() ->
> psi_group_change() -> record_times() -> psi_memtime_fixup()
>
> is something else than calling psi_memstall_enter() or _leave()?
>
> IMHO, at least record_times() can be called with current equal
> swapper/X. Or is it that PSI_MEM_SOME is never set for the idle task in
> this callstack? I don't know the PSI internals.
>
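For reference, with our patch the fixup sits on exactly that path:
record_times() scales the delta when PSI_MEM_SOME is set, roughly like
this (a simplified sketch based on v5.15, not the exact diff):

static void record_times(struct psi_group_cpu *groupc, u64 now)
{
	u32 delta;

	delta = now - groupc->state_start;
	groupc->state_start = now;
	...
	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
		/* scale the stall delta by the capacity left for CFS */
		groupc->times[PSI_MEM_SOME] += psi_memtime_fixup(delta);
		if (groupc->state_mask & (1 << PSI_MEM_FULL))
			groupc->times[PSI_MEM_FULL] += delta;
	}
	...
}
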
> >>
> >> if (current->sched_class != &fair_sched_class)
> >> return growth_fixed;
> >>
> >>>>>> +
> >>>>>> + if (current->in_memstall)
> >>>>>> + growth_fixed = div64_ul((1024 - rq->avg_rt.util_avg - rq->avg_dl.util_avg
> >>>>>> + - rq->avg_irq.util_avg + 1) * growth, 1024);
> >>>>>> +
> >>
> >> We do this slightly differently in scale_rt_capacity() [fair.c]:
> >>
> >> max = arch_scale_cpu_capacity(cpu_of(rq)) /* instead of 1024 to support
> >>                                              asymmetric CPU capacity */
> > Is it possible that the SUM of rqs' util_avg is larger than
> > arch_scale_cpu_capacity because of task migration things?
>
> I assume you meant if the rq (cpu_rq(CPUx)) util_avg sum (CFS, RT, DL,
> IRQ and thermal part) can be larger than arch_scale_cpu_capacity(CPUx)?
>
> Yes it can.
>
> Have a look at
>
> effective_cpu_util(..., max, ...) {
>
> if (foo >= max)
> return max;
>
> }
>
> Even the CFS part (cpu_rq(CPUx)->cfs.avg.util_avg) can be larger than
> the original cpu capacity (rq->cpu_capacity_orig).
>
> Have a look at cpu_util(). capacity_orig_of(CPUx) and
> arch_scale_cpu_capacity(CPUx) both returning rq->cpu_capacity_orig.
>
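I see. For reference, scale_rt_capacity() clamps the used utilization
to the CPU's original capacity, roughly like this (v5.15 fair.c,
slightly abridged):

static unsigned long scale_rt_capacity(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long max = arch_scale_cpu_capacity(cpu);
	unsigned long used, free, irq;

	irq = cpu_util_irq(rq);
	if (unlikely(irq >= max))
		return 1;

	used = READ_ONCE(rq->avg_rt.util_avg);
	used += READ_ONCE(rq->avg_dl.util_avg);
	used += thermal_load_avg(rq);
	if (unlikely(used >= max))
		return 1;

	free = max - used;

	return scale_irq_capacity(free, irq, max);
}
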
Well, do you mean we should not use 1024, and should use the original
CPU capacity instead?
And maybe using sched_cpu_util() is a good choice, like this:
+	if (current->in_memstall)
+		growth_fixed = div64_ul(cpu_util_cfs(rq) * growth,
+				sched_cpu_util(rq->cpu, capacity_orig_of(rq->cpu)));
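
Putting both of your suggestions together (the sched_class check plus
capacity-based scaling), the whole helper might look like the sketch
below; just for discussion, based on the v5.15 APIs:

static unsigned long psi_memtime_fixup(u32 growth)
{
	struct rq *rq = task_rq(current);
	unsigned long growth_fixed = (unsigned long)growth;

	/* only scale the stall time of CFS tasks */
	if (current->sched_class != &fair_sched_class)
		return growth_fixed;

	/* scale by the fraction of the CPU's utilization that is CFS */
	if (current->in_memstall)
		growth_fixed = div64_ul(cpu_util_cfs(rq) * growth,
				sched_cpu_util(rq->cpu, capacity_orig_of(rq->cpu)));

	return growth_fixed;
}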
Thanks!
BR
xuewen