Message-ID: <CAEWA0a5jP+pigtqa9uS4mA0TpiDb5+pubhcPU+D1CyDuW4vhSw@mail.gmail.com>
Date: Mon, 17 Apr 2023 12:24:59 -0700
From: Andrei Vagin <avagin@...gle.com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Andrei Vagin <avagin@...il.com>, Kees Cook <keescook@...omium.org>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org,
Christian Brauner <brauner@...nel.org>,
Andy Lutomirski <luto@...capital.net>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Peter Oskolkov <posk@...gle.com>,
Tycho Andersen <tycho@...ho.pizza>,
Will Drewry <wad@...omium.org>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH 2/6] sched: add WF_CURRENT_CPU and externise ttwu
On Mon, Apr 10, 2023 at 11:17 AM Chen Yu <yu.c.chen@...el.com> wrote:
>
> On 2023-04-09 at 21:56:26 -0700, Andrei Vagin wrote:
> > On Fri, Apr 7, 2023 at 8:20 PM Chen Yu <yu.c.chen@...el.com> wrote:
> > >
> > > On 2023-03-07 at 23:31:57 -0800, Andrei Vagin wrote:
> > > > From: Peter Oskolkov <posk@...gle.com>
> > > >
> > > > Add WF_CURRENT_CPU wake flag that advises the scheduler to
> > > > move the wakee to the current CPU. This is useful for fast on-CPU
> > > > context switching use cases.
> > > >
> > > > In addition, make ttwu external rather than static so that
> > > > the flag could be passed to it from outside of sched/core.c.
> > > >
> > > > Signed-off-by: Peter Oskolkov <posk@...gle.com>
> > > > Signed-off-by: Andrei Vagin <avagin@...gle.com>
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -7569,6 +7569,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> > > > if (wake_flags & WF_TTWU) {
> > > > record_wakee(p);
> > > >
> > > > + if ((wake_flags & WF_CURRENT_CPU) &&
> > > > + cpumask_test_cpu(cpu, p->cpus_ptr))
> > > > + return cpu;
> > > > +
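
(For context on the "externise ttwu" part: once try_to_wake_up() is
visible outside of sched/core.c, a wakeup path elsewhere in the kernel
could ask for a same-CPU wakeup roughly as in the sketch below. This is
only an illustrative shape, not code from this series; the helper name
is made up.)

        /*
         * Illustrative sketch only, not part of this patch: with ttwu
         * external, a caller outside sched/core.c can pass the new flag.
         */
        static int wake_target_on_current_cpu(struct task_struct *target)
        {
                return try_to_wake_up(target, TASK_NORMAL, WF_CURRENT_CPU);
        }
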
> > > I tried to reuse WF_CURRENT_CPU to mitigate the cross-CPU wakeup; however, there
> > > are regressions when running some workloads, and these workloads want to be
> > > spread on idle CPUs whenever possible.
> > > The reason for the regression is that the above change chooses the current CPU no
> > > matter what the load/utilization of this CPU is. So tasks are stacked on 1 CPU,
> > > which hurts throughput/latency. And I believe this issue would be more severe on
> > > systems with a smaller number of CPUs within 1 LLC (when compared to Intel
> > > platforms), such as AMD and Arm64.
> >
> > WF_CURRENT_CPU works only under certain conditions. Maybe you saw my
> > attempt to change how WF_SYNC works:
> >
> > https://www.spinics.net/lists/kernel/msg4567650.html
> >
> > Then we found that this idea doesn't work well, which is the reason
> > why we have the separate WF_CURRENT_CPU flag.
> >
> I see, in the seccomp case, even an idle CPU is not a better choice.
> > >
> > > I know WF_CURRENT_CPU benefits seccomp; can we make this change more generic
> > > to benefit other workloads, by making the condition to trigger WF_CURRENT_CPU stricter?
> > > Say, only when the current CPU has 1 runnable task, and treat the current CPU as the
> > > last resort by checking that the wakee's previous CPU is not idle. In this way, we can
> > > enable the WF_CURRENT_CPU flag dynamically when some condition is met (a short task,
> > > for example).
> >
> > We discussed all of these here and here:
> >
> > https://www.spinics.net/lists/kernel/msg4657545.html
> >
> > https://lore.kernel.org/lkml/CANaxB-yWkKzhhPMGXCQbtjntJbqZ40FL2qtM2hk7LLWE-ZpbAg@mail.gmail.com/
> >
> > I like your idea about short-duration tasks, but I think it is a
> > separate task and it has to be done in a separate patch set. Here, I
> > solve the problem of optimizing synchronous switches when one task wakes
> > up another one and falls asleep immediately after that. Waking up the
> > target task on the current CPU looks reasonable for a few reasons in
> > this case. First, waking up a task on the current CPU is cheaper than on
> > another one, and it is much cheaper than waking one on an idle CPU.
> It depends. For a waker and wakee that compete for cache resources and do
> not share data, sometimes an idle target would be better.
> > Second,
> > when tasks want to do synchronous switches, they often exchange some
> > data, so memory caches can work in our favor.
> I like the name of 'WF_CURRENT_CPU' too : ) and I was wondering if this could
> become an auto-detect behavior so others can benefit from this.
>
> If I understand correctly, the scenario this patch deals with is:
> task A wakes up task B; task A and task B have a close relationship with each
> other (cache sharing, e.g.); when task A falls asleep, choose A's CPU rather than an
> idle CPU.
>
> I'm wondering if the following logic would cover your case:
> 1. the waker A is a short-duration one (the waker will fall asleep soon)
> 2. the wakee B is a short-duration one (the impact of B is minor)
In the seccomp case, A or B can be a non-short-duration task, but if they do
synchronous switches, they still get all the benefits of WF_CURRENT_CPU.
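
To make the "synchronous switch" pattern concrete, its shape is roughly
the following (illustrative pseudo-code only, not the actual seccomp
wait path; "reply_done" is a made-up completion):

        /*
         * Illustrative only: the waker hands off to the wakee and blocks
         * immediately, so the pair effectively swap places on one CPU.
         */
        wake_up_state(target, TASK_NORMAL); /* the real path would carry WF_CURRENT_CPU */
        wait_for_completion(&reply_done);   /* waker goes to sleep right away */
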
> 3. the A->wakee_flips is 0 and A->last_wakee = B
In our case, the "supervisor" is written in Go, and its goroutines can be
executed from different system threads, so this condition will often fail too.
> 4. the B->wakee_flips is 0 and B->last_wakee = A
> 5. cpu(A)->nr_running = 1
>
> (3 and 4 mean that A and B wake up each other, so it is likely that
> they share cache data, and they are good friends to be put together)
>
> If the above conditions are met, choose the current CPU. In this way, WF_CURRENT_CPU
> can be set dynamically.
>
> thanks,
> Chenyu
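
For reference, a rough sketch of the check that conditions 1-5 above
describe, in the shape it might take inside select_task_rq_fair()
(purely illustrative; is_short_task() is a hypothetical helper, and I
read 3 and 4 as the symmetric wakee_flips/last_wakee checks):

        if ((wake_flags & WF_TTWU) &&
            is_short_task(current) && is_short_task(p) &&            /* 1, 2 */
            current->wakee_flips == 0 && current->last_wakee == p && /* 3 */
            p->wakee_flips == 0 && p->last_wakee == current &&       /* 4 */
            this_rq()->nr_running == 1 &&                            /* 5 */
            cpumask_test_cpu(cpu, p->cpus_ptr))
                return cpu;
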