Message-ID: <CAKfTPtAMdQD9S-mbLszeu2pjB4YB2A+1OM5NUV_2xDzCTCc7Qw@mail.gmail.com>
Date: Tue, 15 Nov 2022 08:18:10 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Song Zhang <zhangsong34@...wei.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
mcgrof@...nel.org, keescook@...omium.org, yzaikin@...gle.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: Introduce priority load balance for CFS
On Mon, 14 Nov 2022 at 17:42, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Sat, 12 Nov 2022 at 03:51, Song Zhang <zhangsong34@...wei.com> wrote:
> >
> > Hi, Vincent
> >
> > On 2022/11/3 17:22, Vincent Guittot wrote:
> > > On Thu, 3 Nov 2022 at 10:20, Song Zhang <zhangsong34@...wei.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2022/11/3 16:33, Vincent Guittot wrote:
> > >>> On Thu, 3 Nov 2022 at 04:01, Song Zhang <zhangsong34@...wei.com> wrote:
> > >>>>
> > >>>> Thanks for your reply!
> > >>>>
> > >>>> On 2022/11/3 2:01, Vincent Guittot wrote:
> > >>>>> On Wed, 2 Nov 2022 at 04:54, Song Zhang <zhangsong34@...wei.com> wrote:
> > >>>>>>
> > >>>>>
> > >>>>> This really looks like a v3 of
> > >>>>> https://lore.kernel.org/all/20220810015636.3865248-1-zhangsong34@huawei.com/
> > >>>>>
> > >>>>> Please keep versioning.
> > >>>>>
> > >>>>>> Add a new sysctl interface:
> > >>>>>> /proc/sys/kernel/sched_prio_load_balance_enabled
> > >>>>>
> > >>>>> We don't want to add more sysctl knobs for the scheduler; we have even
> > >>>>> removed some. A knob usually means that you want to fix your particular
> > >>>>> use case, but the solution doesn't make sense for all cases.
> > >>>>>
> > >>>>
> > >>>> OK, I will remove this knob later.
> > >>>>
> > >>>>>>
> > >>>>>> 0: default behavior
> > >>>>>> 1: enable priority load balance for CFS
> > >>>>>>
> > >>>>>> For co-location of idle and non-idle tasks, when CFS does load
> > >>>>>> balancing it is reasonable to prefer migrating non-idle tasks first
> > >>>>>> and idle tasks last. This reduces the interference from SCHED_IDLE
> > >>>>>> tasks as much as possible.
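The proposal, roughly: when detaching tasks during load balance, pick
non-SCHED_IDLE candidates first and fall back to SCHED_IDLE ones only if
nothing else can be moved. A minimal userspace sketch of that selection
order (illustrative only, not the actual patch; all names are made up):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a runqueue entry; in the kernel the candidates would be
 * the tasks on src_rq->cfs_tasks, with SCHED_IDLE found via the task
 * policy. */
struct toy_task {
	const char *name;
	bool idle_policy;		/* SCHED_IDLE task? */
};

/* Two-pass pick: prefer non-idle tasks, fall back to idle ones. */
static const struct toy_task *pick_candidate(const struct toy_task *tasks,
					     int nr)
{
	for (int pass = 0; pass < 2; pass++) {
		bool want_idle = (pass == 1);

		for (int i = 0; i < nr; i++) {
			if (tasks[i].idle_policy == want_idle)
				return &tasks[i];
		}
	}
	return NULL;
}

int main(void)
{
	const struct toy_task rq[] = {
		{ "idle-loop-1", true },
		{ "idle-loop-2", true },
		{ "schbench",    false },
	};
	const struct toy_task *p = pick_candidate(rq, 3);

	printf("would migrate: %s\n", p ? p->name : "(none)");
	return 0;
}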
> > >>>>>
> > >>>>> I don't agree that it's always the best choice to migrate a non-idle task first.
> > >>>>>
> > >>>>> CPU0 has 1 non-idle task, and CPU1 has 1 non-idle task plus hundreds of
> > >>>>> idle tasks, and there is an imbalance between the 2 CPUs: migrating the
> > >>>>> non-idle task from CPU1 to CPU0 is not the best choice.
> > >>>>>
> > >>>>
> > >>>> If the non-idle task on CPU1 is running or cache-hot, it cannot be
> > >>>> migrated, and idle tasks can still be migrated from CPU1 to CPU0. So I
> > >>>> think it does not matter.
> > >>>
> > >>> What I mean is that migrating non-idle tasks first is not a universal
> > >>> win and not always what we want.
> > >>>
> > >>
> > >> But migrating non-idle (latency-sensitive) tasks first is mostly a
> > >> trade-off: those tasks can obtain more CPU time and the interference
> > >> caused by SCHED_IDLE tasks is minimized. I think this makes sense in
> > >> most cases; if not, could you point out what else I need to consider?
> > >>
> > >> Best regards.
> > >>
> > >>>>
> > >>>>>>
> > >>>>>> Testcase:
> > >>>>>> - Spawn a large number of idle (SCHED_IDLE) tasks to occupy the CPUs.
> > >>>>>
> > >>>>> What do you mean by a large number ?
> > >>>>>
> > >>>>>> - Let non-idle tasks compete with idle tasks for CPU time.
> > >>>>>>
> > >>>>>> Using schbench to test non-idle tasks latency:
> > >>>>>> $ ./schbench -m 1 -t 10 -r 30 -R 200
> > >>>>>
> > >>>>> How many CPUs do you have ?
> > >>>>>
> > >>>>
> > >>>> OK, some details may not have been mentioned.
> > >>>> My virtual machine has 8 CPUs running a schbench process and 5000 idle
> > >>>> tasks. Each idle task is an endless while-loop process, shown below:
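(The reproducer itself isn't quoted here; presumably each of the 5000 idle
tasks is something along these lines, an untested guess rather than the
original code:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param param = { .sched_priority = 0 };

	/* Move this process into the SCHED_IDLE class ... */
	if (sched_setscheduler(0, SCHED_IDLE, &param)) {
		perror("sched_setscheduler");
		return 1;
	}
	/* ... then burn CPU forever at the lowest CFS weight. */
	for (;;)
		;
}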
> > >>>
> > >>> How can you care about latency when you start 10 workers on 8 vCPUs
> > >>> with 5000 non idle threads ?
> > >>>
> > >>
> > >> No no no... I spawn 5000 idle (SCHED_IDLE) processes, not 5000 non-idle
> > >> threads, together with the 10 non-idle schbench workers on 8 vCPUs.
> > >
> > > Yes, you spawn 5000 idle tasks, but my point remains the same.
> > >
> >
> > I am sorry that I have not received your reply for a long time, and I am
> > still waiting for it. In fact, migrating non-idle tasks first works well
> > in most scenarios, so it may be possible to add a sched_feat(LB_PRIO) to
> > enable or disable that. I really hope you can give me some better advice.
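A feature bit like that would normally be declared in
kernel/sched/features.h and checked with sched_feat(); roughly
(hypothetical fragment, not the actual patch):

SCHED_FEAT(LB_PRIO, false)

	if (sched_feat(LB_PRIO) && task_has_idle_policy(p))
		/* treat p as a last-resort migration candidate */;

It can then be toggled at runtime via debugfs
(/sys/kernel/debug/sched/features on recent kernels), which avoids adding
a sysctl.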
>
> I have seen that you posted a v4 5 days ago which is on my list to be reviewed.
>
> My concern here remains that selecting non-idle tasks first is not always
> the best choice, for example when you have 1 non-idle task per CPU and
> thousands of idle tasks moving around. Then, regarding your use case, the
> weight of the 5000 idle threads is around twice the weight of your
> non-idle bench: the summed weight of the idle threads is 15k, whereas the
> weight of your bench is around 6k, IIUC how RPS runs. This also means that
> the idle threads will take a significant share of the system:
> 5000 / 7000 ticks. I don't understand how you can care about latency in
> such an extreme case, and I'm interested to hear about the real use case
> where you can have such a situation.
>
> All that to say that idle tasks remain cfs tasks with a small but non-zero
> weight, and we should not make them special other than by not preempting
> at wakeup.
Also, as mentioned for a previous version, a task with nice prio 19 has a
weight of 15, so if you replace the 5k idle threads with 1k cfs threads at
nice prio 19, you will face a similar problem. So you can't really base the
decision only on the idle property of a task.
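To put rough numbers on both points (per-task weights taken from the
kernel's sched_prio_to_weight[] table and WEIGHT_IDLEPRIO): nice 0 is 1024,
nice 19 is 15 and SCHED_IDLE is 3, so

    5000 SCHED_IDLE threads:  5000 *  3 = 15000
    1000 nice-19 threads:     1000 * 15 = 15000
    schbench workers:                    ~ 6000  (as estimated above)

i.e. the background threads get roughly 15000 / 21000 = 5/7 of the CPU time
in both cases, which is where the 5000 / 7000 figure above comes from.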
>
> >
> > Best regards.
> >
> > Song Zhang