[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200304135404.146c56eb@gandalf.local.home>
Date: Wed, 4 Mar 2020 13:54:04 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Qais Yousef <qais.yousef@....com>
Cc: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Pavan Kondeti <pkondeti@...eaurora.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/6] sched/rt: cpupri_find: Implement fallback
mechanism for !fit case
On Wed, 4 Mar 2020 17:39:26 +0000
Qais Yousef <qais.yousef@....com> wrote:
> > > + /*
> > > + * If we failed to find a fitting lowest_mask, make sure we fall back
> > > + * to the last known unfitting lowest_mask.
> > > + *
> > > + * Note that the map of the recorded idx might have changed since then,
> > > + * so we must ensure to do the full dance to make sure that level still
> > > + * holds a valid lowest_mask.
> > > + *
> > > + * As per above, the map could have been concurrently emptied while we
> > > + * were busy searching for a fitting lowest_mask at the other priority
> > > + * levels.
> > > + *
> > > + * This rule favours honouring priority over fitting the task in the
> > > + * correct CPU (Capacity Awareness being the only user now).
> > > + * The idea is that if a higher priority task can run, then it should
> > > + * run even if this ends up being on unfitting CPU.
> > > + *
> > > + * The cost of this trade-off is not entirely clear and will probably
> > > + * be good for some workloads and bad for others.
> > > + *
> > > + * The main idea here is that if some CPUs were overcommitted, we try
> > > + * to spread which is what the scheduler traditionally did. Sys admins
> > > + * must do proper RT planning to avoid overloading the system if they
> > > + * really care.
> > > + */
> > > + if (best_unfit_idx != -1)
> > > + return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);
> >
> > Hmm, this only checks the one index, which can change and then we miss
> > everything. I think we can do better. What about this:
>
> Hmm. I see 2 issues with this approach:
>
> >
> >
> > for (idx = 0; idx < task_pri; idx++) {
> > int found = -1;
> >
> > if (!__cpupri_find(cp, p, lowest_mask, idx))
> > continue;
>
> 1.
>
> __cpupri_find() could update the lowest_mask at the next iteration, so the
> fallback wouldn't be the lowest level, right?
Hmm, you may be right.
>
> >
> > if (!lowest_mask || !fitness_fn)
> > return 1;
> >
> > /* Make sure we have one fit CPU before clearing */
> > for_each_cpu(cpu, lowest_mask) {
> > if (fitness_fn(p, cpu)) {
> > found = cpu;
> > break;
> > }
> > }
> >
> > if (found == -1)
> > continue;
>
> 2.
>
> If we fix 1, then assuming found == -1 for all level, we'll still have the
> problem that the mask is stale.
>
> We can do a full scan again as Tao was suggestion, the 2nd one without any
> fitness check that is. But isn't this expensive?
I was hoping to try to avoid that, but it's not that expensive and will
probably seldom happen. Perhaps we should run some test cases and trace the
results to see how often that can happen.
>
> We risk the mask being stale anyway directly after selecting it. Or a priority
> level might become the lowest level just after we dismissed it.
Sure, but that's still a better effort.
>
> So our best effort could end up lying even if we do the right thing (TM).
>
> >
> > /* Ensure the capacity of the CPUs fit the task */
> > for_each_cpu(cpu, lowest_mask) {
> > if (cpu < found || !fitness_fn(p, cpu))
> > cpumask_clear_cpu(cpu, lowest_mask);
> > }
> >
> > return 1;
> > }
> >
> > This way, if nothing fits we return the untouched lowest_mask, and only
> > clear the lowest_mask bits if we found a fitness cpu.
>
> Because of 1, I think the lowest mask will not be the true lowest mask, no?
> Please correct me if I missed something.
No, I think you are correct.
>
> There's another 'major' problem that I need to bring your attention to,
> find_lowest_rq() always returns the first CPU in the mask.
>
> See discussion below for more details
>
> https://lore.kernel.org/lkml/20200219140243.wfljmupcrwm2jelo@e107158-lin/
>
> In my test because multiple tasks wakeup together they all end up going to CPU1
> (the first fitting CPU in the mask), then just to be pushed back again. Not
> necessarily to where they were running before.
>
> Not sure if there are other situations where this could happen.
>
> If we fix this problem then we can help reduce the effect of this raciness in
> find_lowest_rq(), and end up with less ping-ponging if tasks happen to
> wakeup/sleep at the wrong time during the scan.
Hmm, I wonder if there's a fast way of getting the next CPU from the
current cpu the task is on. Perhaps that will help in the ping ponging.
-- Steve
>
> Or maybe there is a way to eliminate all these races with the current design?
>
> Thanks
>
> --
> Qais Yousef
Powered by blists - more mailing lists