Message-ID: <20240126020743.tca257nvnlpyya2y@airbuntu>
Date: Fri, 26 Jan 2024 02:07:43 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Ingo Molnar <mingo@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	linux-kernel@...r.kernel.org,
	Pierre Gondois <Pierre.Gondois@....com>
Subject: Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when
 updating misfit

On 01/25/24 18:50, Vincent Guittot wrote:
> On Wed, 24 Jan 2024 at 23:38, Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 01/23/24 18:22, Vincent Guittot wrote:
> >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bcea3d55d95d..0830ceb7ca07 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5065,17 +5065,61 @@ static inline int task_fits_cpu(struct task_struct *p, int cpu)
> > > >
> > > >  static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> > > >  {
> > > > +       unsigned long uclamp_min, uclamp_max;
> > > > +       unsigned long util, cpu_cap;
> > > > +       int cpu = cpu_of(rq);
> > > > +
> > > >         if (!sched_asym_cpucap_active())
> > > >                 return;
> > > >
> > > > -       if (!p || p->nr_cpus_allowed == 1) {
> > > > -               rq->misfit_task_load = 0;
> > > > -               return;
> > > > -       }
> > > > +       if (!p || p->nr_cpus_allowed == 1)
> > > > +               goto out;
> > > >
> > > > -       if (task_fits_cpu(p, cpu_of(rq))) {
> > > > -               rq->misfit_task_load = 0;
> > > > -               return;
> > > > +       cpu_cap = arch_scale_cpu_capacity(cpu);
> > > > +
> > > > +       /* If the task doesn't fit the biggest CPU, that's the best we can ever get. */
> > > > +       if (cpu_cap == SCHED_CAPACITY_SCALE)
> > > > +               goto out;
> > > > +
> > > > +       uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
> > > > +       uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
> > > > +       util = task_util_est(p);
> > > > +
> > > > +       if (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0)
> > > > +               goto out;
> > > > +
> > > > +       /*
> > > > +        * If the task affinity is not set to default, make sure it is not
> > > > +        * restricted to a subset where no CPU can ever fit it. Triggering
> > > > +        * misfit in this case is pointless as it has nowhere better to move
> > > > +        * to. And it can lead to balance_interval growing too high as we'll
> > > > +        * continuously fail to move it anywhere.
> > > > +        */
> > > > +       if (!cpumask_equal(p->cpus_ptr, cpu_possible_mask)) {
> > > > +               unsigned long clamped_util = clamp(util, uclamp_min, uclamp_max);
> > > > +               bool has_fitting_cpu = false;
> > > > +               struct asym_cap_data *entry;
> > > > +
> > > > +               rcu_read_lock();
> > > > +               list_for_each_entry_rcu(entry, &asym_cap_list, link) {
> > >
> > > Do we really want to potentially do this loop at every pick_next_task()?
> >
> > The common case should return quickly as the biggest CPU should be present
> > in every task's affinity by default. And after sorting, the biggest CPU will
> > be the first entry, so we should return after one check.
> >
> > Could we move the update to another less expensive location instead?
> 
> TBH, I don't know. I would need time to think about this...
> Maybe when we set the new affinity of the task

I was actually thinking of calling update_misfit_status() from another less
expensive location.

We can certainly do something to make the check less expensive if we must do it
in pick_next_task(). For example, set a flag if the task is restricted to a
single capacity value, and store the highest capacity its affinity spans. But
with cpuset v1, v2 and hotplug I am wary that might get messy.
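
Something like the below is roughly what I mean (an untested sketch; the
max_allowed_capacity field and the helper name are made up, and it assumes
asym_cap_list/cpu_capacity_span() are visible from here):

        /* hypothetical new member of struct task_struct */
        unsigned long                   max_allowed_capacity;

        /* refresh the cached value whenever the affinity changes */
        static void update_max_allowed_capacity(struct task_struct *p)
        {
                struct asym_cap_data *entry;
                unsigned long max_cap = 0;

                rcu_read_lock();
                list_for_each_entry_rcu(entry, &asym_cap_list, link) {
                        if (cpumask_intersects(p->cpus_ptr, cpu_capacity_span(entry)))
                                max_cap = max(max_cap, entry->capacity);
                }
                rcu_read_unlock();

                p->max_allowed_capacity = max_cap;
        }

update_misfit_status() would then only need to compare the clamped util
against p->max_allowed_capacity instead of walking the list.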

> 
> >
> > We could try to do better tracking for tasks that have their affinity changed,
> > but I am not keen on sprinkling more complexity elsewhere to deal with this.
> >
> > We could keep the status quo and just prevent the misfit load balancing from
> > incrementing nr_failed, similar to newidle_balance. I think this should have
> 
> One main advantage is that we put the complexity out of the fast path

How about doing it in update_load_avg()? After all, it's the util that decides
if we become misfit. So it makes sense to do the check when we update the util
for the task.
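
Something like this, just to show the hook point (untested sketch; assumes
we're at the tail of the function, after the task's util has been updated):

        static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
        {
                ...

                /* util_est/util_avg are up to date here; refresh misfit state */
                if (entity_is_task(se))
                        update_misfit_status(task_of(se), rq_of(cfs_rq));
        }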

Which reminds me of another bug. We need to call update_misfit_status() when
uclamp values change too.
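
Something along these lines maybe (sketch only; update_misfit_status() is
static in fair.c today so it would need to be exposed, and I'm assuming
uclamp_update_active() is the right spot):

        static void uclamp_update_active(struct task_struct *p)
        {
                ...

                /* effective uclamp values changed; re-evaluate misfit */
                if (p->sched_class == &fair_sched_class)
                        update_misfit_status(p, task_rq(p));
        }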

> 
> > a similar effect. Not ideal, but if this is still considered too expensive,
> > I can't think of other options that don't look ugly to me FWIW.
> >
> >
> > Thanks
> >
> > --
> > Qais Yousef
