Date:   Wed, 7 Nov 2018 14:48:09 +0000
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Paul Turner <pjt@...gle.com>,
        Quentin Perret <quentin.perret@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Todd Kjos <tkjos@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Steve Muckle <smuckle@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH v5 03/15] sched/core: uclamp: map TASK's clamp values
 into CPU's clamp groups

On 07-Nov 14:35, Peter Zijlstra wrote:
> On Mon, Oct 29, 2018 at 06:32:57PM +0000, Patrick Bellasi wrote:
> > +/**
> > + * uclamp_group_get: increase the reference count for a clamp group
> > + * @uc_se: the utilization clamp data for the task
> > + * @clamp_id: the clamp index affected by the task
> > + * @clamp_value: the new clamp value for the task
> > + *
> > + * Each time a task changes its utilization clamp value, for a specified clamp
> > + * index, we need to find an available clamp group which can be used to track
> > + * this new clamp value. The corresponding clamp group index will be used to
> > + * reference count the corresponding clamp value while the task is enqueued on
> > + * a CPU.
> > + */
> > +static void uclamp_group_get(struct uclamp_se *uc_se, unsigned int clamp_id,
> > +			     unsigned int clamp_value)
> > +{
> > +	union uclamp_map *uc_maps = &uclamp_maps[clamp_id][0];
> > +	unsigned int prev_group_id = uc_se->group_id;
> > +	union uclamp_map uc_map_old, uc_map_new;
> > +	unsigned int free_group_id;
> > +	unsigned int group_id;
> > +	unsigned long res;
> > +
> > +retry:
> > +
> > +	free_group_id = UCLAMP_GROUPS;
> > +	for (group_id = 0; group_id < UCLAMP_GROUPS; ++group_id) {
> > +		uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
> > +		if (free_group_id == UCLAMP_GROUPS && !uc_map_old.se_count)
> > +			free_group_id = group_id;
> > +		if (uc_map_old.value == clamp_value)
> > +			break;
> > +	}
> > +	if (group_id >= UCLAMP_GROUPS) {
> > +#ifdef CONFIG_SCHED_DEBUG
> > +#define UCLAMP_MAPERR "clamp value [%u] mapping to clamp group failed\n"
> > +		if (unlikely(free_group_id == UCLAMP_GROUPS)) {
> > +			pr_err_ratelimited(UCLAMP_MAPERR, clamp_value);
> > +			return;
> > +		}
> > +#endif
> > +		group_id = free_group_id;
> > +		uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
> > +	}
> 
> You forgot to check for refcount overflow here ;-)

You mean se_count overflow ?

That se_count field is (BITS_PER_LONG - SCHED_CAPACITY_SHIFT - 1) bits
wide, which makes it able to track up to:

   over 2 million tasks/task_groups on 32-bit systems (with SCHED_CAPACITY_SHIFT 10)
   over 10^12     tasks/task_groups on 64-bit systems (with SCHED_CAPACITY_SHIFT 20)

I don't expect overflow on 64-bit systems, do you?

It's more likely on 32-bit systems, especially if we increase
SCHED_CAPACITY_SHIFT in the future.
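
For reference, the back-of-the-envelope math behind those numbers (just
the arithmetic implied by the field width above, nothing new):

   se_count bits = BITS_PER_LONG - SCHED_CAPACITY_SHIFT - 1

   32-bit, SCHED_CAPACITY_SHIFT = 10:  32 - 10 - 1 = 21 bits -> 2^21 ~= 2.1e6  refs
   64-bit, SCHED_CAPACITY_SHIFT = 20:  64 - 20 - 1 = 43 bits -> 2^43 ~= 8.8e12 refs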

> And I'm not really a fan of hiding that error in a define like you keep
> doing.

The #define is not there to mask an overflow; it's there to catch the
case in which the refcount gets corrupted and we end up violating the
invariant: "there is always a clamp group available".

NOTE: that invariant is guaranteed once we add:

   sched/core: uclamp: add clamp group bucketing support

The warning reports the issue only with CONFIG_SCHED_DEBUG enabled,
but... it makes sense to keep it always enabled.

In case of data corruption, however, we should just return, thus not
marking the scheduling entity as "mapped" towards the end of the
function... which makes me realize that it's actually wrong to
conditionally compile the above "return".


> What's wrong with something like:
> 
> 	if (SCHED_WARN(free_group_id == UCLAMP_GROUPS))
> 		return;

Right, the flow should be:

  1. try to find a valid clamp group
  2. if you don't find one, the data structures are corrupted
     warn once for data corruption
     do not map this scheduling entity and return
  3. map the scheduling entity

Is that ok ?
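
Concretely, I'm thinking of something along these lines on top of the
chunk quoted above (just a sketch, assuming SCHED_WARN_ON() from
kernel/sched/sched.h is the helper you meant; not tested):

        if (group_id >= UCLAMP_GROUPS) {
                /*
                 * No group tracks this clamp value yet: a free group must
                 * always be available, otherwise the maps are corrupted.
                 */
                if (SCHED_WARN_ON(free_group_id == UCLAMP_GROUPS))
                        return;         /* leave uc_se unmapped */
                group_id = free_group_id;
                uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
        }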


> and
> 
> > +	uc_map_new.se_count = uc_map_old.se_count + 1;
> 
> 	if (SCHED_WARN(!new.se_count))
> 		new.se_count = -1;

Mmm... not sure we can recover from a corrupted refcount or an
overflow.

What should we do in these cases, disable uclamp completely?
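
If we do want to recover rather than bail out, I read your suggestion
as a saturating increment on the se_count update quoted above (again
just a sketch, with the same SCHED_WARN_ON() assumption as before):

        uc_map_new.se_count = uc_map_old.se_count + 1;
        /*
         * Saturate instead of wrapping to zero, so an overflowed counter
         * can never release a clamp group that still has users.
         */
        if (SCHED_WARN_ON(!uc_map_new.se_count))
                uc_map_new.se_count = -1;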


> > +	uc_map_new.value = clamp_value;
> > +	res = atomic_long_cmpxchg(&uc_maps[group_id].adata,
> > +				  uc_map_old.data, uc_map_new.data);
> > +	if (res != uc_map_old.data)
> > +		goto retry;
> > +
> > +	/* Update SE's clamp values and attach it to new clamp group */
> > +	uc_se->value = clamp_value;
> > +	uc_se->group_id = group_id;
> > +
> > +	/* Release the previous clamp group */
> > +	if (uc_se->mapped)
> > +		uclamp_group_put(clamp_id, prev_group_id);
> > +	uc_se->mapped = true;
> > +}

-- 
#include <best/regards.h>

Patrick Bellasi
