Date:   Wed, 7 Nov 2018 14:48:09 +0000
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Paul Turner <pjt@...gle.com>,
        Quentin Perret <quentin.perret@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Todd Kjos <tkjos@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Steve Muckle <smuckle@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH v5 03/15] sched/core: uclamp: map TASK's clamp values
 into CPU's clamp groups

On 07-Nov 14:35, Peter Zijlstra wrote:
> On Mon, Oct 29, 2018 at 06:32:57PM +0000, Patrick Bellasi wrote:
> > +/**
> > + * uclamp_group_get: increase the reference count for a clamp group
> > + * @uc_se: the utilization clamp data for the task
> > + * @clamp_id: the clamp index affected by the task
> > + * @clamp_value: the new clamp value for the task
> > + *
> > + * Each time a task changes its utilization clamp value, for a specified clamp
> > + * index, we need to find an available clamp group which can be used to track
> > + * this new clamp value. The corresponding clamp group index will be used to
> > + * reference count the corresponding clamp value while the task is enqueued on
> > + * a CPU.
> > + */
> > +static void uclamp_group_get(struct uclamp_se *uc_se, unsigned int clamp_id,
> > +			     unsigned int clamp_value)
> > +{
> > +	union uclamp_map *uc_maps = &uclamp_maps[clamp_id][0];
> > +	unsigned int prev_group_id = uc_se->group_id;
> > +	union uclamp_map uc_map_old, uc_map_new;
> > +	unsigned int free_group_id;
> > +	unsigned int group_id;
> > +	unsigned long res;
> > +
> > +retry:
> > +
> > +	free_group_id = UCLAMP_GROUPS;
> > +	for (group_id = 0; group_id < UCLAMP_GROUPS; ++group_id) {
> > +		uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
> > +		if (free_group_id == UCLAMP_GROUPS && !uc_map_old.se_count)
> > +			free_group_id = group_id;
> > +		if (uc_map_old.value == clamp_value)
> > +			break;
> > +	}
> > +	if (group_id >= UCLAMP_GROUPS) {
> > +#ifdef CONFIG_SCHED_DEBUG
> > +#define UCLAMP_MAPERR "clamp value [%u] mapping to clamp group failed\n"
> > +		if (unlikely(free_group_id == UCLAMP_GROUPS)) {
> > +			pr_err_ratelimited(UCLAMP_MAPERR, clamp_value);
> > +			return;
> > +		}
> > +#endif
> > +		group_id = free_group_id;
> > +		uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
> > +	}
> 
> You forgot to check for refcount overflow here ;-)

You mean se_count overflow ?

That se_count field is (BITS_PER_LONG - SCHED_CAPACITY_SHIFT - 1) bits
wide, which makes it able to track up to:

   over 2 million tasks/task_groups on 32-bit systems (with SCHED_CAPACITY_SHIFT 10)
   over 10^12     tasks/task_groups on 64-bit systems (with SCHED_CAPACITY_SHIFT 20)

I don't expect overflow on 64-bit systems, do you?

It's more likely on 32-bit systems, especially if we increase
SCHED_CAPACITY_SHIFT in the future.
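
For reference, the back-of-the-envelope math behind those numbers (just
the arithmetic implied by the field width above, nothing new):

   se_count bits = BITS_PER_LONG - SCHED_CAPACITY_SHIFT - 1

   32-bit, SCHED_CAPACITY_SHIFT = 10:  32 - 10 - 1 = 21 bits -> 2^21 ~= 2.1e6  refs
   64-bit, SCHED_CAPACITY_SHIFT = 20:  64 - 20 - 1 = 43 bits -> 2^43 ~= 8.8e12 refs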

> And I'm not really a fan of hiding that error in a define like you keep
> doing.

The #define is not there to mask an overflow; it's there to catch the
case in which the refcount gets corrupted and we end up violating the
invariant: "there is always a clamp group available".

NOTE: that invariant is guaranteed once we add:

   sched/core: uclamp: add clamp group bucketing support

The warning reports the issue only with CONFIG_SCHED_DEBUG enabled,
but... it makes sense to keep it always enabled.

In case of data corruption, however, we should just return, thus not
marking the scheduling entity as "mapped" towards the end of the
function... which makes me realize that it's actually wrong to
conditionally compile the above "return".


> What's wrong with something like:
> 
> 	if (SCHED_WARN(free_group_id == UCLAMP_GROUPS))
> 		return;

Right, the flow should be:

  1. try to find a valid clamp group
  2. if you don't find one, the data structures are corrupted
     warn once for data corruption
     do not map this scheduling entity and return
  3. map the scheduling entity

Is that ok ?
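
Concretely, I'm thinking of something along these lines on top of the
chunk quoted above (just a sketch, assuming SCHED_WARN_ON() from
kernel/sched/sched.h is the helper you meant; not tested):

        if (group_id >= UCLAMP_GROUPS) {
                /*
                 * No group tracks this clamp value yet: a free group must
                 * always be available, otherwise the maps are corrupted.
                 */
                if (SCHED_WARN_ON(free_group_id == UCLAMP_GROUPS))
                        return;         /* leave uc_se unmapped */
                group_id = free_group_id;
                uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
        }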


> and
> 
> > +	uc_map_new.se_count = uc_map_old.se_count + 1;
> 
> 	if (SCHED_WARN(!new.se_count))
> 		new.se_count = -1;

Mmm... not sure we can recover from a corrupted refcount or an
overflow.

What should we do in these cases, disable uclamp completely?
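
If we do want to recover rather than bail out, I read your suggestion
as a saturating increment on the se_count update quoted above (again
just a sketch, with the same SCHED_WARN_ON() assumption as before):

        uc_map_new.se_count = uc_map_old.se_count + 1;
        /*
         * Saturate instead of wrapping to zero, so an overflowed counter
         * can never release a clamp group that still has users.
         */
        if (SCHED_WARN_ON(!uc_map_new.se_count))
                uc_map_new.se_count = -1;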


> > +	uc_map_new.value = clamp_value;
> > +	res = atomic_long_cmpxchg(&uc_maps[group_id].adata,
> > +				  uc_map_old.data, uc_map_new.data);
> > +	if (res != uc_map_old.data)
> > +		goto retry;
> > +
> > +	/* Update SE's clamp values and attach it to new clamp group */
> > +	uc_se->value = clamp_value;
> > +	uc_se->group_id = group_id;
> > +
> > +	/* Release the previous clamp group */
> > +	if (uc_se->mapped)
> > +		uclamp_group_put(clamp_id, prev_group_id);
> > +	uc_se->mapped = true;
> > +}

-- 
#include <best/regards.h>

Patrick Bellasi
