linux-kernel - Re: [PATCH 1/7] sched/core: uclamp: add CPU clamp groups accounting

Message-ID: <20180413115229.GW14248@e110439-lin>
Date:   Fri, 13 Apr 2018 12:52:29 +0100
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Paul Turner <pjt@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Steve Muckle <smuckle@...gle.com>
Subject: Re: [PATCH 1/7] sched/core: uclamp: add CPU clamp groups accounting

On 13-Apr 12:47, Patrick Bellasi wrote:
> On 13-Apr 13:36, Peter Zijlstra wrote:
> > On Fri, Apr 13, 2018 at 12:15:10PM +0100, Patrick Bellasi wrote:
> > > On 13-Apr 10:43, Peter Zijlstra wrote:
> > > > On Mon, Apr 09, 2018 at 05:56:09PM +0100, Patrick Bellasi wrote:
> > > > > +static inline void uclamp_task_update(struct rq *rq, struct task_struct *p)
> > > > > +{
> > > > > +	int cpu = cpu_of(rq);
> > > > > +	int clamp_id;
> > > > > +
> > > > > +	/* The idle task does not affect CPU's clamps */
> > > > > +	if (unlikely(p->sched_class == &idle_sched_class))
> > > > > +		return;
> > > > > +	/* DEADLINE tasks do not affect CPU's clamps */
> > > > > +	if (unlikely(p->sched_class == &dl_sched_class))
> > > > > +		return;
> > > > > +
> > > > > +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> > > > > +		if (uclamp_task_affects(p, clamp_id))
> > > > > +			uclamp_cpu_put(p, cpu, clamp_id);
> > > > > +		else
> > > > > +			uclamp_cpu_get(p, cpu, clamp_id);
> > > > > +	}
> > > > > +}
> > > > 
> > > > Is that uclamp_task_affects() thing there to fix up the fact you failed
> > > > to propagate the calling context (enqueue/dequeue) ?
> > > 
> > > Not really, it's intended by design: we back annotate the clamp_group
> > > a task has been refcounted in.
> > > 
> > > The uclamp_task_affects() tells if we are refcounted now and then we
> > > know from the back-annotation from which refcounter we need to remove
> > > the task.
> > > 
> > > I found this solution much less racy, and effective at keeping the
> > > refcounter consistent whenever we look at a task at either dequeue
> > > or migration time, since these operations can overlap with the
> > > slow-path, i.e. with a change of the task-specific clamp_group
> > > done either via syscall or via cgroup attributes.
> > > 
> > > IOW, the back-annotation allows us to decouple refcounting from
> > > clamp_group configuration in a lockless way.
> > 
> > But it adds extra state and logic, to a fastpath, for no reason.
> > 
> > I suspect you messed up the cgroup side; because the syscall should
> > already have done task_rq_lock() and hold both p->pi_lock and rq->lock
> > and have dequeued the task when changing the attribute.
> 
> Yes, actually I'm using task_rq_lock() from the cgroup callback to
> update each task already queued. And I do the same from the
> sched_setattr syscall...
> 
> > It is actually really hard to make the syscall do it wrong.
> 
> ... thus, I'll take a closer look at this.
>
> Not sure now if there was some other corner-case.

Actually, I've just remembered another use-case for that
back-annotation. It is used when the cgroup and the per-task APIs
assert two different clamp values.

For example, a task in a TG with max_clamp=50 is further clamped with a
task-specific max_clamp=10. The back-annotation tracks the group_id in
which the task is refcounted right now, which is the task-specific group
in the previous example.
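
To make the idea concrete, here is a minimal user-space sketch of the
back-annotation, not the code from the patch. All names (struct
cpu_clamp, struct task, effective_group(), task_update(), ...) are
hypothetical, and the rule "the lower max clamp wins" is an assumption
taken only from the example above. The point it shows: the put side
reads the back-annotated group, so it stays balanced even if the
configured group changes while the task is refcounted.

```c
#include <assert.h>

#define UCLAMP_NOT_ACTIVE	(-1)
#define NR_GROUPS		4

/* Hypothetical, simplified stand-in for the per-CPU clamp groups. */
struct cpu_clamp {
	int value[NR_GROUPS];	/* max clamp value of each group */
	int tasks[NR_GROUPS];	/* tasks currently refcounted per group */
};

/* Hypothetical, simplified stand-in for the per-task clamp state. */
struct task {
	int tg_group_id;	/* group configured via the task's TG */
	int task_group_id;	/* per-task group, or UCLAMP_NOT_ACTIVE */
	int active_group_id;	/* back-annotation: group refcounted now */
};

/* Assumption: for a max clamp, the lower of the two values wins. */
static int effective_group(const struct cpu_clamp *cpu, const struct task *p)
{
	if (p->task_group_id == UCLAMP_NOT_ACTIVE)
		return p->tg_group_id;
	if (cpu->value[p->task_group_id] < cpu->value[p->tg_group_id])
		return p->task_group_id;
	return p->tg_group_id;
}

/* The task is accounted on the CPU iff a group is back-annotated. */
static int task_affects(const struct task *p)
{
	return p->active_group_id != UCLAMP_NOT_ACTIVE;
}

static void cpu_get(struct cpu_clamp *cpu, struct task *p)
{
	int group_id = effective_group(cpu, p);

	cpu->tasks[group_id]++;
	p->active_group_id = group_id;	/* remember where we are counted */
}

static void cpu_put(struct cpu_clamp *cpu, struct task *p)
{
	/* Use the back-annotation, not the current configuration. */
	int group_id = p->active_group_id;

	assert(cpu->tasks[group_id] > 0);
	cpu->tasks[group_id]--;
	p->active_group_id = UCLAMP_NOT_ACTIVE;
}

/* Same toggle as in the quoted patch: put if refcounted, get otherwise. */
static void task_update(struct cpu_clamp *cpu, struct task *p)
{
	if (task_affects(p))
		cpu_put(cpu, p);
	else
		cpu_get(cpu, p);
}
```

With group 1 at max_clamp=50 (the TG) and group 2 at max_clamp=10 (the
task-specific one), the first task_update() refcounts the task in group
2. If task_group_id is then changed before the next task_update(), the
put still decrements group 2, because that is what was annotated.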

-- 
#include <best/regards.h>

Patrick Bellasi