linux-kernel - Re: [PATCH v2 2/2] sched/uclamp: Protect uclamp fast path code with static key

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200624110758.oewxkpdk5fzxwxod@e107158-lin.cambridge.arm.com>
Date:   Wed, 24 Jun 2020 12:07:59 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Patrick Bellasi <patrick.bellasi@...bug.net>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Chris Redpath <chris.redpath@....com>,
        Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] sched/uclamp: Protect uclamp fast path code with
 static key

On 06/24/20 09:34, Patrick Bellasi wrote:
> 
> On Fri, Jun 19, 2020 at 19:20:11 +0200, Qais Yousef <qais.yousef@....com> wrote...
> 
> [...]
> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 4265861e13e9..9ab22f699613 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -793,6 +793,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
> >  /* All clamps are required to be less or equal than these values */
> >  static struct uclamp_se uclamp_default[UCLAMP_CNT];
> >  
> > +/*
> > + * This static key is used to reduce the uclamp overhead in the fast path. It
> > + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> > + *
> > + * This allows users to continue to enable uclamp in their kernel config with
> > + * minimum uclamp overhead in the fast path.
> > + *
> > + * As soon as userspace modifies any of the uclamp knobs, the static key is
> > + * disabled, since we have an actual users that make use of uclamp
> > + * functionality.
> > + *
> > + * The knobs that would disable this static key are:
> > + *
> > + *   * A task modifying its uclamp value with sched_setattr().
> > + *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> > + *   * An admin modifying the cgroup cpu.uclamp.{min, max}
> > + */
> > +static DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);
> 
> I would personally prefer a non negated semantic.
> 
> Why not using 'sched_uclamp_enabled'?

This was already discussed and already changed to sched_uclamp_used which
I will post in v3, which is already ready.

> 
> > +
> >  /* Integer rounded range for each bucket */
> >  #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
> >  
> > @@ -993,9 +1012,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> >  	lockdep_assert_held(&rq->lock);
> >  
> >  	bucket = &uc_rq->bucket[uc_se->bucket_id];
> > -	SCHED_WARN_ON(!bucket->tasks);
> > -	if (likely(bucket->tasks))
> > -		bucket->tasks--;
> > +
> > +	/*
> > +	 * This could happen if sched_uclamp_unused was disabled while the
> > +	 * current task was running, hence we could end up with unbalanced call
> > +	 * to uclamp_rq_dec_id().
> > +	 */
> > +	if (unlikely(!bucket->tasks))
> > +		return;
> > +
> > +	bucket->tasks--;
> 
> Since above you are not really changing the logic, why changing the
> code?

The logic has changed. We return immediately now.

> 
> The SCHED_WARN_ON/if(likely) is a defensive programming thing.
> I understand that SCHED_WARN_ON() can now be misleading because of the
> unbalanced calls but... why not just removing it?

Do you think we need to execute the rest of the code if bucket->tasks is 0? It
would be good to know if there's a corner case you're worried about. AFAIU if
it is 0, this means there's nothing to do.

> 
> Maybe also adding in the comment, but I don't see valid reasons to
> change the code if the functionality is not changing.

If it means a lot to you, I could put it back as it was. I just don't see why
we can't return immediately instead.

> 
> 
> >  	uc_se->active = false;
> >  
> >  	/*
> > @@ -1031,6 +1057,13 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> >  {
> >  	enum uclamp_id clamp_id;
> >  
> > +	/*
> > +	 * Avoid any overhead until uclamp is actually used by the userspace.
> > +	 * Including the potential JMP if we use static_branch_unlikely()
> 
> The comment above (about unlikely) seems not to match the code?

It does. The comment says if we use unlikely() when uclamp is not used we'll
incur an extra jump/branch. Hence we use likely to avoid this potential
overhead.

> 
> > +	 */
> > +	if (static_branch_likely(&sched_uclamp_unused))
> > +		return;
> 
> Moreover, something like:
> 
>        if (static_key_false(&sched_uclamp_enabled))
>                 return;
> 
> is not just good enough?

Aren't they both good enough?

Use of static_key_false() is deprecated AFAICS.

> 
> > +
> >  	if (unlikely(!p->sched_class->uclamp_enabled))
> >  		return;
> 
> Since we already have these per sched_class gates, I'm wondering if it
> could make sense to just re-purpose them.
> 
> Problem with the static key is that if just one RT task opts in, CFS
> will still pay the overheads, and vice versa too.
> 
> So, an alternative approach could be to opt in sched classes on-demand.

I'll defer to Peter here.

Re-purposing the sched_class->uclamp_enable means we can't use a static key and
we'll potentially still incur a cache penalty in there even if uclamp is not
used. Based on RT sysctl discussion, this won't be okay if we can do better.

> 
> The above if(unlikely) is not exactly has a static key true, but I
> assume we agree the overheads we are tacking are nothing compared to
> that check, aren't they?

I'm not sure we can agree on that. The overhead seems to be system specific. On
juno-r2 it seemed I$, but on 2 sockets systems it seems the D$. So I can't say
for sure no one will care. The current approach gives stronger guarantees.

Thanks

--
Qais Yousef