linux-kernel - Re: [PATCH v4 1/2] sched/uclamp: Fix initialization of struct uclamp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200629121240.pxec7mvoum6xp2z4@e107158-lin.cambridge.arm.com>
Date:   Mon, 29 Jun 2020 13:12:40 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Patrick Bellasi <patrick.bellasi@...bug.net>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Chris Redpath <chris.redpath@....com>,
        Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] sched/uclamp: Fix initialization of struct
 uclamp_rq

Hi Patrick

On 06/26/20 14:32, Patrick Bellasi wrote:
> 
> On Thu, Jun 25, 2020 at 17:43:51 +0200, Qais Yousef <qais.yousef@....com> wrote...
> 
> > struct uclamp_rq was zeroed out entirely in assumption that in the first
> > call to uclamp_rq_inc() they'd be initialized correctly in accordance to
> > default settings.
> 
> Perhaps I was not clear in my previous comment:
> 
>    https://lore.kernel.org/lkml/87sgekorfq.derkling@matbug.net/
> 
> when I did say:
> 
>    Does not this means the problem is more likely with
>    uclamp_rq_util_with(), which should be guarded?
> 
> I did not mean that we have to guard the calls to that function but
> instead that we should just make that function aware of uclamp being
> opted in or not.
> 
> > But when next patch introduces a static key to skip
> > uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
> > will fail to perform any frequency changes because the
> > rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such. Which
> > means all rqs are capped to 0 by default.
> 
> The initialization you wants to do here it's needed because with the
> current approach you keep calling the same uclamp_rq_util_with() and
> keep doing min/max aggregations even when uclamp is not opted in.
> But this means also that we have min/max aggregation _when not really
> required_.
> 
> > Fix it by making sure we do proper initialization at init without
> > relying on uclamp_rq_inc() doing it later.
> 
> My proposal was as simple as:
> 
> ---8<---
>   static __always_inline
>   unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
>   				  struct task_struct *p)
>   {
>   	unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
>   	unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
>   
> +       if (!static_branch_unlikely(&sched_uclamp_used))
> +               return rt_task(p) ? uclamp_none(UCLAMP_MAX) : util 

This the same mistake I've done.

schedutil_cpu_util() is called by sugov_get_util() passing p as NULL.

292 static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)                                                                                                                                               
293 {                                                                               
294         struct rq *rq = cpu_rq(sg_cpu->cpu);                                    
295         unsigned long util = cpu_util_cfs(rq);                                  
296         unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);               
297                                                                                 
298         sg_cpu->max = max;                                                      
299         sg_cpu->bw_dl = cpu_bw_dl(rq);                                          
300                                                                                 
301         return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
302 }

So this will not fix the problem with RT tasks and we have to do the check
at schedutil which is the right abstraction layer anyway. I don't think calling
rq_rt_is_runnable() is right here and is unnecessary duplication since
schedutil will still have to do that anyway.

When p is not NULL then the function will work correctly in regards to
uclamp_eff_get() returning the right value, including the default settings.
Which is important when my patch to introduce sysctl for default RT boosting
comes into play.

The function deserves a comment for sure though so that users are aware that
the function will not be useful if the static key is disabled. There's no
aggregation done at rq level and this can certainly catch people out.

We can add a static key to return util directly too. I might just do this since
it might help with readability.

Please shout if I missed something.

>   
>   	if (p) {
>   		min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
>   		max_util = max(max_util, uclamp_eff_value(p, UCLAMP_MAX));
>   	}
>   
>   	/*
>   	 * Since CPU's {min,max}_util clamps are MAX aggregated considering
>   	 * RUNNABLE tasks with _different_ clamps, we can end up with an
>   	 * inversion. Fix it now when the clamps are applied.
>   	 */
>   	if (unlikely(min_util >= max_util))
>   		return min_util;
>   
>   	return clamp(util, min_util, max_util);
>   }
> ---8<---
> 
> Such small change is more self-contained IMHO and does not remove
> an existing optimizations like this lazy RQ's initialization at first

IMHO this is a bug not an optimization. We have to write to these struct
anyway, so why not write the right value to start with? This already caught me
out and wasted over half a day trying to chase it.

Beside what are we optimizing for here? I think correctness dictates we start
with correct initial values if we can, no? How are you expecting this to slow
down sched_init()?

Thanks

--
Qais Yousef

> usage.
> 
> Moreover, it can folded in the following patch, with all the other
> static keys shortcuts.
>