lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 22 Apr 2020 14:13:47 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jonathan Corbet <corbet@....net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default
 boost value

On 04/22/20 12:59, Dietmar Eggemann wrote:
> On 21/04/2020 13:27, Qais Yousef wrote:
> > On 04/21/20 13:18, Dietmar Eggemann wrote:
> >> On 20/04/2020 17:13, Qais Yousef wrote:
> >>> On 04/20/20 10:29, Dietmar Eggemann wrote:
> >>>> On 03.04.20 14:30, Qais Yousef wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>> @@ -924,6 +945,14 @@ uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >>>>>  	return uc_req;
> >>>>>  }
> >>>>>  
> >>>>> +static void uclamp_rt_sync_default_util_min(struct task_struct *p)
> >>>>> +{
> >>>>> +	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> >>>>> +
> >>>>> +	if (!uc_se->user_defined)
> >>>>> +		uclamp_se_set(uc_se, sysctl_sched_rt_default_uclamp_util_min, false);
> >>>>> +}
> >>>>> +
> >>>>>  unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
> >>>>>  {
> >>>>>  	struct uclamp_se uc_eff;
> >>>>> @@ -1030,6 +1059,12 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> >>>>>  	if (unlikely(!p->sched_class->uclamp_enabled))
> >>>>>  		return;
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * When sysctl_sched_rt_default_uclamp_util_min value is changed by the
> >>>>> +	 * user, we apply any new value on the next wakeup, which is here.
> >>>>> +	 */
> >>>>> +	uclamp_rt_sync_default_util_min(p);
> >>>>> +
> >>>>
> >>>> Does this have to be an extra function? Can we not reuse
> >>>> uclamp_tg_restrict() by slightly rename it to uclamp_restrict()?
> 
> Btw, there was an issue in my little snippet. I used uc_req.user_defined
> uninitialized in uclamp_restrict().
> 
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f3706dad32ce..7e6b2b7cd1e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -903,12 +903,11 @@ uclamp_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  {
>  	struct uclamp_se uc_req, __maybe_unused uc_max;
>  
> -	if (unlikely(rt_task(p)) && clamp_id == UCLAMP_MIN &&
> -	    !uc_req.user_defined) {
> +	if (unlikely(rt_task(p)) && clamp_id == UCLAMP_MIN) {
>  		struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
>  		int rt_min = sysctl_sched_rt_default_uclamp_util_min;
>  
> -		if (uc_se->value != rt_min) {
> +		if (!uc_se->user_defined && uc_se->value != rt_min) {
>  			uclamp_se_set(uc_se, rt_min, false);
>  			printk("uclamp_restrict() [%s %d] p->uclamp_req[%d].value=%d\n",
>  			       p->comm, p->pid, clamp_id, uc_se->value);
> 
> >>> Hmm the thing is that we're not restricting here. In contrary we're boosting,
> >>> so the name would be misleading.
> >>
> >> I always thought that we're restricting p->uclamp_req[UCLAMP_MIN].value (default 1024) to
> >> sysctl_sched_rt_default_uclamp_util_min (0-1024)?
> > 
> > The way I look at it is that we're *setting* it to
> > sysctl_sched_rt_default_uclamp_util_min if !user_defined.
> > 
> > The restriction mechanism that ensures this set value doesn't escape
> > cgroup/global restrictions setup.
> 
> I guess we overall agree here. 
> 
> I see 3 restriction levels: (!user_defined) task -> taskgroup -> system
> 
> I see sysctl_sched_rt_default_uclamp_util_min (min_rt_default) as a
> restriction on task level.

Hmm from code perspective it is a request. One that is applied by default if
the user didn't make any request.

Since restriction has a different meaning from code point of view, I think
interchanging both could be confusing.

A restriction from the code view is that you can request 1024, but if cgroup or
global settings doesn't allow you, the system will automatically crop it.

The new sysctl doesn't result in any cropping. It is equivalent to a user
making a sched_setattr() call to change UCLAMP_MIN value of a task. It's just
this request is applied automatically by default, if !user_defined.

But if you meant your high level view of how it works you think of it as
a restriction, then yeah, it could be abstracted in this way. The terminology
just conflicts with the code.

> 
> It's true that the task level restriction is setting the value at the same time.
> 
> For CFS (id=UCLAMP_[MIN\|MAX]) and RT (id=UCLAMP_MAX) we use
> uclamp_none(id) and those values (0, 1024) are fixed so these task level
> values don't need to be further restricted.

I wouldn't think of these as restriction. They're default requests, if
I understood what you're saying correctly, by default:

	cfs_task->util_min = 0
	cfs_task->util_max = 1024

	rt_task->util_min = 1024
	rt_task->util_max = 1024

Which are the requested value.

sysctl_util_clamp_{min,max} are the default restriction which by default would
allow the tasks to request any value within the full range.

The root taskgroup will inherit this value by default. And new cgroups will
inherit from the root taskgroup.

> 
> For RT (id=UCLAMP_MIN) we use 'min_rt_default' and since it can change
> we have to check the task level restriction in 'uclamp_eff_get() ->
> uclamp_(tg)_restrict()'.

Yes. If we take the approach to apply the default request in uclamp_eff_get(),
then this must be applied before uclamp_tg_restrict() call.

> 
> root@...0:~# echo 999 > /proc/sys/kernel/sched_rt_default_util_clamp_min
> 
> [ 2540.507236] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_req.value=1024
> [ 2540.514947] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_req.value=1024
> [ 2548.015208] uclamp_restrict() [rtkit-daemon 419] p->uclamp_req[0].value=999
> 
> root@...0:~# echo 666 > /proc/sys/kernel/sched_util_clamp_min
> 
> [ 2548.022219] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_req.value=999
> [ 2548.029825] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_req.value=999
> [ 2553.479509] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_max.value=666
> [ 2553.487131] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_max.value=666
> 
> Haven't tried to put an rt task into a taskgroup other than root.

I do run a test that Patrick had which checks for cgroup values. One of the
tests checks if RT are boosted to max, and it fails when I change the default
RT max-boost value :)

I think we're in agreement, but the terminology is probably making things a bit
confusing.

Thanks

--
Qais Yousef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ