lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 5 Jun 2024 10:14:47 +0100
From: Luis Machado <luis.machado@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, kprateek.nayak@....com,
 wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
 John Stultz <jstultz@...gle.com>, Hongyan.Xia2@....com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On 6/5/24 08:22, Peter Zijlstra wrote:
> On Tue, Jun 04, 2024 at 09:12:20PM +0200, Peter Zijlstra wrote:
> 
>> But with the above, you skip inc for sched_delayed, but dequeue_task()
>> will have done the dec, so isn't it then still unbalanced?
>>
>> Oh well, I'll go stare at this in tomorrow.

It did not appear to keep things unbalanced towards uclamp_rq_dec, but
might've been luck of not hitting the right code path.

In any case ...

> 
> OK, just before I went to play in my giant hamster wheel it hit me.
> 
> When dequeue_task() 'fails' and sets sched_delayed, we'll have done
> uclamp_rq_dec().
> 
> Then, since the delayed task is still on the rq -- per the failure -- it
> can be migrated, this will again do dequeue_task() which will *agian* do
> a uclamp_rq_dec().
> 
> So now we have a double dequeue -- *FAIL*.
> 
> Worse, the migration will then do an enqueue_task() on the new rq
> causing uclamp_rq_inc(). If you then get a ttwu() / ENQUEUE_DELAYED, you
> can tickle yet another uclamp_rq_inc() for another fail.
> 
> Something like the below avoids uclamp_rq_{inc,dec}() when
> ->sched_delayed, and moves it post class->enqueue_task() such that for
> the ENQUEUE_DELAYED case, we *will* do the inc after ->sched_delayed
> gets cleared.
> 
> Hmm?
> 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9ac1054c2a4bb..965e6464e68e9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
>  	if (unlikely(!p->sched_class->uclamp_enabled))
>  		return;
>  
> +	if (p->se.sched_delayed)
> +		return;
> +
>  	for_each_clamp_id(clamp_id)
>  		uclamp_rq_inc_id(rq, p, clamp_id);
>  
> @@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>  	if (unlikely(!p->sched_class->uclamp_enabled))
>  		return;
>  
> +	if (p->se.sched_delayed)
> +		return;
> +
>  	for_each_clamp_id(clamp_id)
>  		uclamp_rq_dec_id(rq, p, clamp_id);
>  }
> @@ -1979,9 +1985,10 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>  		psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
>  	}
>  
> -	uclamp_rq_inc(rq, p);
>  	p->sched_class->enqueue_task(rq, p, flags);
>  
> +	uclamp_rq_inc(rq, p);
> +
>  	if (sched_core_enabled(rq))
>  		sched_core_enqueue(rq, p);
>  }
> @@ -2003,6 +2010,7 @@ bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>  	}
>  
>  	uclamp_rq_dec(rq, p);
> +
>  	return p->sched_class->dequeue_task(rq, p, flags);
>  }
>  

... thanks for the patch! The above seems to do it for me. I can see
more reasonable energy use with the eevdf-complete series. Still a
bit higher. Might be noise, we'll see.

I'll go stare at it and run some more tests on our end with this fix
applied on top and will report back.




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ