Date:   Fri, 5 Apr 2019 22:55:32 +0800
From:   Aaron Lu <aaron.lu@...ux.alibaba.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, tglx@...utronix.de, pjt@...gle.com,
        tim.c.chen@...ux.intel.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Aubrey Li <aubrey.intel@...il.com>,
        Julien Desfossez <jdesfossez@...italocean.com>
Subject: Re: [RFC][PATCH 13/16] sched: Add core wide task selection and
 scheduling.

On Tue, Apr 02, 2019 at 10:28:12AM +0200, Peter Zijlstra wrote:
> Another approach would be something like the below:
> 
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -87,7 +87,7 @@ static inline int __task_prio(struct tas
>   */
>  
>  /* real prio, less is less */
> -static inline bool __prio_less(struct task_struct *a, struct task_struct *b, bool runtime)
> +static inline bool __prio_less(struct task_struct *a, struct task_struct *b, u64 vruntime)
>  {
>  	int pa = __task_prio(a), pb = __task_prio(b);
>  
> @@ -104,21 +104,25 @@ static inline bool __prio_less(struct ta
>  	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
>  		return !dl_time_before(a->dl.deadline, b->dl.deadline);
>  
> -	if (pa == MAX_RT_PRIO + MAX_NICE && runtime) /* fair */
> -		return !((s64)(a->se.vruntime - b->se.vruntime) < 0);
> +	if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
> +		return !((s64)(a->se.vruntime - vruntime) < 0);
		                                         ~~~
I think <= should be used here, so that two tasks with the same vruntime
will return false. Otherwise we could keep bouncing between two tasks that
have different tags, with one picked as max in the first round and the
other picked as max in the next round, and the CPU would get stuck in
__schedule() with irqs disabled.
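
To make that concrete, here is a toy userspace sketch (made-up names, not
meant for the kernel tree) of the fair-class comparison above: with '<',
two tasks of equal vruntime each see the other as higher priority, while
with '<=' both directions return false.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* "a is lower prio than the task whose vruntime is b_vruntime" */
static bool prio_less_lt(uint64_t a_vruntime, uint64_t b_vruntime)
{
	return !((int64_t)(a_vruntime - b_vruntime) < 0);
}

static bool prio_less_le(uint64_t a_vruntime, uint64_t b_vruntime)
{
	return !((int64_t)(a_vruntime - b_vruntime) <= 0);
}

int main(void)
{
	uint64_t a = 1000, b = 1000;	/* two tasks with equal vruntime */

	/* '<': prio_less(a, b) and prio_less(b, a) are both true, so the
	 * core wide pick can keep flipping its notion of max */
	printf("'<' : a_less_b=%d b_less_a=%d\n",
	       prio_less_lt(a, b), prio_less_lt(b, a));

	/* '<=': both are false, the comparison is consistent */
	printf("'<=': a_less_b=%d b_less_a=%d\n",
	       prio_less_le(a, b), prio_less_le(b, a));
	return 0;
}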

>  
>  	return false;
>  }
>  
>  static inline bool cpu_prio_less(struct task_struct *a, struct task_struct *b)
>  {
> -	return __prio_less(a, b, true);
> +	return __prio_less(a, b, b->se.vruntime);
>  }
>  
>  static inline bool core_prio_less(struct task_struct *a, struct task_struct *b)
>  {
> -	/* cannot compare vruntime across CPUs */
> -	return __prio_less(a, b, false);
> +	u64 vruntime = b->se.vruntime;
> +
> +	vruntime -= task_rq(b)->cfs.min_vruntime;
> +	vruntime += task_rq(a)->cfs.min_vruntime;

After some testing, I figured task_cfs_rq() should be used instead of
task_rq() :-)
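
In other words, something like the below on top of your diff (just a
sketch to show what I mean; only the two min_vruntime lines change):

static inline bool core_prio_less(struct task_struct *a, struct task_struct *b)
{
	u64 vruntime = b->se.vruntime;

	/* normalize b's vruntime into a's cfs_rq before comparing */
	vruntime -= task_cfs_rq(b)->min_vruntime;
	vruntime += task_cfs_rq(a)->min_vruntime;

	return __prio_less(a, b, vruntime);
}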

With the two changes (and some other minor ones that still need more time
to sort out), I'm now able to run two full-CPU kbuilds in two tagged
cgroups. Previously, the system would hang pretty soon after I started a
kbuild in any tagged cgroup (presumably with CPUs stuck in __schedule()
with irqs disabled).

And no warning has appeared about two tasks with different tags getting
scheduled on the same CPU.

Thanks,
Aaron

> +
> +	return __prio_less(a, b, vruntime);
>  }
>  
>  static inline bool __sched_core_less(struct task_struct *a, struct task_struct *b)
