linux-kernel - Re: [PATCH v2] sched: rt: Make RT capacity aware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191030174309.buptfbqha374efpl@e107158-lin.cambridge.arm.com>
Date:   Wed, 30 Oct 2019 17:43:10 +0000
From:   Qais Yousef <qais.yousef@....com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched: rt: Make RT capacity aware

On 10/30/19 12:57, Dietmar Eggemann wrote:
> On 09.10.19 12:46, Qais Yousef wrote:
> 
> [...]
> 
> > Changes in v2:
> > 	- Use cpupri_find() to check the fitness of the task instead of
> > 	  sprinkling find_lowest_rq() with several checks of
> > 	  rt_task_fits_capacity().
> > 
> > 	  The selected implementation opted to pass the fitness function as an
> > 	  argument rather than call rt_task_fits_capacity() capacity which is
> > 	  a cleaner to keep the logical separation of the 2 modules; but it
> > 	  means the compiler has less room to optimize rt_task_fits_capacity()
> > 	  out when it's a constant value.
> 
> I would prefer exporting rt_task_fits_capacity() sched-internally via
> kernel/sched/sched.h. Less code changes and the indication whether
> rt_task_fits_capacity() has to be used in cpupri_find() is already given
> by lowest_mask being !NULL or NULL.
> 

I don't mind if the maintainers agree too. The reason I did that way is because
it keeps the implementation of cpupri generic and self contained.

rt_task_fits_capacity() at the moment is a static function in rt.c
To use it in cpupri_find() I either need to make it public somewhere or have an

	extern bool rt_task_fits_capacity(...);

in cpupri.c. Neither of which appealed to me personally.

> [...]
> 
> > +inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
> > +{
> > +	unsigned int min_cap;
> > +	unsigned int max_cap;
> > +	unsigned int cpu_cap;
> 
> Nit picking. Since we discussed it already,
> 
> I found this "Also please try to aggregate variables of the same type
> into a single line. There is no point in wasting screen space::" ;-)
> 
> https://lore.kernel.org/r/20181107171149.165693799@linutronix.de

That wasn't merged at the end AFAICT :)

It's not my preferred style in this case if I have the choice - but I promise
to change it if I ended up having to spin another version anyway :)

> 
> [...]
> 
> > @@ -2223,7 +2273,10 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
> >  	 */
> >  	if (task_on_rq_queued(p) && rq->curr != p) {
> >  #ifdef CONFIG_SMP
> > -		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
> > +		bool need_to_push = rq->rt.overloaded ||
> > +				    !rt_task_fits_capacity(p, cpu_of(rq));
> > +
> > +		if (p->nr_cpus_allowed > 1 && need_to_push)
> >  			rt_queue_push_tasks(rq);
> >  #endif /* CONFIG_SMP */
> >  		if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
> What happens to a always running CFS task which switches to RT? Luca
> introduced a special migrate callback (next to push and pull)
> specifically to deal with this scenario. A lot of new infrastructure for
> this one use case, but still, do we care for it in RT as well?
> 
> https://lore.kernel.org/r/20190506044836.2914-4-luca.abeni@santannapisa.it
> 

Good question. This scenario and the one where uclamp values are changed while
an RT task is running are similar.

In both cases the migration will happen on the next activation/wakeup AFAICS.

I am not sure an always running rt/deadline task is something conceivable in
practice and we want to cater for. It is certainly something we can push a fix
for in the future on top of this. IMHO we're better off trying to keep the
complexity low until we have a real scenario/use case that justifies it.

As it stands when the system is overloaded or when there are more RT tasks than
big cores we're stuffed too. And there are probably more ways where we can
shoot ourselves in the foot. Using RT and deadline is hard and that's one of
their unavoidable plagues I suppose.

Thanks

--
Qais Yousef