linux-kernel - Re: [PATCH v2] sched: rt: Make RT capacity aware

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a5b59c1c-3863-b816-6faf-a3b6fd301834@arm.com>
Date:   Thu, 28 Nov 2019 14:59:12 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Qais Yousef <qais.yousef@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched: rt: Make RT capacity aware

On 30/10/2019 18:43, Qais Yousef wrote:
> On 10/30/19 12:57, Dietmar Eggemann wrote:
>> On 09.10.19 12:46, Qais Yousef wrote:
>>
>> [...]
>>
>>> Changes in v2:
>>> 	- Use cpupri_find() to check the fitness of the task instead of
>>> 	  sprinkling find_lowest_rq() with several checks of
>>> 	  rt_task_fits_capacity().
>>>
>>> 	  The selected implementation opted to pass the fitness function as an
>>> 	  argument rather than call rt_task_fits_capacity() capacity which is
>>> 	  a cleaner to keep the logical separation of the 2 modules; but it
>>> 	  means the compiler has less room to optimize rt_task_fits_capacity()
>>> 	  out when it's a constant value.
>>
>> I would prefer exporting rt_task_fits_capacity() sched-internally via
>> kernel/sched/sched.h. Less code changes and the indication whether
>> rt_task_fits_capacity() has to be used in cpupri_find() is already given
>> by lowest_mask being !NULL or NULL.
>>
> 
> I don't mind if the maintainers agree too. The reason I did that way is because
> it keeps the implementation of cpupri generic and self contained.
> 
> rt_task_fits_capacity() at the moment is a static function in rt.c
> To use it in cpupri_find() I either need to make it public somewhere or have an
> 
> 	extern bool rt_task_fits_capacity(...);
> 
> in cpupri.c. Neither of which appealed to me personally.
> 
>> [...]
>>
>>> +inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
>>> +{
>>> +	unsigned int min_cap;
>>> +	unsigned int max_cap;
>>> +	unsigned int cpu_cap;
>>
>> Nit picking. Since we discussed it already,
>>
>> I found this "Also please try to aggregate variables of the same type
>> into a single line. There is no point in wasting screen space::" ;-)
>>
>> https://lore.kernel.org/r/20181107171149.165693799@linutronix.de
> 
> That wasn't merged at the end AFAICT :)
> 
> It's not my preferred style in this case if I have the choice - but I promise
> to change it if I ended up having to spin another version anyway :)
> 
>>
>> [...]
>>
>>> @@ -2223,7 +2273,10 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
>>>  	 */
>>>  	if (task_on_rq_queued(p) && rq->curr != p) {
>>>  #ifdef CONFIG_SMP
>>> -		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
>>> +		bool need_to_push = rq->rt.overloaded ||
>>> +				    !rt_task_fits_capacity(p, cpu_of(rq));
>>> +
>>> +		if (p->nr_cpus_allowed > 1 && need_to_push)
>>>  			rt_queue_push_tasks(rq);
>>>  #endif /* CONFIG_SMP */
>>>  		if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
>> What happens to a always running CFS task which switches to RT? Luca
>> introduced a special migrate callback (next to push and pull)
>> specifically to deal with this scenario. A lot of new infrastructure for
>> this one use case, but still, do we care for it in RT as well?
>>
>> https://lore.kernel.org/r/20190506044836.2914-4-luca.abeni@santannapisa.it
>>
> 
> Good question. This scenario and the one where uclamp values are changed while
> an RT task is running are similar.
> 
> In both cases the migration will happen on the next activation/wakeup AFAICS.
> 
> I am not sure an always running rt/deadline task is something conceivable in
> practice and we want to cater for. It is certainly something we can push a fix
> for in the future on top of this. IMHO we're better off trying to keep the
> complexity low until we have a real scenario/use case that justifies it.
> 
> As it stands when the system is overloaded or when there are more RT tasks than
> big cores we're stuffed too. And there are probably more ways where we can
> shoot ourselves in the foot. Using RT and deadline is hard and that's one of
> their unavoidable plagues I suppose.

I'm OK with that. I just wanted to make sure we will apply the same
requirements to the upcoming DL capacity awareness implementation. Not
catering for this use case means that we can skip the migrate callback
from Luca's initial DL capacity awareness patch-set.

I still would prefer to export rt_task_fits_capacity() via
kernel/sched/sched.h but I won't insist on that detail:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>