Message-ID: <3620bad5-2a27-0f9e-f1f0-70036997d33c@arm.com>
Date: Fri, 21 May 2021 13:23:55 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Will Deacon <will@...nel.org>, Juri Lelli <juri.lelli@...hat.com>
Cc: Quentin Perret <qperret@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
linux-arm-kernel@...ts.infradead.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org,
Catalin Marinas <catalin.marinas@....com>,
Marc Zyngier <maz@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Morten Rasmussen <morten.rasmussen@....com>,
Qais Yousef <qais.yousef@....com>,
Suren Baghdasaryan <surenb@...gle.com>,
Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>, kernel-team@...roid.com
Subject: Re: [PATCH v6 13/21] sched: Admit forcefully-affined tasks into
SCHED_DEADLINE
On 21/05/2021 12:37, Will Deacon wrote:
> On Fri, May 21, 2021 at 10:39:32AM +0200, Juri Lelli wrote:
>> On 21/05/21 08:15, Quentin Perret wrote:
>>> On Friday 21 May 2021 at 07:25:51 (+0200), Juri Lelli wrote:
>>>> On 20/05/21 19:01, Will Deacon wrote:
>>>>> On Thu, May 20, 2021 at 02:38:55PM +0200, Daniel Bristot de Oliveira wrote:
>>>>>> On 5/20/21 12:33 PM, Quentin Perret wrote:
>>>>>>> On Thursday 20 May 2021 at 11:16:41 (+0100), Will Deacon wrote:
>>>>>>>> Ok, thanks for the insight. In which case, I'll go with what we discussed:
>>>>>>>> require admission control to be disabled for sched_setattr() but allow
>>>>>>>> execve() to a 32-bit task from a 64-bit deadline task with a warning (this
>>>>>>>> is probably similar to CPU hotplug?).
>>>>>>>
>>>>>>> Still not sure that we can let execve go through ... It will break AC
>>>>>>> all the same, so it should probably fail as well if AC is on IMO
>>>>>>>
>>>>>>
>>>>>> If the cpumask of the 32-bit task is != of the 64-bit task that is executing it,
>>>>>> the admission control needs to be re-executed, and it could fail. So I see this
>>>>>> operation equivalent to sched_setaffinity(). This will likely be true for future
>>>>>> schedulers that will allow arbitrary affinities (AC should run on affinity
>>>>>> change, and could fail).
>>>>>>
>>>>>> I would vote with Juri: "I'd go with fail hard if AC is on, let it
>>>>>> pass if AC is off (supposedly the user knows what to do)," (also hope nobody
>>>>>> complains until we add better support for affinity, and use this as a motivation
>>>>>> to get back on this front).
>>>>>
>>>>> I can have a go at implementing it, but I don't think it's a great solution
>>>>> and here's why:
>>>>>
>>>>> Failing an execve() is _very_ likely to be fatal to the application. It's
>>>>> also very likely that the task calling execve() doesn't know whether the
>>>>> program it's trying to execute is 32-bit or not. Consequently, if we go
>>>>> with failing execve() then all that will happen is that people will disable
>>>>> admission control altogether.
>>>
>>> Right, but only on these dumb 32bit asymmetric systems, and only if we
>>> care about running 32bits deadline tasks -- which I seriously doubt for
>>> the Android use-case.
>>>
>>> Note that running deadline tasks is also a privileged operation, it
>>> can't be done by random apps.
>>>
>>>>> That has a negative impact on "pure" 64-bit
>>>>> applications and so I think we end up with the tail wagging the dog because
>>>>> admission control will be disabled for everybody just because there is a
>>>>> handful of 32-bit programs which may get executed. I understand that it
>>>>> also means that RT throttling would be disabled.
>>>>
>>>> Completely understand your perplexity. But how can the kernel still give
>>>> guarantees to "pure" 64-bit applications if there are 32-bit
>>>> applications around that essentially broke admission control when they
>>>> were restricted to a subset of cores?
>>>>
>>>>> Allowing the execve() to continue with a warning is very similar to the
>>>>> case in which all the 64-bit CPUs are hot-unplugged at the point of
>>>>> execve(), and this is much closer to the illusion that this patch series
>>>>> intends to provide.
>>>>
>>>> So, for hotplug we currently have a check that would make hotplug
>>>> operations fail if removing a CPU would mean not enough bandwidth to run
>>>> the currently admitted set of DEADLINE tasks.
>>>
>>> Aha, wasn't aware. Any pointers to that check for my education?
>>
>> Hotplug ends up calling dl_cpu_busy() (after the cpu being hotplugged out
>> got removed), IIRC. So, if that fails the operation is undone.
>
> Interesting, thanks. Thinking about this some more, it strikes me that with
> these silly asymmetric systems there could be an interesting additional
> problem with hotplug and deadline tasks. Imagine the following sequence of
> events:
>
> 1. All online CPUs are 32-bit-capable
> 2. sched_setattr() admits a 32-bit deadline task
> 3. A 64-bit-only CPU is onlined
> 4. Some of the 32-bit-capable CPUs are offlined
>
> I wonder if we can get into a situation where we think we have enough
> bandwidth available, but in reality the 32-bit task is in trouble because
> it can't make use of the 64-bit-only CPU.
>
> If so, then it seems to me that admission control is really just
> "best-effort" for 32-bit deadline tasks on these systems because it's based
> on a snapshot in time of the available resources.

IMHO DL AC is per root domain (rd). So if we have e.g. an 8-CPU system
with aarch32_el0 equal to [0-3], then we would need 2 exclusive cpusets
([0-3] and [4-7]) to admit 32-bit DL tasks into [0-3], i.e. to pass the
`if (!cpumask_subset(span, p->cpus_ptr) ...` test in
__sched_setscheduler().

Trying to admit too many 32-bit DL tasks, or trying to hotplug out one of
CPUs [0-3], would fail with `Device or resource busy` if the rd bandwidth
would no longer be sufficient for the set of admitted tasks. But the
[0-3] DL AC wouldn't care about hotplug on CPUs [4-7].
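
To make the per-rd behaviour above concrete, here is a rough userspace
toy model, not kernel code. Everything named toy_* is made up for
illustration, CPU masks are plain 32-bit bitmasks, and the 95% per-CPU
limit is an assumption standing in for the default RT runtime/period
ratio. The -EPERM return for the subset test is also my assumption about
what __sched_setscheduler() does; only the `Device or resource busy`
(-EBUSY) case is stated above.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_BW_SHIFT 20 /* fixed-point scale, chosen arbitrarily here */

/* Toy root domain: a set of CPUs plus the DL bandwidth admitted into it. */
struct toy_rd {
	uint32_t span;     /* bitmask of CPUs in this root domain */
	uint64_t total_bw; /* sum of admitted runtime/period ratios */
};

/* Number of CPUs set in a mask. */
static unsigned int toy_weight(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* runtime/period as a fixed-point ratio. */
static uint64_t toy_to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << TOY_BW_SHIFT) / period;
}

/* Bandwidth a set of CPUs can offer (assumed limit: 95% per CPU). */
static uint64_t toy_capacity(uint32_t cpus)
{
	return toy_weight(cpus) * toy_to_ratio(1000000, 950000);
}

/* Admit a DL task with affinity task_mask into rd. */
static int toy_admit(struct toy_rd *rd, uint32_t task_mask,
		     uint64_t runtime, uint64_t period)
{
	uint64_t bw = toy_to_ratio(period, runtime);

	/* Stand-in for the cpumask_subset(span, p->cpus_ptr) test:
	 * the task must be able to run on every CPU of the rd. */
	if (rd->span & ~task_mask)
		return -EPERM;

	/* Per-rd bandwidth check. */
	if (rd->total_bw + bw > toy_capacity(rd->span))
		return -EBUSY;

	rd->total_bw += bw;
	return 0;
}

/* Try to hotplug out a CPU, as seen from one rd. */
static int toy_cpu_offline(struct toy_rd *rd, unsigned int cpu)
{
	uint32_t bit = UINT32_C(1) << cpu;

	/* CPU is in some other rd: this rd's AC does not care. */
	if (!(rd->span & bit))
		return 0;

	/* Refuse if the remaining CPUs can't carry the admitted set. */
	if (rd->total_bw > toy_capacity(rd->span & ~bit))
		return -EBUSY;

	rd->span &= ~bit;
	return 0;
}
```

With an rd spanning [0-3] (span = 0x0F) and 32-bit tasks affined to 0x0F,
each with runtime/period = 9/10, the model admits four tasks and refuses
the fifth with -EBUSY. Offlining CPU1 is then refused with -EBUSY as
well, while offlining CPU5 is a no-op from this rd's point of view. The
same task offered to an rd spanning all eight CPUs (0xFF) fails the
subset test.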