Message-ID: <75f402b9-5819-c070-111c-fcf37ca90d31@redhat.com>
Date:   Wed, 23 Sep 2020 12:51:22 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org
Cc:     linux-kernel@...r.kernel.org, bigeasy@...utronix.de,
        qais.yousef@....com, swood@...hat.com, valentin.schneider@....com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, vincent.donnefort@....com
Subject: Re: [PATCH 7/9] sched: Add migrate_disable()

On 9/23/20 10:31 AM, Thomas Gleixner wrote:
> On Mon, Sep 21 2020 at 22:42, Daniel Bristot de Oliveira wrote:
>> On 9/21/20 9:16 PM, Thomas Gleixner wrote:
>>> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote:
>>> But seriously, I completely understand your concern vs. schedulability
>>> theories, but those theories can neither deal well with preemption
>>> disable simply because you can create other trainwrecks when enough low
>>> priority tasks run long enough in preempt disabled regions in
>>> parallel. The scheduler simply does not know ahead how long these
>>> sections will take and how many of them will run in parallel.
>>>
>>> The theories make some assumptions about preempt disable and consider it
>>> as temporary priority ceiling, but that's all assumptions as the bounds
>>> of these operations are simply unknown.
>>
>> Limited preemption is something that is more explored/well known than
>> limited/arbitrary affinity - I even know a dude that convinced academics about
>> the effects/properties of preempt disable on the PREEMPT_RT!
> 
> I'm sure I never met that guy.

It is a funny Italian/Brazilian dude....

>> But I think that the message here is that: ok, migrate disable is better for the
>> "scheduling latency" than preempt disable (preempt rt goal). But the
>> indiscriminate usage of migrate disable has some undesired effects for "response
>> time" of real-time threads (scheduler goal), so we should use it with caution -
>> as much as we have with preempt disable. In the end, both are critical for
>> real-time workloads, and we need more work and analysis on them both.
> ...
>>> But as the kmap discussion has shown, the current situation of enforcing
>>> preempt disable even on a !RT kernel is not pretty either. I looked at
>>> quite some of the kmap_atomic() usage sites and the resulting
>>> workarounds for non-preemptability are pretty horrible especially if
>>> they do copy_from/to_user() or such in those regions. There is tons of
>>> other code which really only requires migrate disable
>>
>> (not having an explicit declaration of the reason to disable preemption makes
>> these all hard to rework... and we will have the same with migrate disable.
>> Anyways, I agree that disabling only migration helps -rt now [and I like
>> that]... but I also fear/care for scheduler metrics on the long term... well,
>> there is still a long way until retirement.)
> 
> Let's have a look at theory and practice once more:
> 
> 1) Preempt disable
> 
>    Theories take that into account by adding a SHC ('Sh*t Happens
>    Coefficient') into their formulas, but the practical effects cannot
>    ever be reflected in theories accurately.

It depends. An adequate theory strikes the right balance between SHC and
precision. The idea is to define the behavior as precisely as possible, so as
to reduce the SHC.

>    In practice, preempt disable can cause unbound latencies and while we
>    all agree that long preempt/interrupt disabled sections are bad, it's
>    not really trivial to break these up without rewriting stuff from
>    scratch. The recent discussion about unbound latencies in the page
>    allocator is a prime example for that.

Here we need to separate two contexts: the synchronization and the code contexts.

At the synchronization level the preempt_disable() is bounded [1]:

On PREEMPT_RT there can be only *1* preempt_disable()/enable() section taken
to avoid scheduling (the worst one being the SHC) before the process of calling
the scheduler starts. So the SHC factor is then reduced to the code context [2].

The SHC is in the code context, and it is mitigated by the constant monitoring
of the code sections via tests.

>    The ever growing usage of per CPU storage is not making anything
>    better and right now preempt disable is the only tool we have at the
>    moment in mainline to deal with that.
> 
>    That forces people to come up with code constructs which are more
>    than suboptimal both in terms of code quality and in terms of
>    schedulability/latency. We've seen mutexes converted to spinlocks
>    just because of that, conditionals depending on execution context
>    which turns out to be broken and inconsistent, massive error handling
>    trainwrecks, etc.

I see and agree with you on this point.

> 2) Migrate disable
> 
>    Theories do not know anything about it, but in the very end it's
>    going to be yet another variant of SHC to be defined.

I agree. There are very few things known at this point. However, we can take as
exercise the example that Peter mentioned:

CPU 0					CPU 1
thread on migrate_disable():		high prio thread
 -> preempted!
    -> migrate_disable()
       -> preempted!			
		...
          migrate_disable()		leaves the CPU
	  SH happens			IDLE
		...			IDLE
	unfold all on this CPU.		IDLE (decades of ...)


So, at the synchronization level, migrate_disable() is not bounded by a
constant the way preempt_disable() is. That is the difference that worries
Peter, and it is valid from both points of view (theoretical and practical).
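
To make the difference concrete, here is a toy userspace C sketch (not kernel
code, and not something from this patch series). It assumes a very simplified
model: n preempted tasks, each still needing some time inside its
migrate-disabled section. The names pinned_drain_time(),
migratable_drain_time() and TOY_MAX_CPUS are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CPUS 64	/* hypothetical limit for this toy model */

/*
 * All n tasks are pinned to the same CPU, so their remaining sections
 * serialize: the drain time is the sum, and it grows with n.
 */
static unsigned long pinned_drain_time(const unsigned long *remaining, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += remaining[i];
	return total;
}

/*
 * Same work, but tasks may be placed on any of 'cpus' CPUs. Greedy
 * placement: each task goes to the currently least loaded CPU.
 * Returns the time at which the busiest CPU finishes.
 */
static unsigned long migratable_drain_time(const unsigned long *remaining,
					   size_t n, size_t cpus)
{
	unsigned long load[TOY_MAX_CPUS] = { 0 };
	unsigned long makespan = 0;

	assert(cpus >= 1 && cpus <= TOY_MAX_CPUS);

	for (size_t i = 0; i < n; i++) {
		size_t best = 0;

		for (size_t c = 1; c < cpus; c++)
			if (load[c] < load[best])
				best = c;
		load[best] += remaining[i];
		if (load[best] > makespan)
			makespan = load[best];
	}
	return makespan;
}
```

In this model the pinned case has no constant bound: every additional
preempted migrate-disabled task adds its whole section to the CPU it is stuck
on, while the other CPU sits idle.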

>    In practice migrate disable could be taken into account on placement
>    decisions, but yes we don't have anything like that at the moment.

Agreed... we can mitigate that! That is a nice challenge!
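
One purely hypothetical shape such a mitigation could take, sketched in
userspace C: prefer the CPU with the fewest preempted migrate-disabled tasks
when placing a new task. struct toy_cpu, its nr_pinned field and
toy_pick_cpu() are assumptions for illustration. Nothing like this exists in
the scheduler today, and a real placement decision would have to weigh
priorities, affinity and much more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping; not an existing kernel structure. */
struct toy_cpu {
	unsigned int nr_pinned;	/* preempted tasks inside migrate-disabled sections */
};

/*
 * Return the index of the CPU with the fewest pinned, preempted tasks.
 * Ties go to the lowest index.
 */
static size_t toy_pick_cpu(const struct toy_cpu *cpus, size_t nr_cpus)
{
	size_t best = 0;

	assert(nr_cpus >= 1);

	for (size_t i = 1; i < nr_cpus; i++)
		if (cpus[i].nr_pinned < cpus[best].nr_pinned)
			best = i;
	return best;
}
```

With counts of 2, 0 and 1 pinned tasks on three CPUs, the sketch would pick
the second CPU (index 1).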

>    The theoretical worst case which forces all and everything on a
>    single CPU is an understandable concern, but the practical relevance
>    is questionable. I surely stared at a lot of traces on heavily loaded
>    RT systems, but too many preempted migrate disabled tasks was truly
>    never a practical problem. I'm sure you can create a workload
>    scenario which triggers that, but then you always can create
>    workloads which are running into the corner cases of any given
>    system.

I see your point, and I agree it is a corner case. But that is what we have
to deal with in RT scheduling, and such discussions are good for RT Linux, at
least to draw attention to a point that might need to be constantly
monitored and not abused (like preempt_disable for RT).

>    The charm of migrate disable even on !RT is that it allows for
>    simpler code and breaking up preempt disabled sections, which is IMO
>    a clear win given that per CPU ness is not going away -unless the
>    chip industry comes to senses and goes back to the good old UP
>    systems which have natural per CPU ness :)
> 
> That said, let me paraphrase that dude you mentioned above:
> 
>  Theories are great and useful, but pragmatism has proven to produce
>  working solutions even if they cannot work according to theory.

I agree! The way Linux advances as an RTOS has proven to be effective. So
effective that people want more and more. And we can always show later that the
path that Linux took is right in theory! Like that dude did....

But at the same time, we can identify scenarios that we predict will be hard
in the future and try to mitigate/isolate them ahead of time. I can't
argue against that either.

In the end, it seems that the decision here is about which hard problem we will
have to deal with in the future. [ As I said yesterday, I will have to deal with
migrate_disable anyway because of RT, so I do not have a way out anyway.]

Thanks!
-- Daniel

> Thanks,
> 
>         tglx
> 
>         
> 

For the wider audience:
[1] https://bit.ly/33PsV0N
[2] there are other scenarios too, demonstrated in the paper, but here we are
discussing only preempt_disable() postponing __schedule().