linux-kernel - Re: [PATCH 7/9] sched: Add migrate

Message-ID: <87v9g4ao8h.fsf@nanos.tec.linutronix.de>
Date:   Wed, 23 Sep 2020 10:31:10 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Daniel Bristot de Oliveira <bristot@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org
Cc:     linux-kernel@...r.kernel.org, bigeasy@...utronix.de,
        qais.yousef@....com, swood@...hat.com, valentin.schneider@....com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, vincent.donnefort@....com
Subject: Re: [PATCH 7/9] sched: Add migrate_disable()

On Mon, Sep 21 2020 at 22:42, Daniel Bristot de Oliveira wrote:
> On 9/21/20 9:16 PM, Thomas Gleixner wrote:
>> On Mon, Sep 21 2020 at 18:36, Peter Zijlstra wrote:
>> But seriously, I completely understand your concern vs. schedulability
>> theories, but those theories cannot deal well with preempt disable
>> either, simply because you can create other trainwrecks when enough low
>> priority tasks run long enough in preempt disabled regions in
>> parallel. The scheduler simply does not know ahead of time how long
>> these sections will take and how many of them will run in parallel.
>> 
>> The theories make some assumptions about preempt disable and consider it
>> a temporary priority ceiling, but those are all assumptions, as the
>> bounds of these operations are simply unknown.
>
> Limited preemption is something that is more explored/well known than
> limited/arbitrary affinity - I even know a dude that convinced academics about
> the effects/properties of preempt disable on the PREEMPT_RT!

I'm sure I never met that guy.

> But I think that the message here is that: ok, migrate disable is better for the
> "scheduling latency" than preempt disable (preempt rt goal). But the
> indiscriminate usage of migrate disable has some undesired effects for "response
> time" of real-time threads (scheduler goal), so we should use it with caution -
> as much as we have with preempt disable. In the end, both are critical for
> real-time workloads, and we need more work and analysis on them both.
...
>> But as the kmap discussion has shown, the current situation of enforcing
>> preempt disable even on a !RT kernel is not pretty either. I looked at
>> quite some of the kmap_atomic() usage sites and the resulting
>> workarounds for non-preemptability are pretty horrible especially if
>> they do copy_from/to_user() or such in those regions. There is tons of
>> other code which really only requires migrate disable
>
> (not having an explicit declaration of the reason to disable preemption makes
> these all hard to rework... and we will have the same with migrate disable.
> Anyways, I agree that disabling only migration helps -rt now [and I like
> that]... but I also fear/care for scheduler metrics on the long term... well,
> there is still a long way until retirement.)

Let's have a look at theory and practice once more:

1) Preempt disable

   Theories take that into account by adding a SHC ('Sh*t Happens
   Coefficient') into their formulas, but the practical effects cannot
   ever be reflected in theories accurately.

   In practice, preempt disable can cause unbounded latencies, and while
   we all agree that long preempt/interrupt disabled sections are bad,
   it's not really trivial to break these up without rewriting stuff
   from scratch. The recent discussion about unbounded latencies in the
   page allocator is a prime example of that.

   The ever growing usage of per CPU storage is not making anything
   better, and right now preempt disable is the only tool we have in
   mainline to deal with that.

   That forces people to come up with code constructs which are more
   than suboptimal both in terms of code quality and in terms of
   schedulability/latency. We've seen mutexes converted to spinlocks
   just because of that, conditionals depending on execution context
   which turn out to be broken and inconsistent, massive error handling
   trainwrecks, etc.
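
For illustration, one textbook way in which theory accounts for
non-preemptible sections is the blocking term in classic fixed-priority
response-time analysis on a single CPU. Reading the 'SHC' as that
blocking term is an assumption made here, not something stated in this
thread. With C_i the worst-case execution time of task i, T_j the period
of task j, hp(i) the set of tasks with higher priority than i, and B_i
the longest time a lower-priority task can keep task i off the CPU (for
example its longest preempt disabled section), the response time R_i is
the fixed point of:

```latex
R_i^{(n+1)} = C_i + B_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i^{(n)}}{T_j} \right\rceil C_j ,
\qquad R_i^{(0)} = C_i + B_i
```

The analysis is only as good as the bound assumed for B_i, which is
exactly the quantity that is unknown for arbitrary preempt disabled
regions.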

2) Migrate disable

   Theories do not know anything about it, but in the very end it's
   going to be yet another variant of SHC to be defined.

   In practice, migrate disable could be taken into account in placement
   decisions, but yes, we don't have anything like that at the moment.

   The theoretical worst case which forces all and everything on a
   single CPU is an understandable concern, but the practical relevance
   is questionable. I surely stared at a lot of traces on heavily loaded
   RT systems, but too many preempted migrate disabled tasks were truly
   never a practical problem. I'm sure you can create a workload
   scenario which triggers that, but then you can always create
   workloads which run into the corner cases of any given system.

   The charm of migrate disable even on !RT is that it allows for
   simpler code and breaking up preempt disabled sections, which is IMO
   a clear win given that per CPU ness is not going away - unless the
   chip industry comes to its senses and goes back to the good old UP
   systems which have natural per CPU ness :)
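
As a rough userspace analogy of the semantics discussed above, and not
the kernel implementation from this patch series, the sketch below pins
the calling thread to whatever CPU it is currently running on and later
restores the old affinity mask. While pinned, the thread can still be
preempted by other tasks, but it keeps running on the same CPU, which is
the property per CPU data needs. The names `pin_state`,
`pin_to_current_cpu()` and `unpin()` are hypothetical helpers invented
for this example; only the glibc calls `sched_getaffinity()`,
`sched_setaffinity()` and `sched_getcpu()` are real.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper state: the affinity mask to restore and the pinned CPU. */
struct pin_state {
	cpu_set_t saved;
	int cpu;
};

/*
 * Pin the calling thread to the CPU it is running on right now.
 * Returns the CPU number, or -1 on failure. The thread stays preemptible;
 * it just cannot be moved to another CPU until unpin() is called.
 */
static int pin_to_current_cpu(struct pin_state *st)
{
	cpu_set_t one;

	assert(st != NULL);

	if (sched_getaffinity(0, sizeof(st->saved), &st->saved) != 0)
		return -1;

	st->cpu = sched_getcpu();
	if (st->cpu < 0)
		return -1;

	/* Restrict the allowed mask to exactly one CPU. */
	CPU_ZERO(&one);
	CPU_SET(st->cpu, &one);
	if (sched_setaffinity(0, sizeof(one), &one) != 0)
		return -1;

	return st->cpu;
}

/* Restore the affinity mask saved by pin_to_current_cpu(). Returns 0 on success. */
static int unpin(const struct pin_state *st)
{
	assert(st != NULL);
	return sched_setaffinity(0, sizeof(st->saved), &st->saved);
}
```

A caller would bracket its per CPU work with `pin_to_current_cpu()` and
`unpin()`, and may block or be preempted in between. Unlike the in-kernel
primitive this costs two system calls per section, so it only shows the
semantics, not the cost model.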

That said, let me paraphrase that dude you mentioned above:

 Theories are great and useful, but pragmatism has proven to produce
 working solutions even if they cannot work according to theory.

Thanks,

        tglx