Message-ID: <87h6nkh5bw.ffs@tglx>
Date:   Sun, 24 Sep 2023 00:50:43 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ankur Arora <ankur.a.arora@...cle.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

On Tue, Sep 19 2023 at 14:30, Thomas Gleixner wrote:
> On Mon, Sep 18 2023 at 18:57, Linus Torvalds wrote:
>> Then the question becomes whether we'd want to introduce a *new*
>> concept, which is a "if you are going to schedule, do it now rather
>> than later, because I'm taking a lock, and while it's a preemptible
>> lock, I'd rather not sleep while holding this resource".
>>
>> I suspect we want to avoid that for now, on the assumption that it's
>> hopefully not a problem in practice (the recently addressed problem
>> with might_sleep() was that it actively *moved* the scheduling point
>> to a bad place, not that scheduling could happen there, so instead of
>> optimizing scheduling, it actively pessimized it). But I thought I'd
>> mention it.
>
> I think we want to avoid that completely, and if this becomes an issue,
> we would rather be smart about it at the core level.
>
> It's trivial enough to have a per task counter which tells whether a
> preemptible lock is held (or about to be acquired) or not. Then the
> scheduler can take that hint into account and decide to grant a
> timeslice extension once in the expectation that the task leaves the
> lock held section soonish and either returns to user space or schedules
> out. It still can enforce it later on.
>
> We really want to let the scheduler decide and rather give it proper
> hints at the conceptual level instead of letting developers make random
> decisions which might work well for a particular use case and completely
> suck for the rest. I think we wasted enough time already on those.
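
The per-task counter idea quoted above can be modelled in a few lines of
userspace C. This is only a sketch under assumptions: struct task_hint,
hint_lock_acquire(), hint_lock_release() and tick_should_preempt() are
hypothetical names, not kernel API, and treating a preemption as the end
of the granted extension is my simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; not a real kernel structure. */
struct task_hint {
	int  locks_held;	/* preemptible locks held or being acquired */
	bool extension_used;	/* extension already granted in this slice */
};

static void hint_lock_acquire(struct task_hint *t)
{
	t->locks_held++;
}

static void hint_lock_release(struct task_hint *t)
{
	assert(t->locks_held > 0);	/* unbalanced release is a bug */
	t->locks_held--;
}

/*
 * Scheduler side, called when the timeslice has expired.
 * Returns true if the task is preempted now, false if it gets a
 * one-time extension because it sits in a lock held section.
 */
static bool tick_should_preempt(struct task_hint *t)
{
	if (t->locks_held > 0 && !t->extension_used) {
		t->extension_used = true;	/* grant exactly once */
		return false;
	}
	t->extension_used = false;	/* task schedules out: new slice */
	return true;			/* enforce the preemption */
}
```

The decision stays entirely in tick_should_preempt(); the lock paths
only maintain the counter, which is the "give the scheduler a hint"
split described above.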

I finally realized why cond_resched() et al. are so disgusting. They
are scope-less: each one is just a random spot which someone decided
was a good place to reschedule.

But in fact the really relevant measure is scope. Full preemption is
scope based:

      preempt_disable();
      do_stuff();
      preempt_enable();

which also nests properly:

      preempt_disable();
      do_stuff()
        preempt_disable();
        do_other_stuff();
        preempt_enable();
      preempt_enable();

cond_resched() cannot nest and is obviously scope-less.

The TIF_ALLOW_RESCHED mechanism, which sparked this discussion, only
pretends to be scoped.

As Peter pointed out, it does not nest properly with other mechanisms,
and it cannot even nest with itself because it is boolean.
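
To illustrate the nesting problem with a boolean, here is a small
userspace model. All names in it (flag_enter(), count_enter() and so on)
are made up for the illustration and are not the API of the patch set.

```c
#include <assert.h>
#include <stdbool.h>

/* Boolean variant: a single bit, like a TIF flag. */
static bool allow_flag;

static void flag_enter(void) { allow_flag = true; }
static void flag_exit(void)  { allow_flag = false; }

/* Counting variant: nests like preempt_disable()/preempt_enable(). */
static int scope_depth;

static void count_enter(void) { scope_depth++; }

static void count_exit(void)
{
	assert(scope_depth > 0);	/* unbalanced exit is a bug */
	scope_depth--;
}

/* Is the outer scope still marked after an inner enter/exit pair? */
static bool flag_outer_survives_inner(void)
{
	bool still;

	flag_enter();		/* outer scope */
	flag_enter();		/* inner scope */
	flag_exit();		/* inner exit clears the bit for outer too */
	still = allow_flag;
	flag_exit();
	return still;
}

static bool count_outer_survives_inner(void)
{
	bool still;

	count_enter();		/* outer scope */
	count_enter();		/* inner scope */
	count_exit();		/* depth drops to 1, outer still active */
	still = scope_depth > 0;
	count_exit();
	return still;
}
```

With the boolean, the inner exit silently ends the outer scope as well;
the counter keeps the outer scope intact until its own exit.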

The worst thing about it is that it is semantically reverse to the
established model of preempt_disable()/enable(),
i.e. allow_resched()/disallow_resched().

So instead of giving the scheduler a hint about 'this might be a good
place to preempt', providing proper scope would make way more sense:

      preempt_lazy_disable();
      do_stuff();
      preempt_lazy_enable();

That would be the obvious and semantically consistent counterpart to the
existing preemption control primitives with proper nesting support.

might_sleep(), which is in all the lock acquire functions, and your
variant of a hint (better resched now, before I take the lock) are both
in the wrong place.

     hint();
     lock();
     do_stuff();
     unlock();

hint() might schedule, and when the task comes back it might schedule
again immediately because the lock is contended. Again, hint() has no
scope and might be meaningless or even counterproductive if called
deeper in the callchain.

Proper scope based hints avoid that.

      preempt_lazy_disable();
      lock();
      do_stuff();
      unlock();
      preempt_lazy_enable();
      
That's way better because it describes the scope and the task will
either schedule out in lock() on contention or provide a sensible lazy
preemption point in preempt_lazy_enable(). It also nests properly:

      preempt_lazy_disable();
      lock(A);
      do_stuff()
        preempt_lazy_disable();
        lock(B);
        do_other_stuff();
        unlock(B);
        preempt_lazy_enable();
      unlock(A);
      preempt_lazy_enable();

So in this case it does not matter whether do_stuff() is invoked from a
lock held section or not. The scope which defines the throughput
relevant hint to the scheduler is correct in any case.

Contrary to preempt_disable(), the lazy variant prevents neither
scheduling nor preemption, but it's an understandable, properly nestable
mechanism.
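
For illustration only, a userspace model of how such a nestable lazy
scope could behave. None of this exists in the kernel: struct task_model
and the model_*() functions are hypothetical, and deferring a lazy
reschedule request to the outermost enable is just one possible reading
of "lazy preemption point in preempt_lazy_enable()".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the model. */
struct task_model {
	int  lazy_depth;	/* nesting depth of lazy sections */
	bool lazy_resched;	/* a lazy reschedule request is pending */
	int  resched_points;	/* lazy preemption points taken so far */
};

static void model_preempt_lazy_disable(struct task_model *t)
{
	t->lazy_depth++;
}

static void model_preempt_lazy_enable(struct task_model *t)
{
	assert(t->lazy_depth > 0);	/* unbalanced enable is a bug */

	/* Only the outermost enable is a lazy preemption point. */
	if (--t->lazy_depth == 0 && t->lazy_resched) {
		t->lazy_resched = false;
		t->resched_points++;	/* stands in for schedule() */
	}
}

/*
 * Scheduler side: ask for a lazy reschedule. Returns true if it was
 * taken right away, false if it was deferred to the end of the scope.
 * A real scheduler could still enforce preemption later on.
 */
static bool model_request_lazy_resched(struct task_model *t)
{
	if (t->lazy_depth == 0) {
		t->resched_points++;
		return true;
	}
	t->lazy_resched = true;
	return false;
}
```

In this model a request made inside nested sections is acted upon
exactly once, when the outermost section ends, independent of how deep
in the callchain the inner sections were opened.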

I seriously hope to avoid it altogether :)

Thanks,

        tglx