Date:   Tue, 12 Sep 2023 10:26:06 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Ankur Arora <ankur.a.arora@...cle.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        willy@...radead.org, mgorman@...e.de, rostedt@...dmis.org,
        tglx@...utronix.de, jon.grimm@....com, bharata@....com,
        raghavendra.kt@....com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com, jgross@...e.com, andrew.cooper3@...rix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

On Mon, Sep 11, 2023 at 10:04:17AM -0700, Ankur Arora wrote:
> 
> Peter Zijlstra <peterz@...radead.org> writes:
> 
> > On Sun, Sep 10, 2023 at 11:32:32AM -0700, Linus Torvalds wrote:
> >
> >> I was hoping that we'd have some generic way to deal with this where
> >> we could just say "this thing is reschedulable", and get rid of - or
> >> at least not increasingly add to - the cond_resched() mess.
> >
> > Isn't that called PREEMPT=y ? That tracks precisely all the constraints
> > required to know when/if we can preempt.
> >
> > The whole voluntary preempt model is basically the traditional
> > co-operative preemption model and that fully relies on manual yields.
> 
> Yeah, but as Linus says, this means a lot of code is just full of
> cond_resched(). For instance, a loop that process_huge_page() uses
> follows this pattern:
> 
>    for (...) {
>        cond_resched();
>        clear_page(i);
> 
>        cond_resched();
>        clear_page(j);
>    }

Yeah, that's what co-operative preemption gets you.

> > The problem with the REP prefix (and Xen hypercalls) is that
> > they're long running instructions and it becomes fundamentally
> > impossible to put a cond_resched() in.
> >
> >> Yes. I'm starting to think that the only sane solution is to
> >> limit cases that can do this a lot, and the "instruction pointer
> >> region" approach would certainly work.
> >
> > From a code locality / I-cache POV, I think a sorted list of
> > (non overlapping) ranges might be best.
> 
> Yeah, agreed. There are a few problems with doing that though.
> 
> I was thinking of using a check of this kind to schedule out when
> it is executing in this "reschedulable" section:
>         !preempt_count() && in_resched_function(regs->rip);
> 
> For preemption=full, this should mostly work.
> For preemption=voluntary, though, this'll only work with out-of-line
> locks, not if the lock is inlined.
> 
> (Both should have problems with __this_cpu_* and the like, but
> maybe we can handwave that away with sparse/objtool etc.)

So one thing we can do is combine TIF_ALLOW_RESCHED with the ranges
thing, and then only search the ranges when the TIF flag is set.
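
Roughly, and purely as a userspace sketch rather than kernel code, the
combined check could look like the following. Everything named here is
hypothetical: struct resched_range, the resched_ranges[] table and its
addresses, in_resched_range() and may_resched_at() are made up for
illustration, and the TIF flag and preempt count are passed in as plain
arguments. It assumes the table is sorted by start address and the
ranges do not overlap, and it folds in the !preempt_count() test from
the check quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one half-open [start, end) range of reschedulable code. */
struct resched_range {
	uintptr_t start;
	uintptr_t end;
};

/* Hypothetical table; must be sorted by start and non-overlapping. */
static const struct resched_range resched_ranges[] = {
	{ 0x1000, 0x1040 },
	{ 0x2000, 0x2100 },
	{ 0x8000, 0x8010 },
};

/* Binary search for the range containing ip, if any. */
static bool in_resched_range(uintptr_t ip)
{
	size_t lo = 0;
	size_t hi = sizeof(resched_ranges) / sizeof(resched_ranges[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < resched_ranges[mid].start)
			hi = mid;
		else if (ip >= resched_ranges[mid].end)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/*
 * Cheap tests first: the flag and the preempt count are checked before
 * the table is touched, so the search only runs when the flag is set.
 */
static bool may_resched_at(bool tif_allow_resched,
			   unsigned int preempt_count, uintptr_t ip)
{
	if (!tif_allow_resched)
		return false;
	if (preempt_count)
		return false;
	return in_resched_range(ip);
}
```

With the flag clear the table is never read, which is the point of
gating the search on the TIF bit.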

And I'm thinking it might be a good idea to have objtool validate that
the range only contains simple instructions; the moment it contains
control flow, I'm thinking it's too complicated.

> How expensive would always having PREEMPT_COUNT=y be?

Effectively I think that is true today. At the very least Debian and
SuSE (I can't find a RHEL .config in a hurry but I would think they too)
ship with PREEMPT_DYNAMIC=y.

Mel, I'm sure you ran numbers at the time (you always do); what, if
any, was the measured overhead from PREEMPT_DYNAMIC vs 'regular'
voluntary preemption?
