Date:   Fri, 27 Oct 2023 12:35:56 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ankur Arora <ankur.a.arora@...cle.com>, linux-mm@...ck.org,
        x86@...nel.org, akpm@...ux-foundation.org, luto@...nel.org,
        bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
        mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com,
        Joel Fernandes <joel@...lfernandes.org>,
        Youssef Esmat <youssefesmat@...omium.org>,
        Vineeth Pillai <vineethrp@...gle.com>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Bristot de Oliveira <bristot@...nel.org>
Subject: Re: [POC][RFC][PATCH v2] sched: Extended Scheduler Time Slice

On 2023-10-27 12:24, Steven Rostedt wrote:
> On Fri, 27 Oct 2023 12:09:56 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
>>> I need to clear one bit while seeing if another bit is set. I could also
>>> use subl, as that would also atomically clear the bit.
>>
>> Ah ok, I did not get that you needed to test for a different bit than
>> the one you clear.
> 
> Yeah, maybe I'm not articulating the implementation as well.
> 
>    bit 0: Set by user space to tell the kernel it's in a critical section
> 
>    bit 1: Set by kernel that it gave user space extend time slice
> 
> Bit 1 will only be set by the kernel if bit 0 is set.
> 
> When entering a critical section, user space will set bit 0. When it leaves
> the critical section, it needs to clear bit 0, but also needs to handle the
> race condition from where it clears the bit and where the kernel could
> preempt it and set bit 1. Thus it needs an atomic operation to clear bit 0
> without affecting bit 1. Once bit 0 is cleared, it does not need to worry
> about bit 1 being set after that as the kernel will only set bit 1 if it
> sees that bit 0 was set. After user space clears bit 0, it must check bit 1
> to see if it should now schedule. And it's also up to user space to clear
> bit 1, but it can do that at any time with bit 0 cleared.
> 
>   extend() {
> 	cr_flags = 1;
>   }
> 
>   unextend() {
> 	cr_flags &= ~1;  /* Must be atomic */
> 	if (cr_flags & 2) {
> 		cr_flags = 0;  /* May not be necessary as it gets cleared by extend() */
> 		sched_yield();
> 	}
>   }
> 
> Does that make more sense?

Not really.

Please see my other email about the need for a reference count here, for
nested locks use-cases.

By "atomic" operation I suspect you only mean a "single instruction" that can
alter the state of the field and keep its prior content in a register, not a
lock-prefixed atomic operation, right?
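
For illustration, here is a rough userspace sketch of the single-word variant
described in the quoted text: clear bit 0 and learn whether bit 1 was set, in
one read-modify-write. It is only a sketch under assumptions. The names
CR_IN_CRITICAL, CR_EXTENDED and the "yields" counter are made up for the
example, and cr_flags is a plain global rather than a field shared with the
kernel. C11 has no portable way to ask for a non-lock-prefixed single
instruction, so atomic_fetch_and_explicit() is used here and is stronger than
what is strictly needed for a word only touched from the same CPU.

```c
#include <sched.h>
#include <stdatomic.h>

#define CR_IN_CRITICAL 1u /* bit 0: set by user space (hypothetical name) */
#define CR_EXTENDED    2u /* bit 1: set by the kernel (hypothetical name) */

/* Stand-in for the word shared with the kernel. */
static _Atomic unsigned int cr_flags;

/* Counts yields so the behaviour can be observed in a test. */
static unsigned int yields;

static void extend(void)
{
	/* Entering the critical section also drops any stale bit 1. */
	atomic_store_explicit(&cr_flags, CR_IN_CRITICAL, memory_order_relaxed);
	/* Compiler barrier: keep critical-section accesses after the store. */
	atomic_signal_fence(memory_order_seq_cst);
}

static void unextend(void)
{
	unsigned int prev;

	/* Compiler barrier: keep critical-section accesses before the clear. */
	atomic_signal_fence(memory_order_seq_cst);

	/* Clear bit 0 without touching bit 1, and fetch the prior value. */
	prev = atomic_fetch_and_explicit(&cr_flags, ~CR_IN_CRITICAL,
					 memory_order_relaxed);

	/*
	 * Bit 1 can only be set while bit 0 is set, so the prior value is
	 * enough to know whether an extension was granted.
	 */
	if (prev & CR_EXTENDED) {
		atomic_store_explicit(&cr_flags, 0, memory_order_relaxed);
		yields++;
		sched_yield();
	}
}
```

Calling extend() followed by unextend() with bit 1 never set leaves cr_flags at
0 and does not yield. Setting CR_EXTENDED in between makes unextend() clear the
word and yield once.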

The only reason you need this asm trickiness is that both states are
placed in different bits of the same word, which is just an
optimization. You could achieve the same much more simply by splitting
this state into two different words, e.g.:

extend() {
   WRITE_ONCE(__rseq_abi->cr_nest, __rseq_abi->cr_nest + 1);
   barrier();
}

unextend() {
   barrier();
   WRITE_ONCE(__rseq_abi->cr_nest, __rseq_abi->cr_nest - 1);
   if (READ_ONCE(__rseq_abi->must_yield)) {
     WRITE_ONCE(__rseq_abi->must_yield, 0);
     sched_yield();
   }
}

Or am I missing something?
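
To make the two-word scheme concrete, here is a minimal userspace sketch that
compiles on its own. Several things are assumptions made for the example:
"struct cr_abi" stands in for the rseq area (cr_nest and must_yield are only
proposed fields), READ_ONCE(), WRITE_ONCE() and barrier() are simplified
stand-ins for the kernel macros built on GCC/Clang extensions, and yield_count
exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang extensions). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define barrier()        __asm__ __volatile__("" ::: "memory")

/* Hypothetical layout: one word per state instead of two bits in one word. */
struct cr_abi {
	uint32_t cr_nest;    /* written by user space: nesting depth */
	uint32_t must_yield; /* set by the kernel when it extended the slice */
};

/* Counts yields so the behaviour can be observed in a test. */
static unsigned int yield_count;

static void cr_enter(struct cr_abi *abi)
{
	WRITE_ONCE(abi->cr_nest, abi->cr_nest + 1);
	barrier(); /* keep critical-section accesses after the increment */
}

static void cr_exit(struct cr_abi *abi)
{
	assert(abi->cr_nest > 0); /* catch unbalanced exit */
	barrier(); /* keep critical-section accesses before the decrement */
	WRITE_ONCE(abi->cr_nest, abi->cr_nest - 1);
	if (READ_ONCE(abi->must_yield)) {
		WRITE_ONCE(abi->must_yield, 0);
		yield_count++;
		sched_yield();
	}
}
```

With a zero-initialized struct cr_abi, two cr_enter() calls followed by one
cr_exit() leave cr_nest at 1 without yielding. If must_yield is then set to 1
(simulating the kernel), the next cr_exit() clears it and yields once.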

Thanks,

Mathieu
   

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
