linux-kernel - Re: [RFC PATCH 0/4] Scheduler time slice extension

Message-ID: <896BA407-E19C-4CEB-BF5E-9707543BA365@oracle.com>
Date: Wed, 13 Nov 2024 20:10:52 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC: Peter Zijlstra <peterz@...radead.org>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        Daniel Jordan
	<daniel.m.jordan@...cle.com>
Subject: Re: [RFC PATCH 0/4] Scheduler time slice extension



> On Nov 13, 2024, at 11:36 AM, Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
> On 2024-11-13 13:50, Peter Zijlstra wrote:
>> On Wed, Nov 13, 2024 at 12:01:22AM +0000, Prakash Sangappa wrote:
>>> This patch set implements the above mentioned 50us extension time as posted
>>> by Peter. But instead of using restartable sequences as API to set the flag
>>> to request the extension, this patch proposes a new API with use of a per
>>> thread shared structure implementation described below. This shared structure
>>> is accessible in both userspace and kernel. The user thread will set the
>>> flag in this shared structure to request execution time extension.
>> But why -- we already have rseq, glibc uses it by default. Why add yet
>> another thing?
> 
> Indeed, what I'm not seeing in this RFC patch series cover letter is an
> explanation that justifies adding yet another per-thread memory area
> shared between kernel and userspace when we have extensible rseq
> already.

It mainly provides pinned memory, which can be useful for future use cases where updating user memory from kernel context needs to be fast or must avoid page faults.

> 
> Peter, was there anything fundamentally wrong with your approach based
> on rseq ? https://lore.kernel.org/lkml/20231030132949.GA38123@noisy.programming.kicks-ass.net
> 
> The main thing I wonder is whether loading the rseq delay resched flag
> on return to userspace is too late in your patch. Also, I'm not sure it is
> realistic to require that no system calls should be done within time extension
> slice. If we have this scenario:

I am also not sure we need to prevent system calls in this scenario.
Was that restriction there mainly because the restartable sequences API imposes it?

-Prakash

> 
> A) userspace grabs lock
>   - set rseq delay resched flag
> B) syscall
>   - reschedule
>    [...]
>   - return to userspace, load rseq delay-resched flag from userspace (too late)
> 
> I would have thought loading the delay resched flag should be attempted much
> earlier in the scheduler code. Perhaps we could do this from a page fault
> disable critical section, and accept that this hint may be a no-op if the
> rseq page happens to be swapped out (which is really unlikely). This is
> similar to the "on_cpu" sched state rseq extension RFC I posted a while back,
> which needed to be accessed from the scheduler:
> 
>  https://lore.kernel.org/lkml/20230517152654.7193-1-mathieu.desnoyers@efficios.com/
>  https://lore.kernel.org/lkml/20230529191416.53955-1-mathieu.desnoyers@efficios.com/
> 
> And we'd leave the delay-resched load in place on return to userspace, so
> in the unlikely scenario where it is swapped out, at least it gets paged
> back at that point.
> 
> Feel free to let me know if I'm missing an important point and/or saying
> nonsense here.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>