linux-kernel - Re: [RFC PATCH 0/4] Scheduler time slice extension

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <71164940-E45A-4572-9F8D-4CE7189514E4@oracle.com>
Date: Fri, 15 Nov 2024 17:49:29 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC: Peter Zijlstra <peterz@...radead.org>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        Daniel Jordan
	<daniel.m.jordan@...cle.com>
Subject: Re: [RFC PATCH 0/4] Scheduler time slice extension



> On Nov 15, 2024, at 6:41 AM, Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
> On 2024-11-14 05:14, Peter Zijlstra wrote:
>> On Wed, Nov 13, 2024 at 02:36:58PM -0500, Mathieu Desnoyers wrote:
>>> On 2024-11-13 13:50, Peter Zijlstra wrote:
>>>> On Wed, Nov 13, 2024 at 12:01:22AM +0000, Prakash Sangappa wrote:
>>>> 
>>>>> This patch set implements the above mentioned 50us extension time as posted
>>>>> by Peter. But instead of using restartable sequences as API to set the flag
>>>>> to request the extension, this patch proposes a new API with use of a per
>>>>> thread shared structure implementation described below. This shared structure
>>>>> is accessible in both users pace and kernel. The user thread will set the
>>>>> flag in this shared structure to request execution time extension.
>>>> 
>>>> But why -- we already have rseq, glibc uses it by default. Why add yet
>>>> another thing?
>>> 
>>> Indeed, what I'm not seeing in this RFC patch series cover letter is an
>>> explanation that justifies adding yet another per-thread memory area
>>> shared between kernel and userspace when we have extensible rseq
>>> already.
>>> 
>>> Peter, was there anything fundamentally wrong with your approach based
>>> on rseq ? https://lore.kernel.org/lkml/20231030132949.GA38123@noisy.programming.kicks-ass.net
>> Not that I can remember, but it's a long time ago :-)
>>> The main thing I wonder is whether loading the rseq delay resched flag
>>> on return to userspace is too late in your patch.
>> Too late how? It only loads it at the point we would've called
>> schedule() -- no point in looking at it otherwise, right?
> 
> [...]
> 
> For the specific return-to-userspace path, I think where you've placed
> the delay-resched flag check is fine.
> 
> I'm concerned about other code paths that invoke schedule() besides
> return-to-userspace. For instance:
> 
> raw_irqentry_exit_cond_resched():
> 
>        if (!preempt_count()) {
> [...]
>                if (need_resched())
>                        preempt_schedule_irq();
>        }
> 
> AFAIU, this could be triggered by an interrupt handler exit when nested
> over a page fault handler, exception handler, or system call.
> 
> We may decide that we cannot care less about those scenarios, and just
> ignore the delay-resched flag, but it's relevant to take those into
> consideration and clearly document the rationale behind our decision.

Don’t think the delay-resched will address all scenarios where preemption can 
occur when in critical section.  We could aim to address frequent paths where
a task can get preempted.  Initially the intent was to prevent preemption mainly
at the end of time slice, if the thread is in a critical section in the user space and 
has requested delaying reschedule.. 

Another path to consider is the wakeups occurring on a different cpu which could 
enqueue  a thread and attempt to preempt this thread when it is running in the
critical section. Should it check if the thread running has been granted extra time 
i.e the ‘taskshrd_sched_delay’ has been set for the running thread, avoid setting 
TIF_NEED_RESCHED in resched_curr() and sending IPI, ie like lazy preemption? 
If ’taskshrd_sched_delay’ has been set we know it will get preempted  due to the 
timer soon, or the task would sched_yield(if it is a well behaving application).

> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>