[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250425065357.eiwSSvff@linutronix.de>
Date: Fri, 25 Apr 2025 08:53:57 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Prakash Sangappa <prakash.sangappa@...cle.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
"tglx@...utronix.de" <tglx@...utronix.de>
Subject: Re: [PATCH V2 1/3] Sched: Scheduler time slice extension
On 2025-04-25 01:19:07 [+0000], Prakash Sangappa wrote:
> > On Apr 24, 2025, at 7:13 AM, Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> >
> > On 2025-04-18 19:34:08 [+0000], Prakash Sangappa wrote:
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> > …
> >> @@ -930,6 +931,9 @@ struct task_struct {
> >> struct plist_node pushable_tasks;
> >> struct rb_node pushable_dl_tasks;
> >> #endif
> >> +#ifdef CONFIG_RSEQ
> >> + unsigned rseq_sched_delay:1;
> >> +#endif
> >
> > There should be somewhere a bitfield already which you could use without
> > the ifdef. Then you could use IS_ENABLED() if you want to save some code
> > if RSEQ is not enabled.
>
> I suppose we could.
> Patch 1 is pretty much what PeterZ posted, hope he will comment on it.
If it is "pretty much what PeterZ posted" why did he not receive any
credit for it?
> Could it be moved below here, call it sched_time_delay, or some variant of this name?
I don't mind the name. The point is to add to an existing group instead
of starting a new u32 bit field.
> struct task_struct {
> ..
> #ifdef CONFIG_TASK_DELAY_ACCT
> /* delay due to memory thrashing */
> unsigned in_thrashing:1;
> #endif
> unsigned sched_time_delay:1;
> ..
> }
>
> This field will be for scheduler time extension use only. Mainly updated in the context of the current thread.
> Do we even need to use IS_ENABLED(CONFIG_RSEQ) to access?
Well, if you want to avoid the code in the !CONFIG_RSEQ then yes.
> >> struct mm_struct *mm;
> >> struct mm_struct *active_mm;
> >> --- a/include/uapi/linux/rseq.h
> >> +++ b/include/uapi/linux/rseq.h
> > …
> >> @@ -128,6 +131,8 @@ struct rseq {
> >> * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> >> * Inhibit instruction sequence block restart on migration for
> >> * this thread.
> >> + * - RSEQ_CS_DELAY_RESCHED
> >> + * Try delay resched...
> >
> > Delay resched up to $time for $kind-of-stats under $conditions.
>
> Will add some comment like
> “Delay resched for upto 50us. Checked when thread is about to be preempted"
>
> With the tunable added in the subsequent patch, will change ‘50us' it to the tunable name.
A proper descritption of the flag would nice. The current state is that
I can derive move from the constant than from the comment.
> >> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> >> index 6b7ff1bc1b9b..944027d14198 100644
> >> --- a/kernel/entry/common.c
> >> +++ b/kernel/entry/common.c
> > …
> >> @@ -99,8 +100,12 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> >>
> >> local_irq_enable_exit_to_user(ti_work);
> >>
> >> - if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
> >> - schedule();
> >> + if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
> >
> > couldn't we restrict this to _TIF_NEED_RESCHED_LAZY? That way we would
> > still schedule immediately for any SCHED_FIFO/RR/DL tasks and do this
> > delay only for everything else such as SCHED_OTHER/…
>
>
> Wasn’t this the entire discussion about whether to limit it to SCHE_OTHER or not?
> Will defer it to Peter.
Oh. But this still deserves a trace point for this manoeuvre. A trace
would show you a wakeup, the need-resched bit will be shown and then it
will vanish later and people might wonder where did it go.
> >
> >> + if (irq && rseq_delay_resched())
> >> + clear_tsk_need_resched(current);
> >> + else
> >> + schedule();
> >> + }
> >>
> >> if (ti_work & _TIF_UPROBE)
> >> uprobe_notify_resume(regs);
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 165c90ba64ea..cee50e139723 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -823,6 +823,7 @@ void update_rq_clock(struct rq *rq)
> >>
> >> static void hrtick_clear(struct rq *rq)
> >> {
> >> + rseq_delay_resched_tick();
> >
> > This is called from __schedule(). If you set the need-resched flag here,
> > it gets removed shortly after. Do I miss something?
>
> hrtick_clear() is also called when the cpu is being removed in sched_cpu_dying().
> We need to set resched there?
Do we? My understanding is that the NEED_RESCHED flag gets removed once
and then RSEQ_CS_DELAY_RESCHED gets set. RSEQ_CS_DELAY_RESCHED in turn
gets cleared in the scheduler once task leaves the CPU. Once the task
left the CPU then there is no need to set the bit. The sched_cpu_dying()
is the HP thread so if that one is on the CPU then the user task is
gone.
How does this delay thingy work with HZ=100 vs HZ=1000? Like what is the
most you could get in extra time? I could imagine that if a second task
gets on the runqueue and you skip the wake up but the runtime is used up
then the HZ tick should set NEED_RESCHED again and the following HZ tick
should force the schedule point.
> Thanks for your comments.
> -Prakash.
Sebastian
Powered by blists - more mailing lists