Message-ID: <q3krlyukweyfrabk2soxryx74mjl6yljqfm7nhfrhudbv47q4p@62unggrnbydk>
Date: Thu, 13 Nov 2025 09:04:38 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider <vschneid@...hat.com>,
Chris Mason <clm@...a.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/fair: Reimplement NEXT_BUDDY to align with
EEVDF goals
On Wed, Nov 12, 2025 at 03:48:23PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 12, 2025 at 12:25:21PM +0000, Mel Gorman wrote:
>
> > +	/* Prefer picking wakee soon if appropriate. */
> > +	if (sched_feat(NEXT_BUDDY) &&
> > +	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
> > +
> > +		/*
> > +		 * Decide whether to obey WF_SYNC hint for a new buddy. Old
> > +		 * buddies are ignored as they may not be relevant to the
> > +		 * waker and less likely to be cache hot.
> > +		 */
> > +		if (wake_flags & WF_SYNC)
> > +			preempt_action = preempt_sync(rq, wake_flags, pse, se);
> > +	}
>
> Why only do preempt_sync() when NEXT_BUDDY? Nothing there seems to
> depend on buddies.

There isn't a direct relation, but there is an indirect one. I know from
your previous review that you separated out the WF_SYNC handling but, after
a while, I did not find a good reason to separate it completely from
NEXT_BUDDY.

NEXT_BUDDY updates cfs_rq->next if appropriate to indicate there is a waker
relationship between two tasks that potentially share data that may still
be cache resident after a context switch. WF_SYNC indicates there may be
a strict relationship between those two tasks in that the waker may need the
wakee to do some work before it can make progress. If NEXT_BUDDY does not
set cfs_rq->next in the current waking context then the wakee may only be
picked next by coincidence under normal EEVDF rules.

WF_SYNC could still reschedule if the wakee is not selected as a buddy but
the benefit, if any, would be marginal -- if the waker does not go to sleep
then the WF_SYNC contract is violated and, if the wakee only runs after a
wakeup delay, then the shared data may already be evicted from cache.
With NEXT_BUDDY, there is a chance that the cost of a reschedule and/or
a context switch will be offset by reduced overall latency (e.g. fewer
cache misses). Without NEXT_BUDDY, WF_SYNC may only incur costs due to
context switching.

I considered the possibility of WF_SYNC being applied if pse is already a
buddy due to yield or some other factor but there is no reason to assume
any shared data is still cache resident and it's not easy to reason about. I
considered applying WF_SYNC if pse was already set as a buddy and using it
as a two-pass filter but again, there is no obvious benefit or reason why
the second wakeup is more important than the first wakeup. I considered
WF_SYNC being applied if any buddy is set but it's not clear why a SYNC
wakeup between tasks A and B should instead pick C to run ASAP outside of
the normal EEVDF rules.

I think it's straight-forward if the logic is

  o If NEXT_BUDDY sets the wakee as cfs_rq->next then
    schedule the wakee soon
  o If the wakee is to be selected soon and WF_SYNC is also set then
    pick the wakee ASAP

but less straight-forward if

  o If WF_SYNC is set, reschedule now and maybe the wakee will be
    picked, maybe the waker will run again, maybe something else
    will run and sometimes it'll be a gain overall.
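
As a rough userspace sketch of the first (straight-forward) variant, not
the patch itself: the enum and wakeup_decision() are hypothetical names,
buddy_set stands in for "NEXT_BUDDY is enabled and the wakee was set as
cfs_rq->next in this wakeup" and wf_sync for "wake_flags & WF_SYNC". It
assumes the WF_SYNC hint is always obeyed for a new buddy, ignoring any
further checks preempt_sync() makes before doing so.

```c
#include <stdbool.h>

/* Hypothetical outcomes; illustrative names only, not from the patch. */
enum wake_action {
	WAKE_NORMAL_EEVDF,	/* no hint applied, normal EEVDF pick */
	WAKE_PICK_SOON,		/* wakee is the new buddy, favour it soon */
	WAKE_PICK_ASAP,		/* new buddy and sync wakeup, pick it ASAP */
};

/*
 * buddy_set: assumption, models the wakee becoming cfs_rq->next during
 *            this wakeup (an old buddy does not count).
 * wf_sync:   assumption, models the WF_SYNC bit being set in wake_flags.
 */
static enum wake_action wakeup_decision(bool buddy_set, bool wf_sync)
{
	/* WF_SYNC on its own is ignored when no new buddy was set. */
	if (!buddy_set)
		return WAKE_NORMAL_EEVDF;

	/* The wakee is to be selected soon; a sync hint upgrades that. */
	if (wf_sync)
		return WAKE_PICK_ASAP;

	return WAKE_PICK_SOON;
}
```

The point of the ordering is that WF_SYNC never acts alone: with
buddy_set false the result is the same whether or not wf_sync is set.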
--
Mel Gorman
SUSE Labs