Message-ID: <20230711114207.GK3062772@hirez.programming.kicks-ass.net>
Date: Tue, 11 Jul 2023 13:42:07 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Vernet <void@...ifault.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
gautham.shenoy@....com, kprateek.nayak@....com, aaron.lu@...el.com,
clm@...a.com, tj@...nel.org, roman.gushchin@...ux.dev,
kernel-team@...a.com
Subject: Re: [PATCH v2 0/7] sched: Implement shared runqueue in CFS
On Mon, Jul 10, 2023 at 03:03:35PM -0500, David Vernet wrote:
> Difference between shared_runq and SIS_NODE
> ===========================================
>
> In [0] Peter proposed a patch that addresses Tejun's observation that,
> when workqueues are targeted towards a specific LLC on his Zen2 machine
> with small CCXs, there would be significant idle time due to
> select_idle_sibling() not considering anything outside of the current
> LLC.
>
> This patch (SIS_NODE) is essentially the complement to the proposal
> here. SIS_NODE causes waking tasks to look for idle cores in neighboring
> LLCs on the same die, whereas shared_runq causes cores about to go idle
> to look for enqueued tasks. That said, in their current forms, the two
> features operate at different scopes: SIS_NODE searches for idle cores
> between LLCs, while shared_runq enqueues tasks within a single LLC.
>
> The patch has since been removed in [1], and we compared the results to
> shared_runq (previously called "swqueue") in [2]. SIS_NODE did not
> outperform shared_runq on any of the benchmarks, so we elect not to
> compare against it again for this v2 patch set.
Right, so SIS is search-idle-on-wakeup, while you do
search-task-on-newidle, and they are indeed complementary actions.
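A toy C model may make the "complementary actions" point concrete. Everything in it is hypothetical: the `toy_*` names, a single LLC of 4 CPUs, and a fixed-size FIFO standing in for the shared runqueue. It is not the kernel's select_idle_sibling() or the shared_runq patch code; it only contrasts who searches for what. On wakeup the waker looks for an idle CPU, and on newidle a CPU looks for a queued task. Pushing onto the shared queue only when the wakeup found no idle CPU is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types/functions, not CFS code.
 * One LLC with a fixed number of CPUs. */
#define TOY_NR_CPUS   4
#define TOY_QUEUE_CAP 16

struct toy_cpu {
	bool idle;              /* true if the CPU has nothing to run */
};

/* Shared FIFO of runnable task ids, one per LLC in this toy. */
struct toy_shared_runq {
	int tasks[TOY_QUEUE_CAP];
	size_t head, len;
};

/* search-idle-on-wakeup (the SIS side): the waker scans the LLC,
 * starting at 'target', for an idle CPU. Returns its index or -1. */
static int toy_select_idle_sibling(const struct toy_cpu *cpus, int target)
{
	assert(target >= 0 && target < TOY_NR_CPUS);
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		int cpu = (target + i) % TOY_NR_CPUS;
		if (cpus[cpu].idle)
			return cpu;
	}
	return -1;
}

/* Called (in this toy) when the wakeup found no idle CPU: publish the
 * task so any CPU in the LLC can pick it up later. False if full. */
static bool toy_shared_runq_push(struct toy_shared_runq *q, int task)
{
	if (q->len == TOY_QUEUE_CAP)
		return false;
	q->tasks[(q->head + q->len) % TOY_QUEUE_CAP] = task;
	q->len++;
	return true;
}

/* search-task-on-newidle (the shared_runq side): a CPU about to go
 * idle pulls the oldest queued task. Returns the task id or -1. */
static int toy_newidle_pull(struct toy_shared_runq *q)
{
	if (q->len == 0)
		return -1;
	int task = q->tasks[q->head];
	q->head = (q->head + 1) % TOY_QUEUE_CAP;
	q->len--;
	return task;
}
```

With CPU 2 idle, a wakeup targeting CPU 0 lands on CPU 2; once no CPU is idle the search returns -1, the task waits on the shared queue, and the next CPU to hit newidle pulls it in FIFO order.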
As to SIS_NODE, I really want that to happen, but we need a little more
work for the Epyc things, they have a few too many CCXs per node :-)
Anyway, the same thing that motivated SIS_NODE should also be relevant
here; those Zen2 things have only 3 or 4 cores per LLC, so would it not
also make sense to include several of them in the shared runqueue thing?
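One way to picture "several LLCs per shared runqueue" is a CPU-to-queue mapping in which each queue spans a group of adjacent LLCs. The helpers below are hypothetical and assume CPUs are numbered contiguously within each LLC and that all LLCs are the same size. Linux does not guarantee contiguous numbering within an LLC, so real code would work from the topology masks instead.

```c
#include <assert.h>

/* Hypothetical helper, not kernel code. Assumes contiguous CPU numbering
 * within each LLC and a uniform number of CPUs per LLC. */
static int toy_shared_runq_id(int cpu, int cpus_per_llc, int llcs_per_runq)
{
	assert(cpu >= 0 && cpus_per_llc > 0 && llcs_per_runq > 0);
	int llc = cpu / cpus_per_llc;   /* which LLC (CCX) the CPU sits in */
	return llc / llcs_per_runq;     /* adjacent LLCs share one queue */
}

/* Number of shared runqueues for nr_llcs LLCs, rounding up so that a
 * trailing partial group still gets its own queue. */
static int toy_nr_shared_runqs(int nr_llcs, int llcs_per_runq)
{
	assert(nr_llcs > 0 && llcs_per_runq > 0);
	return (nr_llcs + llcs_per_runq - 1) / llcs_per_runq;
}
```

With `llcs_per_runq == 1` this reduces to one queue per LLC, which matches the single-LLC scope described for shared_runq above; with 4-CPU CCXs and `llcs_per_runq == 2`, CPUs 0-7 share queue 0 and CPUs 8-15 share queue 1.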
(My brain is still processing the shared_runq name...)