Message-ID: <YgE3TrBrB0psljDk@BLR-5CG11610CF.amd.com>
Date: Mon, 7 Feb 2022 20:44:22 +0530
From: "Gautham R. Shenoy" <gautham.shenoy@....com>
To: Barry Song <21cnbao@...il.com>
Cc: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Yicong Yang <yangyicong@...ilicon.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
LAK <linux-arm-kernel@...ts.infradead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
prime.zeng@...wei.com,
Jonathan Cameron <jonathan.cameron@...wei.com>,
ego@...ux.vnet.ibm.com, Linuxarm <linuxarm@...wei.com>,
Barry Song <song.bao.hua@...ilicon.com>,
Guodong Xu <guodong.xu@...aro.org>
Subject: Re: [PATCH v2 2/2] sched/fair: Scan cluster before scanning LLC in
wake-up path
On Fri, Feb 04, 2022 at 11:28:25PM +1300, Barry Song wrote:
> > We already figured out that there are no idle CPUs in this cluster. So don't
> > we gain performance by picking an idle CPU/core in the neighbouring cluster?
> > If there is no idle CPU/core in the neighbouring cluster, then it does make
> > sense to fall back on the current cluster.
>
> What you suggested is exactly the approach we tried at the very beginning
> during debugging. But we didn't gain performance according to the benchmark; we
> were actually losing. That is why we added this line to stop the ping-pong:
> 	/* Don't ping-pong tasks in and out cluster frequently */
> 	if (cpus_share_resources(target, prev_cpu))
> 		return target;
>
> If we delete this, we are seeing a big loss of tbench while system
> load is medium and above.

Thanks for clarifying this, Barry. Indeed, if the workload is sensitive
to data ping-ponging across L2 clusters, this heuristic makes sense. I
was thinking of workloads that require lower tail latency, in which
case exploring the larger LLC would have made more sense, assuming
that the larger LLC has an idle core/CPU.

In the absence of any hints from the workload, like something that
Peter had previously suggested
(https://lore.kernel.org/lkml/YVwnsrZWrnWHaoqN@hirez.programming.kicks-ass.net/),
optimizing for cache access seems to be the right thing to do.
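To make the two policies concrete, here is a rough userspace sketch of
the trade-off discussed above. It is not the code from the patch: the
flat 8-CPU LLC, the 4-CPU cluster size, and the names same_cluster(),
scan_idle(), pick_cpu() and the avoid_ping_pong flag are all
assumptions made up for illustration. same_cluster() only stands in
for what cpus_share_resources() is described as checking.

```c
#include <stdbool.h>

#define NR_CPUS      8	/* assumption: one LLC spanning 8 CPUs */
#define CLUSTER_SIZE 4	/* assumption: 4 CPUs share an L2 cluster */

/* Hypothetical stand-in for cpus_share_resources(): same L2 cluster? */
static bool same_cluster(int a, int b)
{
	return a / CLUSTER_SIZE == b / CLUSTER_SIZE;
}

/* Return the first idle CPU in [start, end), or -1 if there is none. */
static int scan_idle(const bool *idle, int start, int end)
{
	for (int cpu = start; cpu < end; cpu++)
		if (idle[cpu])
			return cpu;
	return -1;
}

/*
 * Pick a CPU for a waking task.
 *
 * avoid_ping_pong == true : stay in the cluster when target and
 *                           prev_cpu share it (cache-friendly policy).
 * avoid_ping_pong == false: always widen the search to the whole LLC
 *                           (latency-friendly policy).
 */
static int pick_cpu(const bool *idle, int target, int prev_cpu,
		    bool avoid_ping_pong)
{
	int first = (target / CLUSTER_SIZE) * CLUSTER_SIZE;
	int cpu;

	/* Step 1: look for an idle CPU inside the target's cluster. */
	cpu = scan_idle(idle, first, first + CLUSTER_SIZE);
	if (cpu >= 0)
		return cpu;

	/* Step 2: do not bounce the task out of the cluster it last ran in. */
	if (avoid_ping_pong && same_cluster(target, prev_cpu))
		return target;

	/* Step 3: the cluster is busy, so scan the rest of the LLC. */
	cpu = scan_idle(idle, 0, NR_CPUS);
	return cpu >= 0 ? cpu : target;
}
```

With only CPU 5 idle, target 1 and prev_cpu 2, the cache-friendly
policy keeps the task on CPU 1, while the latency-friendly one moves
it across to CPU 5. Which of the two wins depends on the workload,
which is the point of the exchange above.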
>
> Thanks
> Barry
--
Thanks and Regards
gautham.