Message-ID: <CAKfTPtDeT0-pX7_NVr-bG_cqYUCCogYbR0ioMT-zjyXsDO45fA@mail.gmail.com>
Date: Thu, 17 Aug 2023 18:37:43 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>,
Aaron Lu <aaron.lu@...el.com>, x86@...nel.org
Subject: Re: [RFC PATCH 1/1] sched: ttwu_queue_cond: perform queued wakeups
across different L2 caches
On Thu, 17 Aug 2023 at 18:13, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> On 8/17/23 12:09, Mathieu Desnoyers wrote:
> > On 8/17/23 12:01, Vincent Guittot wrote:
> >> On Thu, 17 Aug 2023 at 17:34, Mathieu Desnoyers
> >> <mathieu.desnoyers@...icios.com> wrote:
> >>>
> >>> Skipping queued wakeups for all logical CPUs sharing an LLC means that
> >>> on a 192-core AMD EPYC 9654 96-Core Processor system (2 sockets), groups
> >>> of 8 cores (16 hardware threads) end up grabbing the runqueue locks of
> >>> other runqueues within the same group for each wakeup, causing
> >>> contention on the runqueue locks.
> > [...]
> >>>
> >>> -bool cpus_share_cache(int this_cpu, int that_cpu);
> >>> +bool cpus_share_cluster(int this_cpu, int that_cpu); /* Share L2. */
> >>> +bool cpus_share_cache(int this_cpu, int that_cpu); /* Share LLC. */
> >>
> >> I think that Yicong is doing what you want with
> >> cpus_share_lowest_cache() which points to cluster when available or
> >> LLC otherwise
> >> https://lore.kernel.org/lkml/20220720081150.22167-1-yangyicong@hisilicon.com/t/#m0ab9fa0fe0c3779b9bbadcfbc1b643dce7cb7618
> >>
> >
> > AFAIU (please correct me if I'm wrong) my AMD EPYC machine has sockets
> > consisting of 12 clusters, each cluster having its own L3 cache.
> >
> > What I am trying to achieve here is really to implement "cpus_share_l2":
> > I want this to match only when the cpus have a common L2 cache. L3
> > appears to be a group which is either:
> >
> > - too large (16 hw threads), or
> > - has too high an access latency.
> >
> > I'm not certain which (or if both) of those reasons explain why
> > grouping by L2 is better here.
>
> Re-reading the patch you pointed me to, I notice:
>
> "+ * Whether CPUs are share lowest cache, which means LLC on non-cluster
> + * machines and LLC tag or L2 on machines with clusters."
>
> So this "share lowest cache" really means lowest in terms of number,
> e.g. L2 < L3, and not "lowest in the hierarchy" as in "closest to
> memory", correct?
Yes
>
> Thanks,
>
> Mathieu
>
> >
> > Thanks,
> >
> > Mathieu
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
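
The difference between matching on a shared LLC and matching on a shared
L2 can be sketched with a toy user-space model. This is not kernel code
and not the patch itself. The toy_* names, the linear CPU numbering (SMT
siblings adjacent) and the one-L2-per-core layout are assumptions made for
illustration; only the 8 cores (16 hardware threads) per LLC and the
192-core total come from the thread above. The real ttwu_queue_cond()
checks more conditions than the single cache test modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy topology; real kernels read this from sched domains. */
enum {
	TOY_THREADS_PER_L2  = 2,   /* assumption: one L2 per core, 2 SMT threads */
	TOY_THREADS_PER_LLC = 16,  /* from the thread: 8 cores (16 hw threads) */
	TOY_NR_CPUS         = 384, /* 192 cores x 2 hw threads */
};

/* Assumption: CPUs are numbered linearly, so integer division gives the ID. */
static int toy_l2_id(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return cpu / TOY_THREADS_PER_L2;
}

static int toy_llc_id(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return cpu / TOY_THREADS_PER_LLC;
}

/* True when both CPUs sit behind the same L2 (the "cluster" in this model). */
static bool toy_cpus_share_l2(int this_cpu, int that_cpu)
{
	return toy_l2_id(this_cpu) == toy_l2_id(that_cpu);
}

/* True when both CPUs sit behind the same last-level cache (L3 here). */
static bool toy_cpus_share_llc(int this_cpu, int that_cpu)
{
	return toy_llc_id(this_cpu) == toy_llc_id(that_cpu);
}

/* LLC-based policy: queue the wakeup only across different LLCs. */
static bool toy_queue_wakeup_llc_policy(int this_cpu, int that_cpu)
{
	return !toy_cpus_share_llc(this_cpu, that_cpu);
}

/* L2-based policy: queue the wakeup whenever the L2 differs. */
static bool toy_queue_wakeup_l2_policy(int this_cpu, int that_cpu)
{
	return !toy_cpus_share_l2(this_cpu, that_cpu);
}
```

In this model, CPUs 0 and 5 share an LLC but not an L2. The LLC-based
policy therefore skips the queued wakeup for that pair (the waker takes
the remote runqueue lock), while the L2-based policy queues it.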