linux-kernel - Re: [RFC PATCH 1/1] sched: ttwu_queue_cond: perform queued wakeups across different L2 caches

Message-ID: <81a0b993-b613-1b35-ba43-13c7306b50e6@efficios.com>
Date:   Thu, 17 Aug 2023 12:14:47 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Aaron Lu <aaron.lu@...el.com>, x86@...nel.org
Subject: Re: [RFC PATCH 1/1] sched: ttwu_queue_cond: perform queued wakeups
 across different L2 caches

On 8/17/23 12:09, Mathieu Desnoyers wrote:
> On 8/17/23 12:01, Vincent Guittot wrote:
>> On Thu, 17 Aug 2023 at 17:34, Mathieu Desnoyers
>> <mathieu.desnoyers@...icios.com> wrote:
>>>
>>> Skipping queued wakeups for all logical CPUs sharing an LLC means that
>>> on a 192 cores AMD EPYC 9654 96-Core Processor (over 2 sockets), groups
>>> of 8 cores (16 hardware threads) end up grabbing runqueue locks of other
>>> runqueues within the same group for each wakeup, causing contention on
>>> the runqueue locks.
> [...]
>>>
>>> -bool cpus_share_cache(int this_cpu, int that_cpu);
>>> +bool cpus_share_cluster(int this_cpu, int that_cpu);   /* Share L2. */
>>> +bool cpus_share_cache(int this_cpu, int that_cpu);     /* Share LLC. */
>>
>> I think that Yicong is doing what you want with
>> cpus_share_lowest_cache() which points to cluster when available or
>> LLC otherwise
>> https://lore.kernel.org/lkml/20220720081150.22167-1-yangyicong@hisilicon.com/t/#m0ab9fa0fe0c3779b9bbadcfbc1b643dce7cb7618
>>
> 
> AFAIU (please correct me if I'm wrong) my AMD EPYC machine has sockets 
> consisting of 12 clusters, each cluster having its own L3 cache.
> 
> What I am trying to achieve here is really to implement "cpus_share_l2": 
> I want this to match only when the cpus have a common L2 cache. L3 
> appears to be a group which is either:
> 
> - too large (16 hw threads) or
> - has too high an access latency.
> 
> I'm not certain which (or if both) of those reasons explain why
> grouping by L2 is better here.

Re-reading the patch you pointed me to, I notice:

"+ * Whether CPUs are share lowest cache, which means LLC on non-cluster
  + * machines and LLC tag or L2 on machines with clusters."

So this "share lowest cache" really means lowest in terms of level number,
e.g. L2 < L3, and not "lowest in the hierarchy" as in "closest to
memory", correct?
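
To make the distinction concrete, here is a rough userspace sketch (not
kernel code) of the two predicates and how each would change the
"queue the wakeup" decision. All names prefixed with model_ are
hypothetical. The CPU numbering (consecutive ids are SMT siblings) and
the assumption that L2 is private to each core are simplifications for
illustration; the 8 cores (16 hw threads) per L3 figure is the one
quoted above. The real ttwu_queue_cond() checks more conditions than
this.

```c
#include <stdbool.h>

/* Assumed, simplified topology (not read from real hardware). */
#define MODEL_THREADS_PER_CORE 2  /* assumption: L2 shared only by SMT siblings */
#define MODEL_CORES_PER_LLC    8  /* from the thread: 8 cores (16 hw threads) per L3 */

/* Map a CPU id to the id of the L2 cache it uses. */
static int model_l2_id(int cpu)
{
	return cpu / MODEL_THREADS_PER_CORE;
}

/* Map a CPU id to the id of the last-level (L3) cache it uses. */
static int model_llc_id(int cpu)
{
	return cpu / (MODEL_THREADS_PER_CORE * MODEL_CORES_PER_LLC);
}

/* "cpus_share_l2" semantics: true only when both CPUs sit behind one L2. */
static bool model_cpus_share_l2(int this_cpu, int that_cpu)
{
	return model_l2_id(this_cpu) == model_l2_id(that_cpu);
}

/* LLC semantics: true when both CPUs sit behind the same L3. */
static bool model_cpus_share_llc(int this_cpu, int that_cpu)
{
	return model_llc_id(this_cpu) == model_llc_id(that_cpu);
}

/*
 * Simplified wakeup decision: queue the wakeup on the remote CPU when the
 * two CPUs do not share the chosen cache level, rather than taking the
 * remote runqueue lock directly.
 */
static bool model_should_queue_wakeup(int this_cpu, int that_cpu, bool use_l2)
{
	if (use_l2)
		return !model_cpus_share_l2(this_cpu, that_cpu);
	return !model_cpus_share_llc(this_cpu, that_cpu);
}
```

Under these assumptions, CPUs 0 and 2 share an L3 but not an L2, so the
LLC-based check takes the remote runqueue lock directly while the
L2-based check queues the wakeup.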

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com