linux-kernel - [PATCH] arch_topology: support parsing cache topology from DT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SL2PR06MB30825F36A5963C6A05A39F9DBDE99@SL2PR06MB3082.apcprd06.prod.outlook.com>
Date:   Fri, 8 Apr 2022 02:07:27 +0000
From:   王擎 <wangqing@...o.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>,
        Sudeep Holla <sudeep.holla@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     wangqing <11112896@...tel.com>
Subject: [PATCH] arch_topology: support parsing cache topology from DT


>
>[...]
>
>> +void init_cpu_cache_topology(void)
>> +{
>> +     struct device_node *node_cpu, *node_cache;
>> +     int cpu;
>> +     int level = 0;
>> +
>> +     for_each_possible_cpu(cpu) {
>> +             node_cpu = of_get_cpu_node(cpu, NULL);
>> +             if (!node_cpu)
>> +                     continue;
>> +
>> +             level = 0;
>> +             node_cache = node_cpu;
>> +             while (level < MAX_CACHE_LEVEL) {
>> +                     node_cache = of_parse_phandle(node_cache, "next-level-cache", 0);
>> +                     if (!node_cache)
>> +                             break;
>> +
>> +                     cache_topology[cpu][level++] = node_cache;
>> +             }
>> +             of_node_put(node_cpu);
>> +     }
>> +}
>
>From where is init_cpu_cache_topology() called?

All arch which support GENERIC_ARCH_TOPOLOGY can be called, 
I will take arm64 as an example in V2.

>
>> +bool cpu_share_llc(int cpu1, int cpu2)
>> +{
>> +     int cache_level;
>> +
>> +     for (cache_level = MAX_CACHE_LEVEL - 1; cache_level > 0; cache_level--) {
>> +             if (!cache_topology[cpu1][cache_level])
>> +                     continue;
>> +
>> +             if (cache_topology[cpu1][cache_level] == cache_topology[cpu2][cache_level])
>> +                     return true;
>> +
>> +             return false;
>> +     }
>> +
>> +     return false;
>> +}
>
>Like I mentioned in:
>
>https://lkml.kernel.org/r/73b491fe-b5e8-ebca-081e-fa339cc903e1@arm.com
>
>the correct setting in DT's cpu-map node (only core nodes in your case
>(One DynamIQ cluster) will give you the correct LLC (highest
>SD_SHARE_PKG_RESOURCES) setting.

Thanks, I have seen your reply, but not 100% agree.

1.A cluster not only considers the cache, but also a set of the same arch,capability,
and policy. So we can't simply change its cluster(little, medium, big) even if DSU.
2.The cpu-map node can't give the correct LLC, it just give the correct cluster/core group

>
>https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/topology.txt
>
>> +
>> +bool cpu_share_l2c(int cpu1, int cpu2)
>> +{
>> +     if (!cache_topology[cpu1][0])
>> +             return false;
>> +
>> +     if (cache_topology[cpu1][0] == cache_topology[cpu2][0])
>> +             return true;
>> +
>> +     return false;
>> +}
>> +
>>  /*
>>   * cpu topology table
>>   */
>> @@ -662,7 +720,7 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>>                /* not numa in package, lets use the package siblings */
>>                core_mask = &cpu_topology[cpu].core_sibling;
>>        }
>> -     if (cpu_topology[cpu].llc_id != -1) {
>> +     if (cpu_topology[cpu].llc_id != -1 || cache_topology[cpu][0]) {
>>                if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
>>                        core_mask = &cpu_topology[cpu].llc_sibling;
>>        }
>> @@ -684,7 +742,8 @@ void update_siblings_masks(unsigned int cpuid)
>>        for_each_online_cpu(cpu) {
>>                cpu_topo = &cpu_topology[cpu];
>>  
>> -             if (cpuid_topo->llc_id == cpu_topo->llc_id) {
>> +             if ((cpuid_topo->llc_id != -1 && cpuid_topo->llc_id == cpu_topo->llc_id)
>> +                     || (cpuid_topo->llc_id == -1 && cpu_share_llc(cpu, cpuid))) {
>
>Assuming a:
>
>      .---------------.
>CPU   |0 1 2 3 4 5 6 7|
>      +---------------+
>uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
>      +---------------+
>  L2  |   |   | | | | |
We only discuss the 4-core system here, but it should be:
   L2  |  |  | | | | | |
>      +---------------+
>  L3  |<--         -->|
>      +---------------+
>      |<-- cluster -->|
>      +---------------+
>      |<--   DSU   -->|
>      '---------------'
>
>system, I guess you would get (w/ Phantom SD and L2/L3 cache info in DT):
>
>CPU0 .. 3:
>
>MC         SD_SHARE_PKG_RESOURCES
>DIE     no SD_SHARE_PKG_RESOURCES
>
>CPU 4...7:
>
>DIE     no SD_SHARE_PKG_RESOURCES
>
>I can't see how this would make any sense ...
>
>Reason is cpu_share_llc(). You don't check cache_level=0 and w/
>
>CPU0 .. 3:
>cache_topology[CPUX][0] == L2
>cache_topology[CPUX][1] == L3
>
>CPU4...7:
>cache_topology[CPUX][0] == L3

No, I didn't describe the cache topology of cpu[4,7] in the commit log,
cause that's not the point, I use 4 cores(3 level cache) as an example.

CPU4...7 if needed:
cache_topology[CPUX][0] == L2
cache_topology[CPUX][1] == L3

cache_topology[CPUX][level], level is strictly corresponding to the cache level

>
>there is, except for CPU0-1 and CPU2-3, no LLC match.

All 4 CPUs match the LLC in this case.

Thanks,
Wang
>
>[...]