Message-ID: <c1e348d1-6b74-9ac9-b7c6-6508a0fd9690@huawei.com>
Date: Sat, 2 Apr 2022 18:29:00 +0800
From: Yicong Yang <yangyicong@...wei.com>
To: 王擎 <wangqing@...o.com>,
Vincent Guittot <vincent.guittot@...aro.org>
CC: <yangyicong@...ilicon.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Sudeep Holla <sudeep.holla@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: topology: make cache topology separate from cpu
topology
Hi Qing,
On 2022/4/2 17:34, 王擎 wrote:
>
>>>
>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Thu, 10 Mar 2022 at 13:59, Qing Wang <wangqing@...o.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> From: Wang Qing <wangqing@...o.com>
>>>>>>>>>>>
>>>>>>>>>>> On some architectures (e.g. ARM64), caches are implemented as below:
>>>>>>>>>>> cluster: ****** cluster 0 ***** ****** cluster 1 *****
>>>>>>>>>>> core: 0 1 2 3 4 5 6 7
>>>>>>>>> (add cache level 1) c0 c1 c2 c3 c4 c5 c6 c7
>>>>>>>>>>> cache(Leveln): **cache0** **cache1** **cache2** **cache3**
>>>>>>>>> (add cache level 3) *************share level 3 cache ***************
>>>>>>>>>>> sd_llc_id(current): 0 0 0 0 4 4 4 4
>>>>>>>>>>> sd_llc_id(should be): 0 0 2 2 4 4 6 6
>>>>>>>>>>>
>>>>>>>>> Here, n is always 2 on ARM64, but other values are also possible.
>>>>>>>>> core[0,1] form a complex (ARMv9) which shares the L2 cache; core[2,3] is the same.
>>>>>>>>>
>>>>>>>>>>> Caches and CPUs have different topologies, which causes cpus_share_cache()
>>>>>>>>>>> to return the wrong value and affects CPU load balancing.
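[ For reference, cpus_share_cache() boils down to comparing the per-CPU
sd_llc_id values; a trimmed sketch of the current kernel/sched/core.c: ]

        bool cpus_share_cache(int this_cpu, int that_cpu)
        {
                if (this_cpu == that_cpu)
                        return true;

                /* CPUs share cache iff they have the same LLC domain id */
                return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
        }

[ So with sd_llc_id = 0 0 0 0 4 4 4 4 as in the diagram above,
cpus_share_cache(0, 2) returns true even though CPUs 0 and 2 only
share the L3. ]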
>>>>>>>>>>>
>>>>>>>>>> What does your current scheduler topology look like?
>>>>>>>>>>
>>>>>>>>>> For CPU 0 to 3, do you have the below ?
>>>>>>>>>> DIE [0 - 3] [4-7]
>>>>>>>>>> MC [0] [1] [2] [3]
>>>>>>>>>
>>>>>>>>> The current scheduler topology is consistent with the CPU topology:
>>>>>>>>> DIE [0-7]
>>>>>>>>> MC [0-3] [4-7] (SD_SHARE_PKG_RESOURCES)
>>>>>>>>> Most Android phones have this topology.
>>>>>>>>>>
>>>>>>>>>> But you would like something like below for cpu 0-1 instead ?
>>>>>>>>>> DIE [0 - 3] [4-7]
>>>>>>>>>> CLS [0 - 1] [2 - 3]
>>>>>>>>>> MC [0] [1]
>>>>>>>>>>
>>>>>>>>>> with SD_SHARE_PKG_RESOURCES only set to MC level ?
>>>>>>>>>
>>>>>>>>> We don't change the current scheduler topology, but the
>>>>>>>>> cache topology should be separated like below:
>>>>>>>>
>>>>>>>> The scheduler topology is not only the cpu topology but a mix of cpu and
>>>>>>>> cache/memory topology.
>>>>>>>>
>>>>>>>>> [0-7] (shared level 3 cache )
>>>>>>>>> [0-1] [2-3][4-5][6-7] (shared level 2 cache )
>>>>>>>>
>>>>>>>> So you don't need to bother with the intermediate cluster level, which is even simpler:
>>>>>>>> you have to modify the generic arch topology code so that cpu_coregroup_mask()
>>>>>>>> returns the correct cpumask directly.
>>>>>>>>
>>>>>>>> You will notice an llc_sibling field that is currently used by ACPI, but not
>>>>>>>> DT, to return the LLC cpumask.
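[ For context, cpu_coregroup_mask() in drivers/base/arch_topology.c
currently looks roughly like the sketch below; it already prefers
llc_sibling when that mask is smaller than the package siblings: ]

        const struct cpumask *cpu_coregroup_mask(int cpu)
        {
                const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));

                /* Find the smaller of NUMA, core or LLC siblings */
                if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
                        /* not numa in package, lets use the package siblings */
                        core_mask = &cpu_topology[cpu].core_sibling;
                }
                if (cpu_topology[cpu].llc_id != -1) {
                        if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
                                core_mask = &cpu_topology[cpu].llc_sibling;
                }

                return core_mask;
        }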
>>>>>>>>
>>>>>>> cpu_topology[].llc_sibling describes the last level cache of the whole system,
>>>>>>> not the one within a sched_domain.
>>>>>>>
>>>>>>> In the above cache topology, llc_sibling is 0xff ([0-7]); it describes
>>>>>>
>>>>>> If llc_sibling were 0xff ([0-7]) on your system, you would have only one level:
>>>>>> MC[0-7]
>>>>>
>>>>> Sorry, but I don't get it. Why does llc_sibling being 0xff ([0-7]) mean MC [0-7]?
>>>>> In our system (Android), llc_sibling is indeed 0xff ([0-7]) because the CPUs
>>>>> share the LLC (L3), but we also have two levels:
>>>>> DIE [0-7]
>>>>> MC [0-3] [4-7]
>>>>> It makes sense: [0-3] are little cores, [4-7] are big cores, so we only migrate
>>>>> up on misfit. We won't change it.
>>>>>
>>>>>>
>>>>>>> the L3 cache siblings, but sd_llc_id describes the maximum cache sharing
>>>>>>> within a sched_domain, which should be [0-1] instead of [0-3].
>>>>>>
>>>>>> sd_llc_id describes the last sched_domain with SD_SHARE_PKG_RESOURCES.
>>>>>> If you want the llc to be [0-3], make sure that the
>>>>>> sched_domain_topology_level array returns the correct cpumask with
>>>>>> this flag.
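[ Concretely, sd_llc_id is derived per CPU in update_top_cache_domain()
in kernel/sched/topology.c, trimmed here to the relevant lines: ]

        static void update_top_cache_domain(int cpu)
        {
                struct sched_domain *sd;
                int id = cpu, size = 1;

                /* highest domain still flagged as sharing package resources */
                sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
                if (sd) {
                        id = cpumask_first(sched_domain_span(sd));
                        size = cpumask_weight(sched_domain_span(sd));
                }

                per_cpu(sd_llc_size, cpu) = size;
                per_cpu(sd_llc_id, cpu) = id;
        }

[ So the id is just the first CPU in the span of the highest
SD_SHARE_PKG_RESOURCES domain, which is why it follows the sched_domain
hierarchy and not the raw cache topology. ]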
>>>>>
>>>>> Actually, we want sd_llc to be [0-1] [2-3], but if the MC domain doesn't have
>>>>
>>>> sd_llc_id refers to a scheduler domain, but your patch breaks this, so
>>>> if you want an llc that reflects this topology: [0-1] [2-3], you must
>>>> provide a sched_domain level with this topology.
>>>
>>> Maybe we should add a shared-cache level (SC), like what CLS does:
>>>
>>> DIE [0-7] (shared level 3 cache, SD_SHARE_PKG_RESOURCES)
>>> MC [0-3] [4-7] (not SD_SHARE_PKG_RESOURCES)
>>> CLS (if necessary)
>>> SC [0-1][2-3][4-5][6-7] (shared level 2 cache, SD_SHARE_PKG_RESOURCES)
>>> SMT (if necessary)
>>>
>>> SC means a couple of CPUs which are placed closely together and share
>>> mid-level caches, but not closely enough to form a cluster.
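[ For reference, such a level would slot into the default topology table
in kernel/sched/topology.c. A hypothetical sketch -- cpu_scgroup_mask
and cpu_sc_flags do not exist and are made up here for illustration: ]

        static struct sched_domain_topology_level default_topology[] = {
        #ifdef CONFIG_SCHED_SMT
                { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
        #endif
                /* hypothetical SC level: CPUs sharing a mid-level cache */
                { cpu_scgroup_mask, cpu_sc_flags, SD_INIT_NAME(SC) },
        #ifdef CONFIG_SCHED_CLUSTER
                { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
        #endif
        #ifdef CONFIG_SCHED_MC
                { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
        #endif
                { cpu_cpu_mask, SD_INIT_NAME(DIE) },
                { NULL, },
        };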
>>
>> What you name SC above looks the same as CLS, which should not be confused
>> with the Arm cluster terminology.
>
> Do you mean a cluster equals a shared cache rather than containing one? SC just
> means shared cache but does not form a cluster; a CLS can contain many SCs.
>
The cluster is a topology level above the CPUs but under the LLC. On Kunpeng 920 the cpus
in a CLS share the L3 tag, and on Intel's Jacobsville the cpus in a CLS share L2 [1].
It seems you're using a DT-based system. I think parsing of the cluster level is not
supported on DT yet, so you cannot see it. Otherwise, with the right cpu topology reported,
you would have a CLS level in which the cpus share the L2 cache, just like Jacobsville.
[1] https://lore.kernel.org/all/20210924085104.44806-4-21cnbao@gmail.com/
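[ When the cluster level is reported (today via ACPI PPTT), the CLS mask
simply comes from cluster_sibling; a sketch of the current helper in
drivers/base/arch_topology.c: ]

        const struct cpumask *cpu_clustergroup_mask(int cpu)
        {
                /* filled in from ACPI PPTT; DT parsing is not wired up yet */
                return &cpu_topology[cpu].cluster_sibling;
        }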
> If, as you said, SC looks the same as CLS, should we rename CLS to SC to
> avoid confusion?
>
> Thanks,
> Wang
>