Message-ID: <ee43f507-c0a2-45ed-818e-f24babf07d60@gmail.com>
Date: Sat, 15 Mar 2025 01:34:03 +0530
From: Sahil Siddiq <icegambit91@...il.com>
To: Stafford Horne <shorne@...il.com>
Cc: jonas@...thpole.se, stefan.kristiansson@...nalahti.fi,
sahilcdq@...ton.me, linux-openrisc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] openrisc: Add cacheinfo support
Hi,
On 3/14/25 2:54 AM, Stafford Horne wrote:
> On Tue, Mar 11, 2025 at 12:43:57AM +0530, Sahil Siddiq wrote:
>> Add cacheinfo support for OpenRISC.
>>
>> [...]
>> None of the functions in drivers/base/cacheinfo.c that are capable of
>> pulling these details (e.g.: cache_size) have been used. This is because
>> they pull these details by reading properties present in the device tree
>> file. In setup.c, for example, the value of "clock-frequency" is pulled
>> from the device tree file.
>>
>> Cache related properties are currently not present in OpenRISC's device
>> tree files.
>
> If we want to add L2 caches and define them in the device tree would
> it "just work" or is more work needed?
>
A little more work will have to be done. The implementation of "init_cache_level"
and "populate_cache_leaves" will have to be extended. To pull L2 cache attributes,
they'll need to make calls to the "of_get_property" family of functions similar to
what's being done for RISC-V and PowerPC.
Shall I resubmit this patch with those extensions? I think I'll be able to test
those changes with a modified device tree file that has an L2 cache component.
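For reference, a rough sketch of what the extension might look like, modeled on the
standard device tree cache properties that RISC-V and PowerPC read. The function name
and its integration point in the OpenRISC code are assumptions, not part of the patch:

```c
/* Sketch (kernel context assumed): fill an L2 leaf from a device tree
 * node using the standard DT cache properties. */
static void or1k_init_l2_leaf(struct device_node *np,
			      struct cacheinfo *this_leaf)
{
	u32 size, sets, line;

	if (!of_property_read_u32(np, "cache-size", &size))
		this_leaf->size = size;
	if (!of_property_read_u32(np, "cache-sets", &sets))
		this_leaf->number_of_sets = sets;
	if (!of_property_read_u32(np, "cache-block-size", &line))
		this_leaf->coherency_line_size = line;
	this_leaf->level = 2;
	this_leaf->type = CACHE_TYPE_UNIFIED;
}
```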
>> Regarding the "shared_cpu_map" cache attribute, I wasn't able to find
>> anything in the OpenRISC architecture manual to indicate that processors
>> in a multi-processor system may share the same cache component. MIPS uses
>> "globalnumber" to detect siblings. LoongArch uses a "CACHE_PRIVATE" flag
>> to detect siblings sharing the same cache.
>
> In SMP environment the L1 caches are not shared they are specific to each CPU.
>
> Also, we do not have hyperthreading in OpenRISC so shared_cpu_map should be a
> 1-to-1 mapping with the cpu. Do you need to do extra work to setup that
> mapping?
>
No extra work has to be done to set up the 1-to-1 mapping. This is already being
done in "ci_leaf_init()".
>> I am running with the assumption that every OpenRISC core has its own
>> icache and dcache. Given that OpenRISC does not support a multi-level
>> cache architecture and that icache and dcache are like L1 caches, I
>> think this assumption is reasonable. What are your thoughts on this?
>
> Currently this is the case, but it could be possible to create an SoC with L2
> caches. I could imagine these would be outside of the CPU and we could define
> them with the device tree.
In this case, some extra work will have to be done to set the "shared_cpu_map"
appropriately. But I think the modifications will be quite small. If the L2 cache
is external to all CPUs, then all online CPUs will have their corresponding bit
set in the "shared_cpu_map".
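Something along these lines, assuming a single L2 shared by every CPU (a sketch,
kernel context assumed):

```c
/* For an L2 external to all CPUs, mark every online CPU as a sharer. */
for_each_online_cpu(cpu)
	cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map);
```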
>> Another issue I noticed is that the unit used in ...cache/indexN/size
>> is KB. The actual value of the size is right-shifted by 10 before being
>> reported. When testing these changes using QEMU (and without making any
>> modifications to the values stored in DCCFGR and ICCFGR), the cache size
>> is far smaller than 1KB. Consequently, this is reported as 0K. For cache
>> sizes smaller than 1KB, should something be done to report it in another
>> unit? Reporting 0K seems a little misleading.
>
> I think this is fine, as long as we pass in the correct size in bytes.
Understood.
>> [...]
>> +
>> +int init_cache_level(unsigned int cpu)
>> +{
>> + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
>> + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>> + int leaves = 0, levels = 0;
>> + unsigned long upr = mfspr(SPR_UPR);
>> + unsigned long iccfgr, dccfgr;
>> +
>> + if (!(upr & SPR_UPR_UP)) {
>> + printk(KERN_INFO
>> + "-- no UPR register... unable to detect configuration\n");
>> + return -ENOENT;
>> + }
>> +
>> + if (upr & SPR_UPR_DCP) {
>> + dccfgr = mfspr(SPR_DCCFGR);
>> + cpuinfo->dcache.ways = 1 << (dccfgr & SPR_DCCFGR_NCW);
>> + cpuinfo->dcache.sets = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3);
>> + cpuinfo->dcache.block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) >> 7);
>> + cpuinfo->dcache.size =
>> + cpuinfo->dcache.sets * cpuinfo->dcache.ways * cpuinfo->dcache.block_size;
>> + leaves += 1;
>> + printk(KERN_INFO
>> + "-- dcache: %4d bytes total, %2d bytes/line, %d way(s)\n",
>> + cpuinfo->dcache.size, cpuinfo->dcache.block_size,
>> + cpuinfo->dcache.ways);
>
> Can we print the number of sets here too? Also is there a reason to pad these
> int's with 4 and 2 spaces? I am not sure the padding is needed.
>
>> + } else
>> + printk(KERN_INFO "-- dcache disabled\n");
>> +
>> + if (upr & SPR_UPR_ICP) {
>> + iccfgr = mfspr(SPR_ICCFGR);
>> + cpuinfo->icache.ways = 1 << (iccfgr & SPR_ICCFGR_NCW);
>> + cpuinfo->icache.sets = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3);
>> + cpuinfo->icache.block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) >> 7);
>> + cpuinfo->icache.size =
>> + cpuinfo->icache.sets * cpuinfo->icache.ways * cpuinfo->icache.block_size;
>> + leaves += 1;
>> + printk(KERN_INFO
>> + "-- icache: %4d bytes total, %2d bytes/line, %d way(s)\n",
>> + cpuinfo->icache.size, cpuinfo->icache.block_size,
>> + cpuinfo->icache.ways);
>
> Same here.
Sure, I'll print the number of sets as well.
I don't think there's any reason for the padding. It was part of the original
implementation in setup.c. There shouldn't be any issues in removing it.
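For the dcache case, the revised message might look something like this (a sketch of
the change discussed above, with the set count added and the field-width padding dropped):

```c
printk(KERN_INFO
       "-- dcache: %d bytes total, %d bytes/line, %d set(s), %d way(s)\n",
       cpuinfo->dcache.size, cpuinfo->dcache.block_size,
       cpuinfo->dcache.sets, cpuinfo->dcache.ways);
```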
>> [...]
>> seq_printf(m, "frequency\t\t: %ld\n", loops_per_jiffy * HZ);
>> - seq_printf(m, "dcache size\t\t: %d bytes\n", cpuinfo->dcache_size);
>> - seq_printf(m, "dcache block size\t: %d bytes\n",
>> - cpuinfo->dcache_block_size);
>> - seq_printf(m, "dcache ways\t\t: %d\n", cpuinfo->dcache_ways);
>> - seq_printf(m, "icache size\t\t: %d bytes\n", cpuinfo->icache_size);
>> - seq_printf(m, "icache block size\t: %d bytes\n",
>> - cpuinfo->icache_block_size);
>> - seq_printf(m, "icache ways\t\t: %d\n", cpuinfo->icache_ways);
>> seq_printf(m, "immu\t\t\t: %d entries, %lu ways\n",
>> 1 << ((mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTS) >> 2),
>> 1 + (mfspr(SPR_DMMUCFGR) & SPR_DMMUCFGR_NTW));
>> --
>> 2.48.1
>>
>
> This pretty much looks ok to me.
>
Thank you for the review.
Thanks,
Sahil