Message-ID: <20250113115849.00006fee@huawei.com>
Date: Mon, 13 Jan 2025 11:58:49 +0000
From: Alireza Sanaee <alireza.sanaee@...wei.com>
To: Mark Rutland <mark.rutland@....com>
CC: "devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "robh@...nel.org" <robh@...nel.org>, Linuxarm
<linuxarm@...wei.com>, Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>, Jonathan Cameron
<jonathan.cameron@...wei.com>, jiangkunkun <jiangkunkun@...wei.com>,
yangyicong <yangyicong@...wei.com>, "zhao1.liu@...el.com"
<zhao1.liu@...el.com>
Subject: Re: [PATCH] arm64: of: handle multiple threads in ARM cpu node
On Fri, 10 Jan 2025 17:25:50 +0000
Mark Rutland <mark.rutland@....com> wrote:
Hi Mark,
Just resending, but without the screenshot mistakenly attached to
the other email. Sorry about that.
> On Fri, Jan 10, 2025 at 05:02:11PM +0000, Alireza Sanaee wrote:
> > On Fri, 10 Jan 2025 16:23:00 +0000
> > Mark Rutland <mark.rutland@....com> wrote:
> >
> > Hi Mark,
> >
> > Thanks for prompt feedback.
> >
> > Please look inline.
> >
> > > On Fri, Jan 10, 2025 at 04:10:57PM +0000, Alireza Sanaee wrote:
> > > > Update `of_parse_and_init_cpus` to parse the reg property of the
> > > > CPU node as an array, per the spec, for SMT threads.
> > > >
> > > > Spec v0.4 Section 3.8.1:
> > >
> > > Which spec, and why do we care?
> >
> > For the spec, this is what I looked
> > into https://github.com/devicetree-org/devicetree-specification/releases/download/v0.4/devicetree-specification-v0.4.pdf
> > Section 3.8.1
> >
> > Sorry I didn't put the link in there.
>
> Ok, so that's "The devicetree specification v0.4 from ${URL}", rather
> than "Spec v0.4".
:) sure, I will be more precise in future correspondence.
>
> > One limitation of the existing approach is that it is not really
> > possible to describe shared caches for SMT cores, as the threads
> > will be seen as separate CPU cores in the device tree. Is there any
> > way to do so?
>
> Can't the existing cache bindings handle that? e.g. give both threads
> a next-level-cache pointing to the shared L1?
Unfortunately, I tested this recently; some legwork is needed even to
enable that, and it does not work right now.
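For reference, the description I experimented with looked roughly like
the sketch below: the two thread nodes both point their
next-level-cache at a shared standalone L1 node. The compatible string,
labels and reg values are illustrative only, not taken from real
hardware.

```dts
/dts-v1/;

/ {
	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		/* Two SMT threads of one core, each with its own node,
		 * sharing a single L1 cache via next-level-cache. */
		cpu@0 {
			device_type = "cpu";
			compatible = "arm,cortex-a78";	/* illustrative */
			reg = <0x0>;
			next-level-cache = <&L1_0>;
		};

		cpu@1 {
			device_type = "cpu";
			compatible = "arm,cortex-a78";	/* illustrative */
			reg = <0x1>;
			next-level-cache = <&L1_0>;
		};

		/* Standalone L1 shared by both threads; this is the part
		 * the kernel does not currently handle well. */
		L1_0: l1-cache0 {
			compatible = "cache";
			cache-level = <1>;
			cache-unified;
		};
	};
};
```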
>
> > More discussion over sharing caches for threads
> > here https://lore.kernel.org/kvm/20241219083237.265419-1-zhao1.liu@intel.com/
>
> In that thread Rob refers to earlier discussions, so I don't think
> that thread alone has enough context.
https://lore.kernel.org/linux-devicetree/CAL_JsqLGEvGBQ0W_B6+5cME1UEhuKXadBB-6=GoN1tmavw9K_w@mail.gmail.com/
This was the earlier discussion, where Rob pointed me towards
investigating this approach (this patch).
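For concreteness, the spec-v0.4 style description that this patch
parses would look roughly like the sketch below: a single cpu node
whose reg carries one element per thread. Compatible string and reg
values are made up for the example.

```dts
/dts-v1/;

/ {
	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		/* One node for a two-thread SMT core: two reg elements,
		 * one per thread. Software derives the thread count as
		 * sizeof(reg) / #address-cells. */
		cpu@100 {
			device_type = "cpu";
			compatible = "arm,cortex-a78";	/* illustrative */
			reg = <0x100 0x101>;
		};
	};
};
```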
>
> > > > The value of reg is a <prop-encoded-**array**> that defines a
> > > > unique CPU/thread id for the CPU/threads represented by the CPU
> > > > node. **If a CPU supports more than one thread (i.e. multiple
> > > > streams of execution) the reg property is an array with 1
> > > > element per thread**. The address-cells on the /cpus node
> > > > specifies how many cells each element of the array takes.
> > > > Software can determine the number of threads by dividing the
> > > > size of reg by the parent node's address-cells.
> > >
> > > We already have systems where each thread gets a unique CPU node
> > > under /cpus, so we can't rely on this to determine the topology.
> >
> > I assume we can generate unique values even in reg array, but
> > probably makes things more complicated.
>
> The other bindings use phandles to refer to threads, and phandles
> point to nodes in the dt, so it's necessary for threads to be given
> separate nodes.
>
> Note that the CPU topology bindings use that to describe threads, see
>
> Documentation/devicetree/bindings/cpu/cpu-topology.txt
Noted. Makes sense.
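As I understand the cpu-topology binding, each thread keeps its own
node under /cpus, and a cpu-map groups them via phandles; something
roughly like the sketch below (labels, compatible string and reg
values are illustrative):

```dts
/dts-v1/;

/ {
	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		/* Topology described by phandle, so every thread must
		 * remain a distinct addressable node. */
		cpu-map {
			cluster0 {
				core0 {
					thread0 { cpu = <&CPU0>; };
					thread1 { cpu = <&CPU1>; };
				};
			};
		};

		CPU0: cpu@0 {
			device_type = "cpu";
			compatible = "arm,cortex-a78";	/* illustrative */
			reg = <0x0>;
		};

		CPU1: cpu@1 {
			device_type = "cpu";
			compatible = "arm,cortex-a78";	/* illustrative */
			reg = <0x1>;
		};
	};
};
```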
>
> > > Further, there are bindings which rely on being able to address
> > > each CPU/thread with a unique phandle (e.g. for affinity of PMU
> > > interrupts), which this would break.
>
> > > Regardless, as above I do not think this is a good idea. While it
> > > allows the DT to be written in a marginally simpler way, it makes
> > > things more complicated for the kernel and is incompatible with
> > > bindings that we already support.
> > >
> > > If anything "the spec" should be relaxed here.
> >
> > Hi Rob,
> >
> > If this approach is too disruptive, then shall we fall back to the
> > approach where we share the L1 via the next-level-cache entry?
>
> Ah, was that previously discussed, and were there any concerns against
> that approach?
>
> To be clear, my main concern here is that threads remain represented
> as distinct nodes under /cpus; I'm not wedded to the precise solution
> for representing shared caches.
This was basically the first thing that came to mind as a non-invasive
preliminary solution. That said, there has been no discussion yet of
the downsides or advantages of having a separate node for the L1
cache. If it seems reasonable, I can look into it.
>
> Mark.
>
>
Thanks,
Alireza