linux-kernel - Re: Two questions about cache coherency on arm platforms

Message-ID: <20200323161537.ptjrihqotgmon7tr@mail.google.com>
Date:   Mon, 23 Mar 2020 16:15:40 +0000
From:   Changbin Du <changbin.du@...il.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Changbin Du <changbin.du@...il.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: Two questions about cache coherency on arm platforms

Hi Mark,
Thanks for your answer. I still don't fully understand the answer to the first question.

On Mon, Mar 23, 2020 at 01:17:20PM +0000, Mark Rutland wrote:
> On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> > Hi, All,
> > I am not very familiar with ARM processors. I have two questions about
> > cache coherency. Could anyone help me?
> > 
> > 1. How is cache coherency maintained on an ARMv8 big.LITTLE system?
> > As far as I know, big cores and little cores are in separate clusters on
> > a big.LITTLE system.
> 
> This is often true, but not always the case. For example, with DSU big
> and little cores can be placed within the same cluster.
>
Yes, it is true that with DynamIQ big and LITTLE cores can be placed within the
same cluster. But I don't understand how Linux supported big.LITTLE before DynamIQ.

I read the description below in the ARM Cortex-A Series Programmer’s Guide for
ARMv8-A:
 | big.LITTLE software models require transparent and efficient transfer of data between big and LITTLE clusters.
 | Coherency between clusters is provided by a cache-coherent interconnect such as the ARM CoreLink CCI-400 described in Chapter 14.

So I think big cores and little cores are in different clusters in this
case. Does that mean they are not within the same Inner Shareable domain?

> > And cache coherence between clusters requires that the
> > memory regions be marked as 'Outer Shareable', which is very expensive.
> 
> This is not correct.
> 
> Linux requires that all cores it uses are within the same Inner
> Shareable domain, regardless of whether they are in distinct clusters.
> Linux does not support systems where cores are in distinct Inner
> Shareable domains.
>
I see. Thanks.

> This is the intended use of the architecture. Per ARM DDI 0487E.a page
> B2-144:
> 
> | This architecture assumes that all PEs that use the same operating
> | system or hypervisor are in the same Inner Shareable shareability
> | domain.
> 
> ... where a PE is a "Processing Element", which you can think of as a
> single core.
> 
> > I have checked the kernel code, and it seems to require coherence only in
> > the 'Inner Shareable' domain. So my question is: how can Linux guarantee
> > cache coherence in the 'CPU migration' or 'Global Task Scheduling' models,
> > in which both clusters are active at the same time? For example, a thread
> > runs in Cluster A and modifies 'Inner Shareable' memory, then it migrates
> > to Cluster B.
> 
> As above, this works because all the relevant cores are within the same
> Inner Shareable domain.
> 
> > 2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
> > In the Linux kernel on arm64, the function sync_icache_aliases() below
> > is used to sync the I-cache and D-cache. I understand the aliasing case, but
> > in the non-aliasing case, why does it just do "dc cvau" (in
> > __flush_icache_range()) without really invalidating the I-cache?
> 
> The __flush_icache_range/__flush_cache_user_range assembly function does
> both the D-cache maintenance with DC CVAU, then the I-cache maintenance
> with IC IVAU, so I think you have misread it.
>
Yes. I missed the IC IVAU instruction in the invalidate_icache_by_line
macro.
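
To illustrate why both steps are needed, here is a toy C model of a single
cache line. It is only a sketch and not the kernel implementation: the
struct, the toy_* function names and the instruction values are all
hypothetical, and it assumes a simplified view in which data stores land in
the D-cache, DC CVAU cleans the line to the Point of Unification, IC IVAU
invalidates the I-cache copy, and an instruction fetch that misses refills
from the Point of Unification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cache line. All names are hypothetical. */
struct toy_line {
	uint32_t pou;      /* value visible at the Point of Unification */
	uint32_t dcache;   /* value held in the D-cache */
	bool     d_dirty;  /* D-cache holds data not yet cleaned to the PoU */
	uint32_t icache;   /* value held in the I-cache */
	bool     i_valid;  /* I-cache copy may be used by a fetch */
};

/* A data store (e.g. writing a new instruction) only updates the D-cache. */
static void toy_store(struct toy_line *l, uint32_t insn)
{
	l->dcache = insn;
	l->d_dirty = true;
}

/* Models DC CVAU: clean the D-cache line to the Point of Unification. */
static void toy_dc_cvau(struct toy_line *l)
{
	if (l->d_dirty) {
		l->pou = l->dcache;
		l->d_dirty = false;
	}
}

/* Models IC IVAU: invalidate the I-cache line. */
static void toy_ic_ivau(struct toy_line *l)
{
	l->i_valid = false;
}

/* Instruction fetch: hit in the I-cache, or refill from the PoU on a miss. */
static uint32_t toy_fetch(struct toy_line *l)
{
	if (!l->i_valid) {
		l->icache = l->pou;
		l->i_valid = true;
	}
	return l->icache;
}

/*
 * Patch an instruction and fetch it again, optionally skipping the
 * I-cache invalidation. Returns the instruction the fetch observes.
 */
static uint32_t toy_patch_and_fetch(bool do_ic_ivau)
{
	struct toy_line l = { .pou = 0x1111, .dcache = 0x1111 };

	toy_fetch(&l);           /* warm the I-cache with the old value */
	toy_store(&l, 0x2222);   /* new instruction sits in the D-cache */
	toy_dc_cvau(&l);         /* step 1: make it visible at the PoU */
	assert(l.pou == 0x2222);
	if (do_ic_ivau)
		toy_ic_ivau(&l); /* step 2: drop the stale I-cache copy */
	return toy_fetch(&l);
}
```

In this model, toy_patch_and_fetch(false) still returns the old value
because the I-cache hit hides the cleaned data, while
toy_patch_and_fetch(true) returns the new one. Real hardware also needs the
barriers (DSB, ISB) that the kernel routine issues, which the model leaves out.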

> Thanks,
> Mark.
> 
> > Will i-cache refill from L2 cache?
> >
> > void sync_icache_aliases(void *kaddr, unsigned long len)
> > {
> > 	unsigned long addr = (unsigned long)kaddr;
> > 
> > 	if (icache_is_aliasing()) {
> > 		__clean_dcache_area_pou(kaddr, len);
> > 		__flush_icache_all();
> > 	} else {
> > 		/*
> > 		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> > 		 * for user mappings.
> > 		 */
> > 		__flush_icache_range(addr, addr + len);
> > 	}
> > }
> > 
> > -- 
> > Cheers,
> > Changbin Du

-- 
Cheers,
Changbin Du