linux-kernel - Re: Two questions about cache coherency on arm platforms

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200323131720.GE2597@C02TD0UTHF1T.local>
Date:   Mon, 23 Mar 2020 13:17:20 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Changbin Du <changbin.du@...il.com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: Two questions about cache coherency on arm platforms

On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> Hi, All,
> I am not very familiar with ARM processors. I have two questions about
> cache coherency. Could anyone help me?
> 
> 1. How is cache coherency maintenanced on ARMv8 big.LITTLE system?
> As far as I know, big cores and little cores are in seperate clusters on
> big.LITTLE system.

This is often true, but not always the case. For example, with DSU big
and little cores can be placed within the same cluster.

> And cache coherence betwwen clusters requires the
> memory regions are marked as 'Outer Shareable' and is very expensive.

This is not correct.

Linux requires that all cores it uses are within the same Inner
Shareable domain, regardless of whether they are in distinct clusters.
Linux does not support systems where cores are in distinct Inner
Shareable domains.

This is the intended use of the architecture. Per ARM DDI 0487E.a page
B2-144:

| This architecture assumes that all PEs that use the same operating
| system or hypervisor are in the same Inner Shareable shareability
| shareability

... where a PE is a "Processing Element", which you can think of as a
single core.

> I have checked the kernel code, and seems it only requires coherence in
> 'Inner Shareable' domain. So my question is how can linux guarantees
> cache coherence in 'CPU migration' or 'Global Task Scheduling' models
> wich both clusters are active at the same time? For example, a thread
> ran in Cluster A and modified 'Inner Shareable' memory, then it migrates
> to Cluster B.

As above, this works because all the relevant cores are within the same
Inner Shareable domain.

> 2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
> In linux kernel on arm64 platform, the flow function sync_icache_aliases()
> is used to sync i-cache and d-cache. I understand the aliasing case. but
> for non-aliasing case why it just does "dc cvau" (in __flush_icache_range())
> whithout really invalidate the icache?

The __flush_icache_range/__flush_cache_user_range assembly function does
both the D-cache maintenance with DC CVAU, then the I-cache maintenance
with IC IVAU, so I think you have misread it.

Thanks,
Mark.

> Will i-cache refill from L2 cache?
>
> void sync_icache_aliases(void *kaddr, unsigned long len)
> {
> 	unsigned long addr = (unsigned long)kaddr;
> 
> 	if (icache_is_aliasing()) {
> 		__clean_dcache_area_pou(kaddr, len);
> 		__flush_icache_all();
> 	} else {
> 		/*
> 		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> 		 * for user mappings.
> 		 */
> 		__flush_icache_range(addr, addr + len);
> 	}
> }
> 
> -- 
> Cheers,
> Changbin Du