Message-ID: <aV6IWBtqp1dnOZuX@willie-the-truck>
Date: Wed, 7 Jan 2026 16:22:48 +0000
From: Will Deacon <will@...nel.org>
To: Lucas Wei <lucaswei@...gle.com>
Cc: Catalin Marinas <catalin.marinas@....com>,
	Jonathan Corbet <corbet@....net>, sjadavani@...gle.com,
	kernel test robot <lkp@...el.com>, stable@...r.kernel.org,
	kernel-team@...roid.com, linux-arm-kernel@...ts.infradead.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	robin.murphy@....com
Subject: Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream
 coherency issue

[+Robin as he's been involved with this]

On Mon, Dec 29, 2025 at 03:36:19AM +0000, Lucas Wei wrote:
> When software issues a Cache Maintenance Operation (CMO) targeting a
> dirty cache line, the CPU and DSU cluster may optimize the operation by
> combining the CopyBack Write and CMO into a single combined CopyBack
> Write plus CMO transaction presented to the interconnect (MCN).
> For these combined transactions, the MCN splits the operation into two
> separate transactions, one Write and one CMO, and then propagates the
> write and optionally the CMO to the downstream memory system or external
> Point of Serialization (PoS).
> However, the MCN may return an early CompCMO response to the DSU cluster
> before the corresponding Write and CMO transactions have completed at
> the external PoS or downstream memory. As a result, stale data may be
> observed by external observers that are directly connected to the
> external PoS or downstream memory.
> 
> This erratum affects any system topology in which the following
> conditions apply:
>  - The Point of Serialization (PoS) is located downstream of the
>    interconnect.
>  - A downstream observer accesses memory directly, bypassing the
>    interconnect.
> 
> Conditions:
> This erratum occurs only when all of the following conditions are met:
>  1. Software executes a data cache maintenance operation, specifically,
>     a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
>     IVAC), that hits on unique dirty data in the CPU or DSU cache. This
>     results in a combined CopyBack and CMO being issued to the
>     interconnect.

Why do we need to worry about IVAC here? Even though that might be
upgraded to CIVAC and result in the erratum conditions, the DMA API
shouldn't use IVAC on dirty lines so I don't think we need to worry
about it.

> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index f0ca7196f6fa..d3d46e5f7188 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -381,6 +381,9 @@ alternative_endif
>  	.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
>  	sub	\tmp, \linesz, #1
>  	bic	\start, \start, \tmp
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	\tmp, \start
> +alternative_else_nop_endif
>  .Ldcache_op\@:
>  	.ifc	\op, cvau
>  	__dcache_op_workaround_clean_cache \op, \start
> @@ -402,6 +405,13 @@ alternative_endif
>  	add	\start, \start, \linesz
>  	cmp	\start, \end
>  	b.lo	.Ldcache_op\@
> +alternative_if ARM64_WORKAROUND_4311569
> +	.ifnc	\op, cvau
> +	mov	\start, \tmp
> +	mov	\tmp, xzr
> +	cbnz	\start, .Ldcache_op\@
> +	.endif
> +alternative_else_nop_endif

So you could also avoid this for ivac, although it looks like this is
only called for civac, cvau, cvac and cvap so perhaps not worth it.
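
For readers following the control flow, the loop tail in the hunk above can be modelled in C. This is only a sketch under stated assumptions: `run_by_line_op`, `issue_op` and the `ops_issued` counter are hypothetical stand-ins rather than kernel code, the `workaround` flag stands in for the ARM64_WORKAROUND_4311569 alternative being patched in, the `.ifnc \op, cvau` guard is left out, and \tmp is assumed not to be clobbered inside the loop body.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "dc <op>, <addr>": only counts issued operations. */
static size_t ops_issued;

static void issue_op(uintptr_t addr)
{
    (void)addr;
    ops_issued++;
}

/*
 * Models the patched loop: a single pass over [start, end) normally,
 * and a second pass over the same lines when the workaround is active.
 */
static void run_by_line_op(uintptr_t start, uintptr_t end,
                           uintptr_t linesz, int workaround)
{
    uintptr_t tmp = linesz - 1;        /* sub  \tmp, \linesz, #1   */

    start &= ~tmp;                     /* bic  \start, \start, \tmp */
    if (workaround)
        tmp = start;                   /* mov  \tmp, \start         */

    for (;;) {
        do {
            issue_op(start);           /* the per-line cache op     */
            start += linesz;           /* add  \start, \start, \linesz */
        } while (start < end);         /* cmp + b.lo .Ldcache_op    */

        if (!workaround)
            break;

        start = tmp;                   /* mov  \start, \tmp         */
        tmp = 0;                       /* mov  \tmp, xzr            */
        if (start == 0)                /* cbnz \start, .Ldcache_op  */
            break;
    }
}
```

In this model the second pass is skipped when the aligned start address is zero, because the saved start doubles as the "repeat once" flag.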

> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 503567c864fd..ddf0097624ed 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
>   *	- end     - kernel end address of region
>   */
>  SYM_FUNC_START(__pi_dcache_inval_poc)
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	x4, x0
> +	mov	x5, x1
> +	mov	x6, #1
> +alternative_else_nop_endif
>  	dcache_line_size x2, x3
>  	sub	x3, x2, #1
> -	tst	x1, x3				// end cache line aligned?
> +again:	tst	x1, x3				// end cache line aligned?
>  	bic	x1, x1, x3
>  	b.eq	1f
>  	dc	civac, x1			// clean & invalidate D / U line
> @@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
>  3:	add	x0, x0, x2
>  	cmp	x0, x1
>  	b.lo	2b
> +alternative_if ARM64_WORKAROUND_4311569
> +	mov	x0, x4
> +	mov	x1, x5
> +	sub	x6, x6, #1
> +	cbz	x6, again
> +alternative_else_nop_endif
>  	dsb	sy
>  	ret
>  SYM_FUNC_END(__pi_dcache_inval_poc)

But this whole part could be dropped? The CIVACs are just for the
unaligned parts at the ends of the buffer and we shouldn't need to worry
about propagating them -- we just don't want to chuck them away with an
invalidation!

Will
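
The point about the unaligned ends can be illustrated with a small C model of which operation each cache line receives in an invalidate-to-PoC over [start, end). This is a sketch, not the kernel routine: `op_for_line` and `enum line_op` are hypothetical names, and the assumption that fully covered interior lines get a plain invalidate (IVAC) comes from the comment above and from the part of the function not shown in the quoted hunk.

```c
#include <stdint.h>

enum line_op { OP_IVAC, OP_CIVAC };

/*
 * Hypothetical helper: returns the operation applied to the cache line at
 * address `line` (line-aligned) when invalidating [start, end).
 * Lines only partially covered by the buffer share data with neighbours,
 * so they are cleaned before being invalidated; all others are invalidated.
 */
static enum line_op op_for_line(uintptr_t line, uintptr_t start,
                                uintptr_t end, uintptr_t linesz)
{
    uintptr_t mask = linesz - 1;
    uintptr_t first = start & ~mask;   /* line holding an unaligned start */
    uintptr_t last = end & ~mask;      /* line holding an unaligned end   */

    if ((start & mask) && line == first)
        return OP_CIVAC;
    if ((end & mask) && line == last)
        return OP_CIVAC;
    return OP_IVAC;
}
```

For a buffer whose start and end are both line-aligned, the model never returns OP_CIVAC, which matches the observation that the clean-and-invalidate only protects neighbouring data at the edges.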
