Message-ID: <7096e4d4-d27a-42f5-34d0-11ce7156a8f1@arm.com>
Date:   Tue, 16 Nov 2021 23:12:19 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Guanghui Feng <guanghuifeng@...ux.alibaba.com>,
        catalin.marinas@....com, will@...nel.org, maz@...nel.org,
        qperret@...gle.com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Cc:     baolin.wang@...ux.alibaba.com, zhuo.song@...ux.alibaba.com,
        zhangliguang@...ux.alibaba.com
Subject: Re: [PATCH] arm64: clear_page: use stnp non-temporal instruction for
 performance optimizing

On 2021-11-16 15:08, Guanghui Feng wrote:
> When clearing a page, there is no need to allocate cache lines for the
> zeroed data. copy_page.S already uses the stnp instruction for this
> reason, so this patch rewrites clear_page.S to use stnp as well. In my
> testing, the stnp version roughly doubles clear_page performance.
> 
> Signed-off-by: Guanghui Feng <guanghuifeng@...ux.alibaba.com>
> ---
>   arch/arm64/lib/clear_page.S | 19 ++++++++++++-------
>   1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> index b84b179..e9dc2d6 100644
> --- a/arch/arm64/lib/clear_page.S
> +++ b/arch/arm64/lib/clear_page.S
> @@ -15,13 +15,18 @@
>    *	x0 - dest
>    */
>   SYM_FUNC_START_PI(clear_page)
> -	mrs	x1, dczid_el0
> -	and	w1, w1, #0xf
> -	mov	x2, #4
> -	lsl	x1, x2, x1
> -
> -1:	dc	zva, x0
> -	add	x0, x0, x1
> +	mov	x1, #0
> +	mov	x2, #0

Regardless of the bigger question around the architectural intent that 
DC ZVA is supposed to be the best way to clear memory (sanity check: 
this wasn't under virtualisation with HCR_EL2.TDZ set, was it?) - out of 
curiosity, why do this and not just "stnp xzr, xzr, ..."?
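
For illustration, the "stnp xzr, xzr" variant asked about above can be sketched as a user-space C function with inline assembly. This is only a sketch under assumptions, not the kernel routine: `clear_page_stnp` and `SKETCH_PAGE_SIZE` are hypothetical names, a 4 KiB page is assumed, and non-AArch64 builds fall back to memset() so the snippet compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical user-space analogue of the patched loop. Using xzr as the
 * store source means no general-purpose registers have to be zeroed first. */
void clear_page_stnp(void *page)
{
    /* The page must be page-aligned, as in the kernel routine. */
    assert(((uintptr_t)page & (SKETCH_PAGE_SIZE - 1)) == 0);
#if defined(__aarch64__)
    char *p = page;
    char *end = p + SKETCH_PAGE_SIZE;
    do {
        /* Four 16-byte non-temporal pair stores clear 64 bytes per pass. */
        __asm__ volatile(
            "stnp xzr, xzr, [%0]\n\t"
            "stnp xzr, xzr, [%0, #16]\n\t"
            "stnp xzr, xzr, [%0, #32]\n\t"
            "stnp xzr, xzr, [%0, #48]\n\t"
            :
            : "r"(p)
            : "memory");
        p += 64;
    } while (p != end);
#else
    /* Portable fallback so the sketch builds on other architectures. */
    memset(page, 0, SKETCH_PAGE_SIZE);
#endif
}
```

Compared with the quoted patch, the only functional difference is that the two "mov xN, #0" instructions disappear; whether that matters for performance is not something this sketch can show.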

Note also that this is liable to conflict with the patch for respecting 
DCZID_EL0.DZP. On which note, is DC {GVA,GZVA} performance also a 
concern, or does your platform not have MTE? If the performance anomaly 
does turn out to be platform-specific, it might be better to quirk 
those platforms to set DZP, rather than changing the code for everyone?
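
As background for the DZP point, DCZID_EL0 carries two fields: BS in bits [3:0], the log2 of the DC ZVA block size in 4-byte words (which is what the removed "mov x2, #4; lsl x1, x2, x1" sequence computes), and DZP in bit 4, which indicates that DC ZVA is prohibited. A minimal C sketch of that decoding follows; the function and macro names are hypothetical, and the register value is passed in as an argument rather than read with "mrs" so the code runs on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DCZID_BS_MASK 0xfu       /* bits [3:0]: log2(block size in 4-byte words) */
#define DCZID_DZP     (1u << 4)  /* bit 4: DC ZVA is prohibited */

/* True when software must not use DC ZVA and needs a store-based fallback. */
bool dczid_zva_prohibited(uint64_t dczid)
{
    return (dczid & DCZID_DZP) != 0;
}

/* Block size in bytes: 4 << BS, matching the removed clear_page prologue. */
uint32_t dczid_block_bytes(uint64_t dczid)
{
    return 4u << (dczid & DCZID_BS_MASK);
}

/* Number of DC ZVA operations needed to clear one page of page_size bytes. */
uint32_t dc_zva_ops_per_page(uint64_t dczid, uint32_t page_size)
{
    uint32_t block = dczid_block_bytes(dczid);

    assert(page_size % block == 0);
    return page_size / block;
}
```

For example, a value of 0x4 decodes to 64-byte blocks with DC ZVA permitted, so a 4 KiB page takes 64 operations; 0x14 has the same block size but DZP set.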

Robin.

> +1:
> +	stnp	x1, x2, [x0]
> +	stnp	x1, x2, [x0, #16]
> +	stnp	x1, x2, [x0, #32]
> +	stnp	x1, x2, [x0, #48]
> +	stnp	x1, x2, [x0, #64]
> +	stnp	x1, x2, [x0, #80]
> +	stnp	x1, x2, [x0, #96]
> +	stnp	x1, x2, [x0, #112]
> +	add	x0, x0, #128
>   	tst	x0, #(PAGE_SIZE - 1)
>   	b.ne	1b
>   	ret
> 
