Message-ID: <YZP1qMj4dWCbDrN6@arm.com>
Date:   Tue, 16 Nov 2021 18:17:12 +0000
From:   Catalin Marinas <catalin.marinas@....com>
To:     Guanghui Feng <guanghuifeng@...ux.alibaba.com>
Cc:     will@...nel.org, maz@...nel.org, qperret@...gle.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        baolin.wang@...ux.alibaba.com, zhuo.song@...ux.alibaba.com,
        zhangliguang@...ux.alibaba.com
Subject: Re: [PATCH] arm64: clear_page: use stnp non-temporal instruction for
 performance optimizing

On Tue, Nov 16, 2021 at 11:08:14PM +0800, Guanghui Feng wrote:
> When clear page mem, there is no need to alloc cache for storing these
> mem value.

In theory, DC ZVA is supposed to trigger write streaming mode, so all
writes go directly to memory and avoid cache allocation.

> And the copy_page.S have used stnp instruction for optimizing.
> So I rewrite the clear_page.S with stnp. At the same time, I have tested it
> with stnp instruction which will get about twice the performance improvement.

On which CPU implementation? Is the same improvement seen on a wider
range of CPUs?

-- 
Catalin
