linux-kernel - Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.

Message-ID: <20151221124637.GN23092@arm.com>
Date:	Mon, 21 Dec 2015 12:46:38 +0000
From:	Will Deacon <will.deacon@....com>
To:	Andrew Pinski <apinski@...ium.com>
Cc:	pinsia@...il.com, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.

On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote:
> Adding a check for the cache line size is not much overhead.
> Special case 128 byte cache line size.
> This improves copy_page by 85% on ThunderX compared to the
> original implementation.

So this patch seems to:

  - Align the loop
  - Increase the prefetch size
  - Unroll the loop once

Do you know where your 85% boost comes from between these? I'd really
like to avoid having multiple versions of copy_page, if possible, but
maybe we could end up with something that works well enough regardless
of cacheline size. Understanding what your bottleneck is would help to
lead us in the right direction.
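
One way to tell those factors apart is to vary them independently. The
following is only a rough user-space sketch in C, not the arm64 assembly
from the patch: `copy_page_variant`, `PAGE_BYTES` and both parameters are
hypothetical names introduced here, a 4096-byte page is assumed, and
`__builtin_prefetch` is a GCC/Clang builtin rather than the PRFM
instruction the kernel routine would use. Loop alignment is a code-placement
property and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size for this sketch */
#define PAGE_WORDS (PAGE_BYTES / sizeof(uint64_t))

/*
 * Hypothetical helper: copy one page, moving `words_per_iter` 64-bit words
 * per outer iteration and hinting a prefetch `prefetch_bytes` ahead of the
 * read position. prefetch_bytes == 0 disables the hint entirely.
 */
static void copy_page_variant(uint64_t *dst, const uint64_t *src,
                              size_t words_per_iter, size_t prefetch_bytes)
{
    assert(words_per_iter != 0 && PAGE_WORDS % words_per_iter == 0);

    for (size_t i = 0; i < PAGE_WORDS; i += words_per_iter) {
        size_t ahead = i * sizeof(uint64_t) + prefetch_bytes;

        /* Hint only; skipped when disabled or when it would pass the page end. */
        if (prefetch_bytes != 0 && ahead < PAGE_BYTES)
            __builtin_prefetch((const unsigned char *)src + ahead);

        /* Inner body; a compiler may unroll this for constant arguments. */
        for (size_t j = 0; j < words_per_iter; j++)
            dst[i + j] = src[i + j];
    }
}
```

Timing, say, (8, 0), (8, 256) and (16, 256) against each other would give a
first hint as to whether the prefetch distance or the amount of work per
iteration matters more on a given machine.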

Also, how are you measuring the improvement? If you can share your
test somewhere, I can see how it affects the other systems I have access
to.
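
For reference, a measurement of this kind could look roughly like the
sketch below. It is a user-space approximation under stated assumptions,
not the test used for the 85% figure: `bench_copy_page`, `copy_fn` and
`copy_page_memcpy` are hypothetical names, a 4096-byte page is assumed,
timing uses POSIX `clock_gettime(CLOCK_MONOTONIC)`, and the empty `asm`
statement is a GCC/Clang compiler barrier.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096u  /* assumed page size for this sketch */

typedef void (*copy_fn)(void *dst, const void *src);

/* Stand-in for the routine under test; swap in any page-copy variant. */
static void copy_page_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

/*
 * Copy `npages` pages `rounds` times and return the average time per page
 * copy in nanoseconds, or -1.0 if the buffers cannot be allocated.
 */
static double bench_copy_page(copy_fn fn, size_t npages, size_t rounds)
{
    void *srcp = NULL, *dstp = NULL;
    struct timespec t0, t1;

    assert(fn != NULL && npages > 0 && rounds > 0);

    if (posix_memalign(&srcp, PAGE_BYTES, npages * PAGE_BYTES) != 0)
        return -1.0;
    if (posix_memalign(&dstp, PAGE_BYTES, npages * PAGE_BYTES) != 0) {
        free(srcp);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(srcp, 0xA5, npages * PAGE_BYTES);
    memset(dstp, 0, npages * PAGE_BYTES);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t r = 0; r < rounds; r++) {
        for (size_t p = 0; p < npages; p++)
            fn((unsigned char *)dstp + p * PAGE_BYTES,
               (const unsigned char *)srcp + p * PAGE_BYTES);
        /* Compiler barrier: keep each round's stores from being elided. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);

    free(srcp);
    free(dstp);
    return ns / ((double)npages * (double)rounds);
}
```

Calling `bench_copy_page(copy_page_memcpy, 1024, 100)` and comparing the
result across copy routines gives a relative figure only. The working-set
size decides which cache level is being exercised, and user-space numbers
need not carry over to the in-kernel copy_page.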

Will