Message-ID: <a7e806ed3c074534a24b74f827bcc914@AcuMS.aculab.com>
Date:   Tue, 22 Feb 2022 17:02:16 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Keith Busch' <kbusch@...nel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "axboe@...nel.dk" <axboe@...nel.dk>, "hch@....de" <hch@....de>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "colyli@...e.de" <colyli@...e.de>
Subject: RE: [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64

From: Keith Busch
> Sent: 22 February 2022 16:32
> 
> The crc64 table lookup method is inefficient, using a significant number
> of CPU cycles in the block stack per IO. If available on x86, use a
> PCLMULQDQ implementation to accelerate the calculation.
> 
> The assembly from this patch was mostly generated by gcc from a C
> program using library functions provided by x86 intrinsics, and measures
> ~20x faster than the table lookup.

I think I'd like to see the C code and compiler options used to
generate the assembler as comments in the committed source file.
Either that or reasonable comments in the assembler.

It is also quite a lot of code.
What is the break-even length for the 'cold cache' case, including the FPU saves?

...
> +.section	.rodata
> +.align 32
> +.type	shuffleMasks, @object
> +.size	shuffleMasks, 32
> +shuffleMasks:
> +	.string	""
> +	.ascii	"\001\002\003\004\005\006\007\b\t\n\013\f\r\016\017\217\216\215"
> +	.ascii	"\214\213\212\211\210\207\206\205\204\203\202\201\200"

That has to be the worst way to define 32 bytes.
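
To make the byte values visible, here is a minimal C sketch that decodes the quoted data. It assumes the table is simply 0x00..0x0f followed by 0x8f counting down to 0x80, which is a pattern read off the escapes rather than anything stated in the patch. The helper names (build_shuffle_masks, shuffle_masks_match) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 32 bytes exactly as quoted above; .string "" contributes the leading NUL. */
static const unsigned char gcc_emitted[] =
	"\0"
	"\001\002\003\004\005\006\007\b\t\n\013\f\r\016\017\217\216\215"
	"\214\213\212\211\210\207\206\205\204\203\202\201\200";

/* Rebuild the table from its apparent pattern (an assumption, see above). */
static void build_shuffle_masks(uint8_t out[32])
{
	for (int i = 0; i < 16; i++) {
		out[i] = (uint8_t)i;               /* 0x00 .. 0x0f */
		out[16 + i] = (uint8_t)(0x8f - i); /* 0x8f .. 0x80 */
	}
}

/* Returns 1 when the rebuilt table is byte-identical to the quoted data. */
static int shuffle_masks_match(void)
{
	uint8_t table[32];

	/* 32 data bytes plus the terminating NUL that the C literal adds. */
	assert(sizeof(gcc_emitted) == 33);
	build_shuffle_masks(table);
	return memcmp(table, gcc_emitted, sizeof(table)) == 0;
}
```

If the pattern holds, the same 32 bytes could be written in the .S file as two rows of sixteen .byte values in hex, which a reader can check at a glance.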

> +.section	.rodata.cst16,"aM",@progbits,16
> +.align 16
> +.LC0:
> +	.quad	-1523270018343381984
> +	.quad	2443614144669557164
> +	.align 16
> +.LC1:
> +	.quad	2876949357237608311
> +	.quad	3808117099328934763

Not sure what those are, but I bet there are better ways to
define/describe them.
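
As one possible starting point, a small C sketch that reprints the four quoted .quad values as 64-bit hex, the form in which such constants are usually documented. It does not say what the constants mean, it only changes the notation. The helper names (format_quad_hex, print_quads) are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The four .quad values as quoted above (signed decimal, as gcc printed them). */
static const int64_t quoted_quads[4] = {
	-1523270018343381984LL,	/* .LC0, first qword */
	2443614144669557164LL,	/* .LC0, second qword */
	2876949357237608311LL,	/* .LC1, first qword */
	3808117099328934763LL,	/* .LC1, second qword */
};

/* Write v as "0x" plus 16 hex digits; buf must hold at least 19 bytes. */
static void format_quad_hex(int64_t v, char *buf, size_t len)
{
	assert(len >= 19);
	snprintf(buf, len, "0x%016" PRIx64, (uint64_t)v);
}

/* Print each constant as a .quad line in hex, ready to paste into a .S file. */
static void print_quads(void)
{
	char buf[19];

	for (int i = 0; i < 4; i++) {
		format_quad_hex(quoted_quads[i], buf, sizeof(buf));
		printf("\t.quad\t%s\n", buf);
	}
}
```

Calling print_quads() from a small main() gives four lines that could replace the decimal ones, ideally with a comment next to each saying how the value was derived.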

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
