lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 11 Jul 2017 06:01:38 +0000
From:   Nick Terrell <terrelln@...com>
To:     "Austin S. Hemmelgarn" <ahferroin7@...il.com>,
        Adam Borowski <kilobyte@...band.pl>
CC:     Kernel Team <Kernel-team@...com>, Chris Mason <clm@...com>,
        Yann Collet <cyan@...com>, David Sterba <dsterba@...e.cz>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/4] btrfs: Add zstd support

On 7/10/17, 9:57 PM, "Nick Terrell" <terrelln@...com> wrote:
> The problem is caused by a gcc-7 bug [1]. It miscompiles
> ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0.
> It only happens when it can't analyze ZSTD_copy8(), which is the case in
> the kernel, because memcpy() is implemented with inline assembly. The
> generated code is slow anyways, so I propose this workaround, which will
> be included in the next patch set. I've confirmed that it fixes the bug for
> me. This alternative implementation is also 10-20x faster, and compiles to
> the same x86 assembly as the original ZSTD_wildcopy() with the userland
> memcpy() implementation [2].
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment
> [2] https://godbolt.org/g/q5YpLx
>
> Signed-off-by: Nick Terrell <terrelln@...com>
> ---
>  lib/zstd/zstd_internal.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
> index 6748719..ade0365 100644
> --- a/lib/zstd/zstd_internal.h
> +++ b/lib/zstd/zstd_internal.h
> @@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG;
>  /*-*******************************************
>  *  Shared functions to include for inlining
>  *********************************************/
> -static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
> +static void ZSTD_copy8(void *dst, const void *src) {
> +       ZSTD_write64(dst, ZSTD_read64(src));
> +}

Sorry, my patch still triggered the gcc bug, I used the wrong compiler.
This patch works, and runs about the same speed as before the patch for
small inputs, and slightly faster for larger inputs (100+ bytes). I'll
look for a faster workaround if benchmarks show it matters.

Signed-off-by: Nick Terrell <terrelln@...com>
---
 lib/zstd/zstd_internal.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
index 6748719..839014d 100644
--- a/lib/zstd/zstd_internal.h
+++ b/lib/zstd/zstd_internal.h
@@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
 #define WILDCOPY_OVERLENGTH 8
 ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
 {
-	const BYTE *ip = (const BYTE *)src;
-	BYTE *op = (BYTE *)dst;
-	BYTE *const oend = op + length;
-	do
-		COPY8(op, ip)
-	while (op < oend);
+	if (length > 0)
+		memcpy(dst, src, length);
 }
 
 ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */
-- 
2.9.3



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ