Date:   Fri, 23 Mar 2018 18:51:19 +0530
From:   Vaneet Narang <v.narang@...sung.com>
To:     Maninder Singh <maninder1.s@...sung.com>,
        Yann Collet <cyan@...com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
CC:     "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Nick Terrell <terrelln@...com>,
        PANKAJ MISHRA <pankaj.m@...sung.com>,
        AMIT SAHRAWAT <a.sahrawat@...sung.com>
Subject: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.


Hi Nick,

Thanks for your comments. Please find my replies to a few of them below.
I will share the benchmarking figures separately.

> 
>> +                        if (curr_offset > 127) {
>> +                                curr_offset = (curr_offset << 1) | DYN_BIT;
>> +                                LZ4_writeLE16(op, (U16)curr_offset);
>> +                                op += 2;
>> +                        } else {
>> +                                curr_offset = curr_offset << 1;
>> +                                *op = (BYTE)curr_offset;
>> +                                op++;
>> +                        }
> 
>The standard way to do variable-sized integers is to use the high-bit as
>the control bit, not the low-bit. Do you have benchmarks to show that using
>the low-bit is faster than using the high-bit? If so, let's comment in the
>code; if not, let's use the high-bit.
> 
We are not sure about the performance difference between using the low bit and the high bit.
However, since the LZ4 specification requires the offset to be stored in little-endian format,
keeping the low bit as the control bit makes it easier to retrieve the offset when it is
spread across two bytes:

offset = LZ4_readLE16(ip);
if (offset & 0x1) {
	offset = offset >> 1;		/* two-byte offset: just one right shift */
	ip += 2;
} else {
	offset = (offset & 0xff) >> 1;	/* one-byte offset: only two operations */
	ip += 1;
}

So we are not sure that using the high bit would make much of a difference, since the same
number of operations, or more, would be needed to extract the offset in that case.
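
For comparison, here is a rough sketch of what the decode could look like with the
control bit in the high bit instead. The layout assumed here (flag in bit 7 of the
first byte, remaining offset bits split across bits 0-6 and the second byte) is only
an illustration, not something taken from the patch:

offset = LZ4_readLE16(ip);
if (offset & 0x80) {					/* hypothetical: flag in bit 7 of the first byte */
	offset = (offset & 0x7f) | ((offset >> 8) << 7);/* mask, shift and or to recombine the split fields */
	ip += 2;
} else {
	offset = offset & 0x7f;				/* one-byte offset: single mask */
	ip += 1;
}

In that layout the two-byte case has to reassemble the fields around the flag bit,
which is where the extra operations come from.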

>>                  /* copy literals */
>>                  cpy = op + length;
>>                  if (((endOnInput) && ((cpy > (partialDecoding ? oexit : oend - MFLIMIT))
>> -                        || (ip + length > iend - (2 + 1 + LASTLITERALS))))
>> -                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH))) {
>> +                        || (ip + length > iend - (2 + LASTLITERALS))))
>> +                        || ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 1))) {
>>                          if (partialDecoding) {
>>                                  if (cpy > oend) {
>>                                          /*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>>                          break;
>>                  }
>>
>> -                LZ4_wildCopy(op, ip, cpy);
>> +                if (dynOffset && length < 4)
>> +                        LZ4_copy4(op, ip);
>> +                else
>> +                        LZ4_wildCopy(op, ip, cpy);
>> +
> 
>The LZ4 format enforces that the last 5 bytes are literals so that
>LZ4_wildCopy() can be used here. I suspect that having this extra branch
>here for `dynOffset` mode hurts decompression speed.
> 
This check handles a one-byte read overflow when decompressing the last frame; LZ4_wildCopy() blindly copies 8 bytes at a time.

Issue case:
a zero-length literal followed by a one-byte offset, with the stream ending in a one-byte
token and the 5-byte last literals.

With this combination only 7 bytes (1-byte offset + 1-byte token + 5-byte literals) remain
at the end of the input buffer, so reading 8 bytes from the input buffer results in a
1-byte overflow.

Since the original LZ4 has no one-byte offset, this issue cannot occur there. To avoid the
overhead of this check, we plan to use 6 bytes as the last-literal length (the tail
arithmetic is sketched after the diff below):

-#define LASTLITERALS    5
+#define LASTLITERALS    6
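
As a rough illustration of the margin this buys (the macro names and the userspace
static_assert are ours, purely for illustration; kernel code would express this as a
BUILD_BUG_ON inside a function):

#include <assert.h>	/* C11 static_assert */

/*
 * Worst-case tail of the input buffer when the final match uses a
 * one-byte offset:  offset (1 byte) + token (1 byte) + LASTLITERALS.
 * LZ4_wildCopy() reads 8 bytes blindly, so the tail must be >= 8 bytes:
 * LASTLITERALS == 5 leaves only 7 bytes (the 1-byte over-read above),
 * LASTLITERALS == 6 leaves exactly 8.
 */
#define LASTLITERALS		6
#define DYN_OFFSET_BYTES	1	/* hypothetical name */
#define TOKEN_BYTES		1	/* hypothetical name */
#define WILDCOPY_READ_BYTES	8	/* hypothetical name */

static_assert(DYN_OFFSET_BYTES + TOKEN_BYTES + LASTLITERALS >= WILDCOPY_READ_BYTES,
	      "last-literals margin must cover wildCopy's blind 8-byte read");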

>>
>>  int LZ4_decompress_safe(const char *source, char *dest,
>> -        int compressedSize, int maxDecompressedSize)
>> +        int compressedSize, int maxDecompressedSize, bool dynOffset)
>>  {
>>          return LZ4_decompress_generic(source, dest, compressedSize,
>>                  maxDecompressedSize, endOnInputSize, full, 0,
>> -                noDict, (BYTE *)dest, NULL, 0);
>> +                noDict, (BYTE *)dest, NULL, 0, dynOffset);
> 
>You'll need to use the same trick that LZ4_compress_fast() uses, by hard
>coding `dynOffset`. We want the compiler to generate two versions of
>LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
>That way the tight loop won't have the branches that check `dynOffset`.
> 
>        if (dynOffset)
>                return LZ4_decompress_generic(..., true);
>        else
>                return LZ4_decompress_generic(..., false);
> 
>Without this trick, I expect that this patch causes a regression to both
>LZ4 and LZ4_DYN decompression speed.

Since our solution is not backward compatible with the original LZ4, we are a bit unsure
whether we should expose it as a separate API, which would avoid the overhead of the
dynOffset checks everywhere in the code and also avoid changing the prototypes of the
exported functions LZ4_decompress/LZ4_compress, or whether we should keep these checks
to avoid redundant code. Kindly suggest.
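
For reference, a minimal sketch of the specialization Nick suggests, assuming
LZ4_decompress_generic() is FORCE_INLINE (as in the existing LZ4_compress_fast()
trick) so each call site gets a constant-folded copy:

int LZ4_decompress_safe(const char *source, char *dest,
	int compressedSize, int maxDecompressedSize, bool dynOffset)
{
	/*
	 * Pass `dynOffset` as a compile-time constant at each call site so
	 * the compiler emits two specialized bodies of
	 * LZ4_decompress_generic() and neither tight loop keeps the
	 * dynOffset branches.
	 */
	if (dynOffset)
		return LZ4_decompress_generic(source, dest, compressedSize,
			maxDecompressedSize, endOnInputSize, full, 0,
			noDict, (BYTE *)dest, NULL, 0, true);

	return LZ4_decompress_generic(source, dest, compressedSize,
		maxDecompressedSize, endOnInputSize, full, 0,
		noDict, (BYTE *)dest, NULL, 0, false);
}

A separate LZ4_DYN entry point would compile down to the same two specialized bodies;
the trade-off is mainly whether the exported prototypes have to change.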

Thanks & Regards,
Vaneet Narang
