linux-kernel - Re: [PATCH] lib/lz4: smatch warning in LZ4_decompress

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <D4762145-BBC5-4574-BF68-8C1A3AF41D98@fb.com>
Date:   Mon, 8 Jun 2020 00:40:26 +0000
From:   Yann Collet <cyan@...com>
To:     Vasily Averin <vvs@...tuozzo.com>, Gao Xiang <hsiangkao@....com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>,
        Gao Xiang <xiang@...nel.org>
Subject: Re: [PATCH] lib/lz4: smatch warning in LZ4_decompress_generic()

Hi Vasily


If I do understand the discussion, the question is about usage of `&` instead of `&&`,
and the speculation that it might be an error.

It's not an error. Unfortunately, explaining the reasoning behind this decision is a bit long.

First, in term of outcome, I believe it should be clear that : 
(endOnInput ? ip < shortiend : 1) && (op <= shortoend)
is completely equivalent to : 
(endOnInput ? ip < shortiend : 1) & (op <= shortoend)

Note that one cannot just mindlessly replace && by & anywhere and expect it to work the same,
In the general case of int A,B;  A && B is different from A & B.
But (A>0) && (B>0) is equivalent to (A>0) & (B>0), because each side of the test is now a strict 0/1 value.
This condition is respected above.

Now why selecting one rather than the other ?
It has something to do with C language sequential guarantees and the pipeline of modern cpus.

When spelling: if (f(A) && f(B))
the program must do f(A) first, and then, if it's true, move on to f(B).
This introduces a serial dependency and an additional intermediate branch.

Whereas in: if (f(A) & f(B))
f(A) and f(B) can be executed in parallel,
then combined with a simple AND operation,
and there is only one branch, at the end.

Once again, this should not be done mindlessly :
such construction is only valid if f(B) is truly completely independent from f(A).
For example, in : (op != NULL) && (*op=K), one must use &&, otherwise it's UB, because the second test is only valid after the first one is verified.

But for the example at stake : 
(endOnInput ? ip < shortiend : 1) & (op <= shortoend)
one can see that each test is fully independent,
(op<=shortoend) doesn't depend on anything from the left side.


OK, now, does it make a difference ?
Well, as usual, it depends.
If it was a branch anywhere in the code base, it probably doesn't.
One needs a very hot code section for such a difference to become noticeable.
But it happens that in lz4, we do have such hot sections.

There is another issue at stake, almost invisible :
if you know the architecture of modern intel cpu,
you might know that they have a micro-op cache.
The way this micro-op cache is structured is a bit complex, and more importantly, poorly documented, and changes with each chip generation.
But well, among the notable elements, the concentration of branches matter :
if there are too many branches densely packed, the micro-op cache cannot do its job, 
therefore, every time the instruction segment must be played, it has to be decoded again, costing significant performance.

Also, the branch predictor has a limited history, and a denser pack of branches ends up being more difficult to track hence to predict
(here also, details are not especially well documented, and change all the time. The only "generic rule" is that less branches tends to be easier than more branches).

So, as one can see, it's _generally_ better to replace a && branch by an & AND operation, when conditions are fulfilled,
but it only matters within very hot code segments.

To make matters even more difficult, sometimes, clever compilers can make the same choice automatically,
realizing that in a code like f(A) && f(B), there may actually be a way to execute both sides and replace the intermediate branch by a mask.
The problem is, that's true sometimes. Not always.
Other compilers will react differently and may miss this opportunity.
That's important for codes designed to be highly portable, as "it works fine on my machine" isn't enough to justify a modification that may affect other systems.


Anyway, I hope it answers your question : 
- both expressions are equivalent, though that's only because some conditions are carefully respected.
- && is better from a style perspective, and is my first choice for "normal" code
- hot code segments value speed, in which case, & can be a more efficient alternative


Regards

Yann

On 6/6/20, 08:16, "Vasily Averin" <vvs@...tuozzo.com> wrote:

    Dear Yann,
    could you please consult us about your lz4 pacth
    https://github.com/lz4/lz4/commit/1a191b3f8d26b50a7c1d41590b529ec308d768cd

    Please see details below.

    Thank you,
    	Vasily Averin

    On 6/6/20 5:36 PM, Gao Xiang wrote:
    > On Sat, Jun 06, 2020 at 04:28:02PM +0300, Vasily Averin wrote:
    >> Found by smatch:
    >> lib/lz4/lz4_decompress.c:150 LZ4_decompress_generic() warn: maybe use && instead of &
    >> It was realy incorrectly copied from
    >> https://github.com/lz4/lz4/commit/45f8603aae389d34c689d3ff7427b314071ccd2c
    >> line 1431
    > 
    > Simply no.
    > 
    >>
    >> Fixes: 2209fda323e2 ("lib/lz4: update LZ4 decompressor module")
    >> Signed-off-by: Vasily Averin <vvs@...tuozzo.com>
    >> ---
    >>  lib/lz4/lz4_decompress.c | 2 +-
    >>  1 file changed, 1 insertion(+), 1 deletion(-)
    >commit 1a191b3f8d26b50a7c1d41590b529ec308d768cd
    Author: Yann Collet <cyan@...com>
    Date:   Wed May 2 10:33:12 2018 -0700

        simplify shortcut

    diff --git a/lib/lz4.c b/lib/lz4.c
    index c6f0426..b46910f 100644
    --- a/lib/lz4.c
    +++ b/lib/lz4.c
    @@ -1429,62 +1429,29 @@ LZ4_FORCE_INLINE int LZ4_decompress_generic(
              */
             if ((endOnInput ? length != RUN_MASK : length <= 8) &&
                 /* strictly "less than" on input, to re-enter the loop with at least one byte */
    -            likely((endOnInput ? ip < shortiend : 1) && (op <= shortoend)))
    +            likely((endOnInput ? ip < shortiend : 1) & (op <= shortoend)))
    >
    >> diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
    >> index 0c9d3ad..f7f7dca 100644
    >> --- a/lib/lz4/lz4_decompress.c
    >> +++ b/lib/lz4/lz4_decompress.c
    >> @@ -147,7 +147,7 @@ static FORCE_INLINE int LZ4_decompress_generic(
    >>  		    * strictly "less than" on input, to re-enter
    >>  		    * the loop with at least one byte
    >>  		    */
    >> -		   && likely((endOnInput ? ip < shortiend : 1) &
    >> +		   && likely((endOnInput ? ip < shortiend : 1) &&
    > 
    > I'd like to say, this is not my mistake (even not an issue).
    > If you notice the latest LZ4 upstream
    > https://github.com/lz4/lz4/blob/dev/lib/lz4.c#L1865
    > 
    > Or some related change, the lz4 author Cyan did it on purpose.
    > https://github.com/lz4/lz4/commit/1a191b3f8d26b50a7c1d41590b529ec308d768cd

    For me it looks like typo in patch,
    lets ask author of this commit.

    > I think we could follow the latest LZ4 upstream in order to
    > avoid further maintainence overhead. That's my own thought
    > anyway.
    > 
    > Thanks,
    > Gao Xiang
    >