Date:   Thu, 18 May 2017 18:22:40 -0700
From:   Alexei Starovoitov <ast@...com>
To:     Edward Cree <ecree@...arflare.com>,
        David Miller <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>
CC:     <alexei.starovoitov@...il.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment
 in verifier.

On 5/18/17 9:38 AM, Edward Cree wrote:
> On 18/05/17 15:49, Edward Cree wrote:
>> Here's one idea that seemed to work when I did a couple of experiments:
>> let A = (a;am), B = (b;bm) where the m are the masks
>> Σ = am + bm + a + b
>> χ = Σ ^ (a + b) /* unknown carries */
>> μ = χ | am | bm /* mask of result */
>> then A + B = ((a + b) & ~μ; μ)
>>
>> The idea is that we find which bits change between the case "all x are
>>  1" and "all x are 0", and those become xs too.

 > https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d

I played with it quite a bit trying to break it, and I have to
agree that the above algorithm works.
At least for add and sub I think it's solid.
It still feels a bit magical, since it gave me better results
than I could envision for my test vectors.
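The add and sub rules can be checked exhaustively at a small bit width. A minimal Python sketch (the function names and the test harness are mine, not code from the gist):

```python
W = 8                        # small width so exhaustive checking is cheap
M = (1 << W) - 1             # modular-arithmetic mask

def tnum_add(a, am, b, bm):
    """Edward's add rule: carries that differ between the all-unknowns-0
    and all-unknowns-1 sums become unknown bits of the result."""
    sigma = (am + bm + a + b) & M    # sum with every unknown bit set
    sv = (a + b) & M                 # sum with every unknown bit clear
    chi = sigma ^ sv                 # carries that depend on unknown bits
    mu = chi | am | bm               # unknown-bit mask of the result
    return sv & ~mu & M, mu

def tnum_sub(a, am, b, bm):
    """Same trick for subtraction: alpha/beta are the two extreme
    borrow patterns; their XOR marks borrows that can go either way."""
    dv = (a - b) & M
    alpha = (dv + am) & M            # a + am - b
    beta = (dv - bm) & M             # a - b - bm
    chi = alpha ^ beta
    mu = chi | am | bm
    return dv & ~mu & M, mu

def concretize(v, m):
    """Every concrete value the pair (value v, unknown-mask m) stands for."""
    sub, out = m, []
    while True:
        out.append(v | sub)
        if sub == 0:
            return out
        sub = (sub - 1) & m          # next subset of m

# soundness: every concrete sum/difference must agree on the known bits
a, am = 0b1100, 0b0001               # 110x
b, bm = 0b0001, 0b0100               # x01
sv, sm = tnum_add(a, am, b, bm)
dv, dm = tnum_sub(a, am, b, bm)
for x in concretize(a, am):
    for y in concretize(b, bm):
        assert ((x + y) & M) & ~sm & M == sv
        assert ((x - y) & M) & ~dm & M == dv
```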

In your .py the only thing I'd change is __str__(self): print them in
(mask, value) order, matching the order they're passed to the
constructor, to make it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.
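Something in the spirit of the gist's bin() output could look like this (a hypothetical helper, name mine):

```python
def tnum_str(value, mask, width=8):
    """Render a (value, mask) pair like '110x': known bits print as
    0/1, bits set in the unknown-mask print as 'x'."""
    bits = []
    for i in range(width - 1, -1, -1):
        if mask >> i & 1:
            bits.append('x')                           # unknown bit
        else:
            bits.append('1' if value >> i & 1 else '0')  # known bit
    return ''.join(bits)

print(tnum_str(0b1100, 0b0001, 4))   # prints "110x"
```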

> And now I've found a similar algorithm for subtraction, which (again) I
>  can't prove but it seems to work.
> α = a + am - b
> β = a - b - bm
> χ = α ^ β
> μ = χ | α | β
> then A - B = ((a - b) & ~μ; μ)
> Again we're effectively finding the max. and min. values, and XORing
>  them to find unknown carries.
>
> Bitwise operations are easy, of course;
> /* By assumption, a & am == b & bm == 0 */
> A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
> A | B = (a | b; (am | bm) & ~(a | b))
> /* It bothers me that & and | aren't symmetric, but I can't fix it */
> A ^ B = (a ^ b; am | bm)
>
> as are shifts by a constant (just shift 0s into both number and mask).
>
> Multiplication by a constant can be done by decomposing into shifts
>  and adds; but it can also be done directly; here we find (a;am) * k.
> π = a * k
> γ = am * k
> then A * k = (π; 0) + (0; γ), for which we use our addition algo.
>
> Multiplication of two unknown values is a nightmare, as unknown bits
>  can propagate all over the place.  We can do a shift-add
>  decomposition where the adds for unknown bits have all the 1s in
>  the addend replaced with xs.  A few experiments suggest that this
>  works, regardless of the order of operands.  For instance
>  110x * x01 comes out as either
>     110x
> + xx0x
> = xxxx0x
> or
>      x0x
>    x01
> + x01
> = xxxx0x
> We can slightly optimise this by handling all the 1 bits in one go;
>  that is, for (a;am) * (b;bm) we first find (a;am) * b using our
>  multiplication-by-a-constant algo above, then for each bit in bm
>  we find (a;am) * bit and force all its nonzero bits to unknown;
>  finally we add all our components.

this mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
For all practical cases llvm will likely emit shifts or a sequence
of adds and shifts, so multiplies by a crazy non-optimizable constant
or by a variable are rare, and the end result is likely going to be
outside of the packet boundary anyway, so it will be rejected and
precise alignment tracking won't matter much.
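For completeness, here is one way the shift-and-add form of the multiply could look. This is my reading of the description above, not code from the gist, and it repeats the add rule so the snippet stands alone:

```python
W = 8                        # small width for experimentation
M = (1 << W) - 1             # modular-arithmetic mask

def tnum_add(a, am, b, bm):
    """Add rule from earlier in the thread."""
    sigma = (am + bm + a + b) & M
    sv = (a + b) & M
    chi = sigma ^ sv
    mu = chi | am | bm
    return sv & ~mu & M, mu

def tnum_mul(a, am, b, bm):
    """Shift-and-add decomposition of (a;am) * (b;bm): start from the
    known*known product, then for every unknown bit of a add a maybe-b
    component, and for every unknown bit of b add a component with all
    nonzero bits of a forced unknown (this also covers cross terms of
    two unknown bits)."""
    rv, rm = (a * b) & M, 0
    for i in range(W):
        if am >> i & 1:
            # unknown bit of a: the product may or may not gain b << i
            rv, rm = tnum_add(rv, rm, 0, (b << i) & M)
        if bm >> i & 1:
            # unknown bit of b: may gain x << i for any x in (a;am)
            rv, rm = tnum_add(rv, rm, 0, ((a | am) << i) & M)
    return rv, rm

# the 110x * x01 example from above
print(tnum_mul(0b1100, 0b0001, 0b001, 0b100))   # → (0, 125), i.e. xxxxx0x
```

Note the concrete products here are {12, 13, 60, 65}, and 65 sets bit 6, so the result has one more unknown bit than the six-character xxxx0x suggests; bit 1 stays known-zero.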
What I love about the whole thing is that it works for access into
packets, access into map values, and in the future any other
variable-length access.

> Don't even ask about division; that scrambles bits so hard that the

yeah, screw div and mod. We have the option to disable div/mod
altogether under some new 'prog_flags', since division has this ugly
'div by 0' exception path. We don't even have a 'signed division'
instruction, and llvm errors out like:
     errs() << "Unsupport signed division for DAG: ";
     errs() << "Please convert to unsigned div/mod.\n";
and no one has complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it, the verifier will be smarter about
alignment tracking than any compiler I know :)
