Date:   Sun, 20 Aug 2017 01:29:02 +0530
From:   Shubham Bansal <illusionist.neo@...il.com>
To:     Alexei Starovoitov <ast@...com>
Cc:     Russell King - ARM Linux <linux@...linux.org.uk>,
        David Miller <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        linux-arm-kernel@...ts.infradead.org,
        LKML <linux-kernel@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>, Andrew Lunn <andrew@...n.ch>
Subject: Re: [PATCH net-next v3] arm: eBPF JIT compiler

> impressive work.
> Acked-by: Alexei Starovoitov <ast@...nel.org>

Thanks :)

I can't take all the credit. It was Daniel and Kees who helped me a lot.
I would have given up a long time ago without them.
>
> Any performance numbers with vs without JIT ?

Here is the mail from Kees on v1 of the patch.

For what it's worth, I did a comparison of the numbers Shubham posted
in another thread for the JIT, comparing the eBPF interpreter with his
new JIT. The post is here:

https://www.spinics.net/lists/netdev/msg436402.html

Other than that, I can send the test runs, which include timings, but I
will not be able to compare them the way Kees did this week.
Does that sound good?
>
>> +static const u8 bpf2a32[][2] = {
>> +       /* return value from in-kernel function, and exit value from eBPF */
>> +       [BPF_REG_0] = {ARM_R1, ARM_R0},
>> +       /* arguments from eBPF program to in-kernel function */
>> +       [BPF_REG_1] = {ARM_R3, ARM_R2},
>
>
> as far as i understand arm32 calling convention the mapping makes sense
> to me. Hard to come up with anything better than the above.
I tried different versions of it, according to the needs of different
eBPF instructions. As you can see, we are short on registers. This is
the best I could come up with.
I would love to hear any suggestions for improving it.
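
To illustrate why the mapping uses register pairs, here is a minimal C
sketch of one 64-bit eBPF register held in two 32-bit words. It is not
code from the patch: struct reg_pair and the pair_* helpers are
hypothetical names, and the {hi, lo} ordering is my reading of entries
such as {ARM_R1, ARM_R0}. The add is done in two 32-bit steps with a
carry, similar in spirit to an ADDS/ADC sequence.

```c
#include <stdint.h>

/* Hypothetical model: one 64-bit eBPF register kept in two 32-bit words.
 * The {hi, lo} order is an assumption based on {ARM_R1, ARM_R0}. */
struct reg_pair {
    uint32_t hi;
    uint32_t lo;
};

/* Split a 64-bit value into its high and low 32-bit halves. */
static struct reg_pair pair_from_u64(uint64_t v)
{
    struct reg_pair p = { (uint32_t)(v >> 32), (uint32_t)v };
    return p;
}

/* Join the two halves back into a 64-bit value. */
static uint64_t pair_to_u64(struct reg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add as two 32-bit adds: low words first, then high words
 * plus the carry out of the low add. */
static struct reg_pair pair_add(struct reg_pair a, struct reg_pair b)
{
    struct reg_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;
    carry = r.lo < a.lo;        /* unsigned wrap-around means carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Every 64-bit ALU operation needs at least two 32-bit operations like
this, which is where the register pressure comes from.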
>
>> +       /* function call */
>> +       case BPF_JMP | BPF_CALL:
>> +       {
>> +               const u8 *r0 = bpf2a32[BPF_REG_0];
>> +               const u8 *r1 = bpf2a32[BPF_REG_1];
>> +               const u8 *r2 = bpf2a32[BPF_REG_2];
>> +               const u8 *r3 = bpf2a32[BPF_REG_3];
>> +               const u8 *r4 = bpf2a32[BPF_REG_4];
>> +               const u8 *r5 = bpf2a32[BPF_REG_5];
>> +               const u32 func = (u32)__bpf_call_base + (u32)imm;
>> +
>> +               emit_a32_mov_r64(true, r0, r1, false, false, ctx);
>> +               emit_a32_mov_r64(true, r1, r2, false, true, ctx);
>> +               emit_push_r64(r5, 0, ctx);
>> +               emit_push_r64(r4, 8, ctx);
>> +               emit_push_r64(r3, 16, ctx);
>> +
>> +               emit_a32_mov_i(tmp[1], func, false, ctx);
>> +               emit_blx_r(tmp[1], ctx);
>
>
> to improve the cost of call we can teach verifier to mark the registers
> actually used to pass arguments, so not all pushes would be needed.
> But it may be drop in the bucket comparing to the cost of compound
> 64-bit alu ops.
That's right, but it is still an improvement, I guess. I think I
discussed it with Daniel, and I thought I should get this patch into
mainline first and then improve on it.
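
As a rough sketch of that idea, and assuming the JIT could learn how
many arguments a helper actually takes, the number of stack pushes could
be computed as below. In the quoted BPF_CALL case, eBPF R1 and R2 are
moved into ARM registers and R3 to R5 are pushed. The names
ARGS_IN_REGS, MAX_HELPER_ARGS and pushes_needed() are hypothetical and
are not part of the patch or the verifier.

```c
/* Hypothetical constants: eBPF R1 and R2 travel in ARM registers in the
 * quoted BPF_CALL case; helpers take at most five arguments. */
#define ARGS_IN_REGS    2
#define MAX_HELPER_ARGS 5

/* Number of 64-bit stack pushes a call would need if the JIT knew the
 * helper takes nargs arguments. Returns -1 for an invalid count. */
static int pushes_needed(int nargs)
{
    if (nargs < 0 || nargs > MAX_HELPER_ARGS)
        return -1;
    return nargs > ARGS_IN_REGS ? nargs - ARGS_IN_REGS : 0;
}
```

With this, a two-argument helper would need no pushes at all, instead of
the three that are always emitted today.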
> There was some work on llvm side to use 32-bit subregisters which
> should help 32-bit architectures and JITs, but it didn't go far.
> So if you're interested further improving bpf program speeds on arm32
> you may take a look at llvm side. I can certainly provide the tips.
Sure. Sounds good.

Best,
Shubham
