netdev - Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <912c38fc-e75d-a7f5-30bc-592e7498e958@iogearbox.net>
Date:   Wed, 16 Jan 2019 16:41:55 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     linux-riscv@...ts.infradead.org,
        Palmer Dabbelt <palmer@...ive.com>, davidlee@...ive.com,
        Netdev <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH 3/3] bpf, riscv: added eBPF JIT for RV64G

On 01/16/2019 08:23 AM, Björn Töpel wrote:
> Den ons 16 jan. 2019 kl 00:50 skrev Daniel Borkmann <daniel@...earbox.net>:
>>
>> On 01/15/2019 09:35 AM, Björn Töpel wrote:
>>> This commit adds eBPF JIT for RV64G.
>>>
>>> Codewise, it needs some refactoring. Currently there's a bit too much
>>> copy-and-paste going on, and I know some places where I could optimize
>>> the code generation a bit (mostly BPF_K type of instructions, dealing
>>> with immediates).
>>
>> Nice work! :)
>>
>>> From a features perspective, two things are missing:
>>>
>>> * tail calls
>>> * "far-branches", i.e. conditional branches that reach beyond 13b.
>>>
>>> The test_bpf.ko passes all tests.
>>
>> Did you also check test_verifier under jit with/without jit hardening
>> enabled? That one contains lots of runtime tests as well. Probably makes
>> sense to check under CONFIG_BPF_JIT_ALWAYS_ON to see what fails the JIT;
>> the test_verifier also contains various tail call tests targeted at JITs,
>> for example.
>>
> 
> Good point! I will do that. The only selftests/bpf program that I ran
> (and passed) was "test_progs". I'll make sure that the complete bpf
> selftests suite passes as well!
> 
>> Nit: please definitely also add a MAINTAINERS entry with at least yourself
>> under BPF JIT section, and update Documentation/sysctl/net.txt with riscv64.
>>
> 
> Ah! Yes, I'll fix that.
> 
>>> Signed-off-by: Björn Töpel <bjorn.topel@...il.com>
>>> ---
>>>  arch/riscv/net/bpf_jit_comp.c | 1608 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 1608 insertions(+)
>>>
>>> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
>>> index 7e359d3249ee..562d56eb8d23 100644
>>> --- a/arch/riscv/net/bpf_jit_comp.c
>>> +++ b/arch/riscv/net/bpf_jit_comp.c
>>> @@ -1,4 +1,1612 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * BPF JIT compiler for RV64G
>>> + *
>>> + * Copyright(c) 2019 Björn Töpel <bjorn.topel@...il.com>
>>> + *
>>> + */
>>> +
>>> +#include <linux/bpf.h>
>>> +#include <linux/filter.h>
>>> +#include <asm/cacheflush.h>
>>> +
>>> +#define TMP_REG_0 (MAX_BPF_JIT_REG + 0)
>>> +#define TMP_REG_1 (MAX_BPF_JIT_REG + 1)
>>
>> Not used?
>>
> 
> Correct! I'll get rid of them.
> 
>>> +#define TAIL_CALL_REG (MAX_BPF_JIT_REG + 2)
>>> +
>>> +enum rv_register {
>>> +     RV_REG_ZERO =   0,      /* The constant value 0 */
>>> +     RV_REG_RA =     1,      /* Return address */
>>> +     RV_REG_SP =     2,      /* Stack pointer */
>>> +     RV_REG_GP =     3,      /* Global pointer */
>>> +     RV_REG_TP =     4,      /* Thread pointer */
>>> +     RV_REG_T0 =     5,      /* Temporaries */
>>> +     RV_REG_T1 =     6,
>>> +     RV_REG_T2 =     7,
>>> +     RV_REG_FP =     8,
>>> +     RV_REG_S1 =     9,      /* Saved registers */
>>> +     RV_REG_A0 =     10,     /* Function argument/return values */
>>> +     RV_REG_A1 =     11,     /* Function arguments */
>>> +     RV_REG_A2 =     12,
>>> +     RV_REG_A3 =     13,
>>> +     RV_REG_A4 =     14,
>>> +     RV_REG_A5 =     15,
>>> +     RV_REG_A6 =     16,
>>> +     RV_REG_A7 =     17,
>>> +     RV_REG_S2 =     18,     /* Saved registers */
>>> +     RV_REG_S3 =     19,
>>> +     RV_REG_S4 =     20,
>>> +     RV_REG_S5 =     21,
>>> +     RV_REG_S6 =     22,
>>> +     RV_REG_S7 =     23,
>>> +     RV_REG_S8 =     24,
>>> +     RV_REG_S9 =     25,
>>> +     RV_REG_S10 =    26,
>>> +     RV_REG_S11 =    27,
>>> +     RV_REG_T3 =     28,     /* Temporaries */
>>> +     RV_REG_T4 =     29,
>>> +     RV_REG_T5 =     30,
>>> +     RV_REG_T6 =     31,
>>> +};
>>> +
>>> +struct rv_jit_context {
>>> +     struct bpf_prog *prog;
>>> +     u32 *insns; /* RV insns */
>>> +     int ninsns;
>>> +     int epilogue_offset;
>>> +     int *offset; /* BPF to RV */
>>> +     unsigned long seen_reg_bits;
>>> +     int stack_size;
>>> +};
>>> +
>>> +struct rv_jit_data {
>>> +     struct bpf_binary_header *header;
>>> +     u8 *image;
>>> +     struct rv_jit_context ctx;
>>> +};
>>> +
>>> +static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
>>> +{
>>
>> This one can also be simplified by having a simple mapping as in
>> other JITs and then mark __set_bit(<reg>) in the small bpf_to_rv_reg()
>> helper.
>>
> 
> Yeah, I agree. Much better. I'll take that route.
> 
>>> +     switch (bpf_reg) {
>>> +     /* Return value */
>>> +     case BPF_REG_0:
>>> +             __set_bit(RV_REG_A5, &ctx->seen_reg_bits);
>>> +             return RV_REG_A5;
>>> +     /* Function arguments */
>>> +     case BPF_REG_1:
>>> +             __set_bit(RV_REG_A0, &ctx->seen_reg_bits);
>>> +             return RV_REG_A0;
>>> +     case BPF_REG_2:
>>> +             __set_bit(RV_REG_A1, &ctx->seen_reg_bits);
>>> +             return RV_REG_A1;
>>> +     case BPF_REG_3:
>>> +             __set_bit(RV_REG_A2, &ctx->seen_reg_bits);
>>> +             return RV_REG_A2;
>>> +     case BPF_REG_4:
>>> +             __set_bit(RV_REG_A3, &ctx->seen_reg_bits);
>>> +             return RV_REG_A3;
>>> +     case BPF_REG_5:
>>> +             __set_bit(RV_REG_A4, &ctx->seen_reg_bits);
>>> +             return RV_REG_A4;
>>> +     /* Callee saved registers */
>>> +     case BPF_REG_6:
>>> +             __set_bit(RV_REG_S1, &ctx->seen_reg_bits);
>>> +             return RV_REG_S1;
>>> +     case BPF_REG_7:
>>> +             __set_bit(RV_REG_S2, &ctx->seen_reg_bits);
>>> +             return RV_REG_S2;
>>> +     case BPF_REG_8:
>>> +             __set_bit(RV_REG_S3, &ctx->seen_reg_bits);
>>> +             return RV_REG_S3;
>>> +     case BPF_REG_9:
>>> +             __set_bit(RV_REG_S4, &ctx->seen_reg_bits);
>>> +             return RV_REG_S4;
>>> +     /* Stack read-only frame pointer to access stack */
>>> +     case BPF_REG_FP:
>>> +             __set_bit(RV_REG_S5, &ctx->seen_reg_bits);
>>> +             return RV_REG_S5;
>>> +     /* Temporary register */
>>> +     case BPF_REG_AX:
>>> +             __set_bit(RV_REG_T0, &ctx->seen_reg_bits);
>>> +             return RV_REG_T0;
>>> +     /* Tail call counter */
>>> +     case TAIL_CALL_REG:
>>> +             __set_bit(RV_REG_S6, &ctx->seen_reg_bits);
>>> +             return RV_REG_S6;
>>> +     default:
>>> +             return 0;
>>> +     }
>>> +};
>> [...]
>>> +     /* tail call */
>>> +     case BPF_JMP | BPF_TAIL_CALL:
>>> +             rd = bpf_to_rv_reg(TAIL_CALL_REG, ctx);
>>> +             pr_err("bpf-jit: tail call not supported yet!\n");
>>> +             return -1;
>>
>> There are two options here, either fixed size prologue where you can
>> then jump over it in tail call case, or dynamic one which would make
>> it slower due to reg restore but shrinks image for non-tail calls.
> 
> So, it would be the latter then, which is pretty much like a more
> expensive (due to the tail call depth checks) function call.

Right.

> For the fixed prologue: how does, say x86, deal with BPF stack usage
> in the tail call case? If the caller doesn't use the bpf stack, but
> the callee does. From a quick glance in the code, the x86 prologue
> still uses aux->stack_depth. If the callee has a different stack usage
> that the caller, and then the callee does a function call, wouldn't
> this mess up the frame? (Yeah, obviously missing something! :-))

Basically in this case verifier sets stack size to MAX_BPF_STACK when it
finds a tail call in the prog, meaning the callee will be reusing <= stack
size than the caller and then upon exit unwinds it via leave+ret.

Cheers,
Daniel