netdev - Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMEtUuzT6smejsgoKP5c_NCw8mQw39xfPR+hsfV1Oh77ZeG3bA@mail.gmail.com>
Date:	Wed, 23 Apr 2014 16:14:50 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Daniel Borkmann <dborkman@...hat.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for
 ARG1/CTX handling

On Wed, Apr 23, 2014 at 3:57 PM, Daniel Borkmann <dborkman@...hat.com> wrote:
> On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
>>
>> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@...hat.com>
>> wrote:
>>>
>>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>>>
>>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@...hat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>>> CTX register, we initialize context to ARG1, and during user filter
>>>>> migration we emit *always* an instruction that copies the content
>>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>>> at BPF program beginning that should have actually been avoided to
>>>>> spare this overhead.
>>>>>
>>>>> Signed-off-by: Daniel Borkmann <dborkman@...hat.com>
>>>>> Cc: Alexei Starovoitov <ast@...mgrid.com>
>>>>
>>>>
>>>> First 4 patches look great, but this one I have to disagree.
>>>> See below.
>>>>
>>>>> ---
>>>>>    net/core/filter.c | 10 +---------
>>>>>    1 file changed, 1 insertion(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index eada3d5..6fed231 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -62,7 +62,6 @@
>>>>>    #define A      regs[insn->a_reg]
>>>>>    #define X      regs[insn->x_reg]
>>>>>    #define FP     regs[BPF_REG_FP]
>>>>> -#define ARG1   regs[BPF_REG_ARG1]
>>>>>    #define CTX    regs[BPF_REG_CTX]
>>>>>    #define K      insn->imm
>>>>>
>>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const
>>>>> struct
>>>>> sock_filter_int *insn)
>>>>>    #define CONT_JMP ({ insn++; goto select_insn; })
>>>>>
>>>>>           FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>>> -       ARG1 = (u64) (unsigned long) ctx;
>>>>> +       CTX = (u64) (unsigned long) ctx;
>>>>
>>>>
>>>>
>>>> R1 (ARG1) is the register that used to pass first argument to the
>>>> function.
>>>
>>>
>>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we
>>> currently
>>> copy ctx over to arg1 for calls, i.e.:
>>>
>>>    /* arg1 = ctx */
>>>
>>>    insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>>    insn->a_reg = ARG1_REG;
>>>    insn->x_reg = CTX_REG;
>>>    insn++;
>>>
>>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>>> convention, so 'void *ctx' has to go into R1 by design.
>>>> Storing it into R6 (CTX) will only work for classic filters converted
>>>> to extended.
>>>> all native ebpf filters will be broken.
>>>
>>>
>>> My objection was that currently, we do _not_ have _any_ users or even
>>> kernel
>>> API for _native_ filters, at least not in mainline tree. The _main_ users
>>> we
>>> have are currently _all_ being converted, hence this patch. Given that
>>> these
>>> calls have likely just a minority of use cases triggered by tcpdump et
>>> al,
>>> the majority of users still need to do this overhead/additional work.
>>>
>>>> In documentation we say:
>>>>       * R1 - R5   - arguments from BPF program to in-kernel function
>>>> so llvm/gcc are following this ebpf ABI.
>>>> Calling convention is the same whether to call kernel function from ebpf
>>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>>
>>>
>>> Yes, that's clear and convert_bpf_extensions() is doing that. So far
>>> we're
>>> not using llvm/gcc backend here and have the internal instruction set not
>>> exposed to user space, but even there you would need to prepare R1 - R5
>>> to
>>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into
>>> R1
>>> at that time just as we do with convert_bpf_extensions()?
>>
>>
>> How about then removing extra generated R6=R1 from converted and do
>> both in __sk_run_filter() ?
>>          regs[ARG1_REG] = (u64) (unsigned long) ctx;
>>          regs[CTX_REG] = (u64) (unsigned long) ctx;
>>
>> Overhead problem will be solved and ABI is still clean.
>
>
> Well, I just don't understand the concerns of ABI here. Given that we do not
> have any native BPF code to maintain in the kernel tree and given that we
> currently do not expose the instruction set to user space, we're free to do
> what we want, no? Eventually what's in mainline kernel is that dictates an
> ABI, so far we have it only in kernel space and the only current user of the
> ABI is the conversion function from user BPF programs. Given that helper

well, all the patches for llvm and the rest were posted.
Obviously they cannot go in at once, but breaking this ABI right now
makes it impossible to continue working on them.
I want to make ebpf engine to be useful for tracing filters and so on.
Are you saying forget it, this is classic bpf engine only?

> function calls may happen less often than executing instructions from the
> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
> arg1? JITs don't need to treat that differently in any way, imho. Simply

I don't think we're on the same page here. compiler at the end is just
a piece of sw and can do everything, but imbalanced ebpf ABI is really
out of normal compiler work flow.
Pretty much compiler somehow need to generate one way of passing
arguments into a function, but inside the function it needs to expect
them in different registers?! That is practically impossible to do
in normal gcc/llvm.
Even without any compiler insight.
If you insist on ctx to be in R6 only, then please kill the whole
concept of ABI in the doc. It doesn't make sense to pass a value
into a function in R1, but the function will see in R6?!

> the one who is generating the BPF insns needs to emit ctx into arg1 just as
> he prepares the other argX registers. Btw, since we know a priori that
> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
> prepared registers, we could even go that far and avoid preparing loads for
> the two.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html