netdev - Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for ARG1/CTX handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5358453E.4050005@redhat.com>
Date:	Thu, 24 Apr 2014 00:57:02 +0200
From:	Daniel Borkmann <dborkman@...hat.com>
To:	Alexei Starovoitov <ast@...mgrid.com>
CC:	"David S. Miller" <davem@...emloft.net>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 5/5] net: filter: optimize BPF migration for
 ARG1/CTX handling

On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@...hat.com> wrote:
>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@...hat.com>
>>> wrote:
>>>>
>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>> CTX register, we initialize context to ARG1, and during user filter
>>>> migration we emit *always* an instruction that copies the content
>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>> at BPF program beginning that should have actually been avoided to
>>>> spare this overhead.
>>>>
>>>> Signed-off-by: Daniel Borkmann <dborkman@...hat.com>
>>>> Cc: Alexei Starovoitov <ast@...mgrid.com>
>>>
>>> First 4 patches look great, but this one I have to disagree.
>>> See below.
>>>
>>>> ---
>>>>    net/core/filter.c | 10 +---------
>>>>    1 file changed, 1 insertion(+), 9 deletions(-)
>>>>
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index eada3d5..6fed231 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -62,7 +62,6 @@
>>>>    #define A      regs[insn->a_reg]
>>>>    #define X      regs[insn->x_reg]
>>>>    #define FP     regs[BPF_REG_FP]
>>>> -#define ARG1   regs[BPF_REG_ARG1]
>>>>    #define CTX    regs[BPF_REG_CTX]
>>>>    #define K      insn->imm
>>>>
>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct
>>>> sock_filter_int *insn)
>>>>    #define CONT_JMP ({ insn++; goto select_insn; })
>>>>
>>>>           FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>> -       ARG1 = (u64) (unsigned long) ctx;
>>>> +       CTX = (u64) (unsigned long) ctx;
>>>
>>>
>>> R1 (ARG1) is the register that used to pass first argument to the
>>> function.
>>
>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
>> copy ctx over to arg1 for calls, i.e.:
>>
>>    /* arg1 = ctx */
>>
>>    insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>    insn->a_reg = ARG1_REG;
>>    insn->x_reg = CTX_REG;
>>    insn++;
>>
>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>> convention, so 'void *ctx' has to go into R1 by design.
>>> Storing it into R6 (CTX) will only work for classic filters converted
>>> to extended.
>>> all native ebpf filters will be broken.
>>
>> My objection was that currently, we do _not_ have _any_ users or even kernel
>> API for _native_ filters, at least not in mainline tree. The _main_ users we
>> have are currently _all_ being converted, hence this patch. Given that these
>> calls have likely just a minority of use cases triggered by tcpdump et al,
>> the majority of users still need to do this overhead/additional work.
>>
>>> In documentation we say:
>>>       * R1 - R5   - arguments from BPF program to in-kernel function
>>> so llvm/gcc are following this ebpf ABI.
>>> Calling convention is the same whether to call kernel function from ebpf
>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>
>> Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
>> not using llvm/gcc backend here and have the internal instruction set not
>> exposed to user space, but even there you would need to prepare R1 - R5 to
>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
>> at that time just as we do with convert_bpf_extensions()?
>
> How about then removing extra generated R6=R1 from converted and do
> both in __sk_run_filter() ?
>          regs[ARG1_REG] = (u64) (unsigned long) ctx;
>          regs[CTX_REG] = (u64) (unsigned long) ctx;
>
> Overhead problem will be solved and ABI is still clean.

Well, I just don't understand the concerns of ABI here. Given that we do not
have any native BPF code to maintain in the kernel tree and given that we
currently do not expose the instruction set to user space, we're free to do
what we want, no? Eventually what's in mainline kernel is that dictates an
ABI, so far we have it only in kernel space and the only current user of the
ABI is the conversion function from user BPF programs. Given that helper
function calls may happen less often than executing instructions from the
rest of the code, why can't your llvm/gcc backend emit the load of ctx into
arg1? JITs don't need to treat that differently in any way, imho. Simply
the one who is generating the BPF insns needs to emit ctx into arg1 just as
he prepares the other argX registers. Btw, since we know a priori that
__skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
prepared registers, we could even go that far and avoid preparing loads for
the two.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html