netdev - Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

Message-ID: <5a552a19-5600-1fd4-57f2-6337127b75e9@fb.com>
Date:   Fri, 9 Feb 2018 16:38:11 -0800
From:   Alexei Starovoitov <ast@...com>
To:     "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
        <daniel@...earbox.net>, Sandipan Das <sandipan@...ux.vnet.ibm.com>
CC:     <linuxppc-dev@...ts.ozlabs.org>, <mpe@...erman.id.au>,
        <netdev@...r.kernel.org>
Subject: Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

On 2/9/18 8:54 AM, Naveen N. Rao wrote:
> Naveen N. Rao wrote:
>> Alexei Starovoitov wrote:
>>> On 2/8/18 4:03 AM, Sandipan Das wrote:
>>>> The imm field of a bpf_insn is a signed 32-bit integer. For
>>>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>>>> __bpf_call_base to the start of the callee function.
>>>>
>>>> For some architectures, such as powerpc64, it was found that
>>>> this offset may be as large as 64 bits because of which this
>>>> cannot be accommodated in the imm field without truncation.
>>>>
>>>> To resolve this, we additionally use the aux data within each
>>>> bpf_prog associated with the caller functions to store the
>>>> addresses of their respective callees.
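To make the truncation concrete, here is a minimal user-space sketch in C (not kernel code). The only premise taken from the patch is that imm is a signed 32-bit field holding the callee address minus __bpf_call_base; the helper names (offset_fits_imm32, store_in_imm, store_in_imm_checked, callee_from_imm) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The patch's premise: imm is a signed 32-bit field. */
typedef int32_t imm32_t;

/* True if an offset from the call base fits in imm without losing bits. */
static bool offset_fits_imm32(int64_t off)
{
	return off >= INT32_MIN && off <= INT32_MAX;
}

/* Keep only the low 32 bits, which is what assigning a wider offset to
 * imm amounts to (GCC/Clang wrap; strictly, C leaves the final signed
 * conversion implementation-defined). */
static imm32_t store_in_imm(int64_t off)
{
	return (imm32_t)(uint32_t)(uint64_t)off;
}

/* A checked store: only valid when the offset really fits. */
static imm32_t store_in_imm_checked(int64_t off)
{
	assert(offset_fits_imm32(off));
	return (imm32_t)off;
}

/* Rebuild the callee address as base + sign-extended imm. */
static uint64_t callee_from_imm(uint64_t base, imm32_t imm)
{
	return base + (uint64_t)(int64_t)imm;
}
```

A callee 0x1000 bytes past the base round-trips, while one 0x1000000000000000 bytes away comes back as the base itself, because all of its set bits sit above bit 31.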
>>>>
>>>> Signed-off-by: Sandipan Das <sandipan@...ux.vnet.ibm.com>
>>>> ---
>>>>  kernel/bpf/verifier.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 38 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 5fb69a85d967..52088b4ca02f 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>>>       * run last pass of JIT
>>>>       */
>>>>      for (i = 0; i <= env->subprog_cnt; i++) {
>>>> +        u32 flen = func[i]->len, callee_cnt = 0;
>>>> +        struct bpf_prog **callee;
>>>> +
>>>> +        /* for now assume that the maximum number of bpf function
>>>> +         * calls that can be made by a caller must be at most the
>>>> +         * number of bpf instructions in that function
>>>> +         */
>>>> +        callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
>>>> +        if (!callee) {
>>>> +            err = -ENOMEM;
>>>> +            goto out_free;
>>>> +        }
>>>> +
>>>>          insn = func[i]->insnsi;
>>>>          for (j = 0; j < func[i]->len; j++, insn++) {
>>>>              if (insn->code != (BPF_JMP | BPF_CALL) ||
>>>> @@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>>>              insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>>>                  func[subprog]->bpf_func -
>>>>                  __bpf_call_base;
>>>> +
>>>> +            /* the offset to the callee from __bpf_call_base
>>>> +             * may be larger than what the 32 bit integer imm
>>>> +             * can accommodate which will truncate the higher
>>>> +             * order bits
>>>> +             *
>>>> +             * to avoid this, we additionally utilize the aux
>>>> +             * data of each caller function for storing the
>>>> +             * addresses of every callee associated with it
>>>> +             */
>>>> +            callee[callee_cnt++] = func[subprog];
>>>
>>> can you share typical /proc/kallsyms ?
>>> Are you saying that kernel and kernel modules are allocated from
>>> address spaces that are always more than 32-bit apart?
>>
>> Yes. On ppc64, kernel text is linearly mapped from 0xc000000000000000,
>> while vmalloc'ed area starts from 0xd000000000000000 (for radix, this is
>> different, but still beyond a 32-bit offset).
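Taking the two base addresses quoted above at face value, a short C sketch shows how wide a signed field would have to be to hold a call offset between the two regions. The macro and helper names are made up for illustration; the addresses are the ones given in the mail for the hash MMU case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Region bases as quoted in the thread (hash MMU; radix differs). */
#define PPC64_KERNEL_TEXT_BASE 0xc000000000000000ULL
#define PPC64_VMALLOC_BASE     0xd000000000000000ULL

/* Width of the smallest two's-complement field that can hold delta. */
static unsigned signed_bits_needed(int64_t delta)
{
	/* For negative values, ~delta has the same count of significant bits. */
	uint64_t mag = delta < 0 ? ~(uint64_t)delta : (uint64_t)delta;
	unsigned bits = 1; /* the sign bit */

	while (mag) {
		bits++;
		mag >>= 1;
	}
	return bits;
}

/* True if delta can be stored in an n-bit signed field. */
static bool fits_in_signed_bits(int64_t delta, unsigned n)
{
	assert(n >= 1 && n <= 64);
	return signed_bits_needed(delta) <= n;
}
```

With these bases the gap is 2^60, so a signed field needs 62 bits, far more than the 32 bits imm provides.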
>>
>>> That would mean that all kernel calls into modules are far calls
>>> and the other way around from .ko into kernel?
>>> Performance is probably suffering because every call needs to be built
>>> with full 64-bit offset. No?
>>
>> Possibly, and I think Michael can give a better perspective, but I think
>> this is due to our ABI. For inter-module calls, we need to set up the TOC
>> pointer (or the address of the function being called with ABIv2),
>> which would require us to load a full address regardless.
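To illustrate what "load a full address" costs, the sketch below simulates the register effect of a five-instruction lis/ori/sldi/oris/ori sequence that materializes an arbitrary 64-bit constant on ppc64. The powerpc BPF JIT has a 64-bit load helper along these lines, but treat this exact sequence and the helper names (ppc_li64_sim and the slice functions) as an illustrative assumption rather than a quote of the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; the 16-bit slices follow the usual
 * ppc64 highest/higher/high/lo split of a 64-bit value. */
static uint16_t ppc_highest(uint64_t v) { return (uint16_t)(v >> 48); }
static uint16_t ppc_higher(uint64_t v)  { return (uint16_t)(v >> 32); }
static uint16_t ppc_high(uint64_t v)    { return (uint16_t)(v >> 16); }
static uint16_t ppc_lo(uint64_t v)      { return (uint16_t)v; }

#define PPC_LI64_INSNS 5 /* lis, ori, sldi, oris, ori */

/* Simulate the register value produced by the five-instruction sequence. */
static uint64_t ppc_li64_sim(uint64_t addr)
{
	uint64_t r;

	/* lis r, highest: sign-extended 16-bit immediate shifted left by 16 */
	r = (uint64_t)(int64_t)(int16_t)ppc_highest(addr) << 16;
	r |= ppc_higher(addr);               /* ori  r, r, higher */
	r <<= 32;                            /* sldi r, r, 32     */
	r |= (uint64_t)ppc_high(addr) << 16; /* oris r, r, high   */
	r |= ppc_lo(addr);                   /* ori  r, r, lo     */

	/* The sign-extension bits left by lis are shifted out by sldi. */
	assert(r == addr);
	return r;
}
```

Those five instructions come before the indirect branch itself (for example via a move to CTR and bctrl), whereas a near target needs only one relative bl.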
>
> Thinking more about this, as an optimization, for bpf-to-bpf calls, we
> could detect a near call and just emit a relative branch since we don't
> care about TOC with BPF. But, this will depend on whether the different
> BPF functions are close enough (within 32MB) of one another.
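The 32MB figure matches the reach of the ppc64 I-form relative branch (b/bl): a 24-bit LI field shifted left by two gives a signed 26-bit byte displacement, i.e. -32MB to +32MB-4. A minimal sketch of the near-call check and of encoding bl is below; the helper names are hypothetical and this is not the powerpc BPF JIT's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPC_BRANCH_MIN (-(1LL << 25))    /* -32 MB */
#define PPC_BRANCH_MAX ((1LL << 25) - 4) /* +32 MB - 4 */

/* Can a relative branch placed at 'from' reach 'to'? */
static bool ppc_rel_branch_ok(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	return (disp & 3) == 0 &&
	       disp >= PPC_BRANCH_MIN && disp <= PPC_BRANCH_MAX;
}

/* Encode "bl to" at address 'from': I-form, opcode 18, AA=0, LK=1. */
static uint32_t ppc_encode_bl(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	assert(ppc_rel_branch_ok(from, to));
	return 0x48000000u | ((uint32_t)disp & 0x03fffffcu) | 1u;
}
```

A JIT following this idea would use ppc_rel_branch_ok() to choose between a single bl and the full 64-bit address load, falling back to the latter when two BPF functions land far apart.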

so that will be just an optimization. Understood.
How about instead of doing callee = kzalloc(sizeof(func[i]) * flen..
we keep insn->off pointing to subprog and move
prog->aux->func = func;
before the last JIT pass?
Then you won't need to alloc this extra array.
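A rough sketch of this suggestion, using hypothetical cut-down structs rather than the real struct bpf_insn, struct bpf_prog and struct bpf_prog_aux: the caller's aux points at the shared func[] table, insn->off keeps the callee's subprog index, and the final JIT pass reads the full 64-bit entry point from the table instead of reconstructing it from imm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins; NOT the kernel definitions. */
struct sketch_insn {
	int16_t off; /* callee subprog index, kept as suggested */
	int32_t imm; /* too narrow for a 64-bit address on ppc64 */
};

struct sketch_prog;

struct sketch_aux {
	struct sketch_prog **func; /* one table shared by all subprogs */
	size_t func_cnt;
};

struct sketch_prog {
	uint64_t bpf_func;         /* JIT image entry point */
	struct sketch_aux *aux;
};

/* Last-pass lookup of a bpf-to-bpf call target by subprog index. */
static uint64_t sketch_call_target(const struct sketch_prog *caller,
				   const struct sketch_insn *insn)
{
	assert(insn->off >= 0 && (size_t)insn->off < caller->aux->func_cnt);
	return caller->aux->func[insn->off]->bpf_func;
}
```

Because every subprog points at the same table, no per-caller array like the kzalloc'ed callee[] is needed.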