netdev - Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls


Date:   Fri, 09 Feb 2018 22:24:04 +0530
From:   "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
To:     Alexei Starovoitov <ast@...com>, daniel@...earbox.net,
        Sandipan Das <sandipan@...ux.vnet.ibm.com>
Cc:     linuxppc-dev@...ts.ozlabs.org, mpe@...erman.id.au,
        netdev@...r.kernel.org
Subject: Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function
 calls

Naveen N. Rao wrote:
> Alexei Starovoitov wrote:
>> On 2/8/18 4:03 AM, Sandipan Das wrote:
>>> The imm field of a bpf_insn is a signed 32-bit integer. For
>>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>>> __bpf_call_base to the start of the callee function.
>>>
>>> For some architectures, such as powerpc64, it was found that
>>> this offset may need as many as 64 bits, so it cannot be
>>> accommodated in the imm field without truncation.
>>>
>>> To resolve this, we additionally use the aux data within each
>>> bpf_prog associated with the caller functions to store the
>>> addresses of their respective callees.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@...ux.vnet.ibm.com>
>>> ---
>>>  kernel/bpf/verifier.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 38 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 5fb69a85d967..52088b4ca02f 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>>  	 * run last pass of JIT
>>>  	 */
>>>  	for (i = 0; i <= env->subprog_cnt; i++) {
>>> +		u32 flen = func[i]->len, callee_cnt = 0;
>>> +		struct bpf_prog **callee;
>>> +
>>> +		/* for now assume that the maximum number of bpf function
>>> +		 * calls that can be made by a caller must be at most the
>>> +		 * number of bpf instructions in that function
>>> +		 */
>>> +		callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
>>> +		if (!callee) {
>>> +			err = -ENOMEM;
>>> +			goto out_free;
>>> +		}
>>> +
>>>  		insn = func[i]->insnsi;
>>>  		for (j = 0; j < func[i]->len; j++, insn++) {
>>>  			if (insn->code != (BPF_JMP | BPF_CALL) ||
>>> @@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>>  				func[subprog]->bpf_func -
>>>  				__bpf_call_base;
>>> +
>>> +			/* the offset to the callee from __bpf_call_base
>>> +			 * may be larger than what the 32 bit integer imm
>>> +			 * can accommodate which will truncate the higher
>>> +			 * order bits
>>> +			 *
>>> +			 * to avoid this, we additionally utilize the aux
>>> +			 * data of each caller function for storing the
>>> +			 * addresses of every callee associated with it
>>> +			 */
>>> +			callee[callee_cnt++] = func[subprog];
>> 
>> can you share typical /proc/kallsyms ?
>> Are you saying that kernel and kernel modules are allocated from
>> address spaces that are always more than 32-bit apart?
> 
> Yes. On ppc64, kernel text is linearly mapped from 0xc000000000000000, 
> while vmalloc'ed area starts from 0xd000000000000000 (for radix, this is
> different, but still beyond a 32-bit offset).
> 
>> That would mean that all kernel calls into modules are far calls
>> and the other way around from .ko into kernel?
>> Performance is probably suffering because every call needs to be built
>> with full 64-bit offset. No?
> 
> Possibly, and Michael can give a better perspective, but I think
> this is due to our ABI. For inter-module calls, we need to set up the TOC
> pointer (or the address of the function being called with ABIv2), which 
> would require us to load a full address regardless.
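
To make the truncation concrete, here is a rough userspace sketch (not
kernel code) of the range check involved. The two base addresses are just
the region starts mentioned above, and offset_fits_imm() and call_offset()
are hypothetical helper names used only for illustration:

```c
#include <stdint.h>

/* Region starts quoted in this thread for ppc64 (hash); illustrative only. */
#define KERNEL_TEXT_BASE 0xc000000000000000ULL
#define VMALLOC_BASE     0xd000000000000000ULL

/* Hypothetical helper: distance from a base address to a callee. */
static int64_t call_offset(uint64_t callee, uint64_t base)
{
	return (int64_t)(callee - base);
}

/* Hypothetical helper: returns 1 if the offset can be stored in a
 * signed 32-bit field such as bpf_insn.imm without losing bits.
 */
static int offset_fits_imm(int64_t off)
{
	return off >= INT32_MIN && off <= INT32_MAX;
}
```

With these values, call_offset(VMALLOC_BASE, KERNEL_TEXT_BASE) is
0x1000000000000000, which offset_fits_imm() rejects, while an offset
between two addresses inside the same region would normally pass.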

Thinking more about this, as an optimization for bpf-to-bpf calls, we 
could detect a near call and just emit a relative branch, since we don't 
care about the TOC with BPF. But this will depend on whether the different 
BPF functions are close enough to (within 32MB of) one another.
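
As a rough sketch of what such a near-call test could look like (this is
not from the patch, and is_near_call() is a hypothetical name), assuming
the usual powerpc I-form relative branch whose 24-bit LI field is shifted
left by 2 and sign-extended:

```c
#include <stdint.h>

/* Reach of an I-form relative branch: signed 26-bit, word-aligned
 * displacement, i.e. -32MB .. +32MB-4.
 */
#define BRANCH_DISP_MIN (-0x2000000LL)
#define BRANCH_DISP_MAX (0x1fffffcLL)

/* Hypothetical helper: returns 1 if a branch located at 'from' can reach
 * 'to' with a single relative branch instruction.
 */
static int is_near_call(uint64_t from, uint64_t to)
{
	int64_t disp = (int64_t)(to - from);

	return disp >= BRANCH_DISP_MIN && disp <= BRANCH_DISP_MAX &&
	       (disp & 3) == 0;
}
```

If this returns 0, the JIT would have to fall back to loading the full
64-bit address, as it does today.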

We can attempt that once the generic changes are finalized.

Thanks,
Naveen