Date:   Tue, 20 Apr 2021 22:30:21 -0700
From:   Yonghong Song <yhs@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
CC:     <davem@...emloft.net>, <daniel@...earbox.net>, <andrii@...nel.org>,
        <netdev@...r.kernel.org>, <bpf@...r.kernel.org>,
        <kernel-team@...com>
Subject: Re: [PATCH bpf-next 13/15] libbpf: Generate loader program out of BPF
 ELF file.



On 4/20/21 9:46 PM, Alexei Starovoitov wrote:
> On Tue, Apr 20, 2021 at 06:34:11PM -0700, Yonghong Song wrote:
>>>
>>> kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.
>>
>> Beyond this, libbpf currently allows a lot of flexibility between prog
>> open and load: changing the program type, key/value size, max_entries,
>> pinning maps, reusing maps, etc. It is worthwhile to mention this in the
>> cover letter.
>> It is possible that these changes may defeat the purpose of signing the
>> program though.
> 
> Right. We'd need to decide which ones are ok to change after signature
> verification. I think max_entries gotta be allowed, since tools
> actively change it. The selftests change the other fields too, but I'm not sure
> it's a good thing to allow for signed progs. TBD.
> 
>>> +	if (gen->error)
>>> +		return -ENOMEM;
>>
>> return gen->error?
> 
> right
> 
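To spell it out, something like this in both realloc helpers (untested
sketch):

	if (gen->error)
		return gen->error;
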
>>> +	if (off + size > UINT32_MAX) {
>>> +		gen->error = -ERANGE;
>>> +		return -ERANGE;
>>> +	}
>>> +	gen->insn_start = realloc(gen->insn_start, off + size);
>>> +	if (!gen->insn_start) {
>>> +		gen->error = -ENOMEM;
>>> +		return -ENOMEM;
>>> +	}
>>> +	gen->insn_cur = gen->insn_start + off;
>>> +	return 0;
>>> +}
>>> +
>>> +static int bpf_gen__realloc_data_buf(struct bpf_gen *gen, __u32 size)
>>
>> Maybe change the return type to size_t? Esp. in the below
>> we have off + size > UINT32_MAX.
> 
> return type? it's 0 or error. you mean argument type?
> I think u32 is better. The prog size and all the other values
> bpf_gen__add_data is called with are 32-bit.

Sorry, I meant

+static int bpf_gen__add_data(struct bpf_gen *gen, const void *data, __u32 size)

Since we allow off + size to get close to UINT32_MAX,
maybe bpf_gen__add_data should return __u32 instead of int.
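
Something like this (sketch only):

+static __u32 bpf_gen__add_data(struct bpf_gen *gen, const void *data, __u32 size)

so that an offset above INT_MAX is not misread as a negative value by
the callers.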

> 
>>> +{
>>> +	size_t off = gen->data_cur - gen->data_start;
>>> +
>>> +	if (gen->error)
>>> +		return -ENOMEM;
>>
>> return gen->error?
> 
> right
> 
>>> +	if (off + size > UINT32_MAX) {
>>> +		gen->error = -ERANGE;
>>> +		return -ERANGE;
>>> +	}
>>> +	gen->data_start = realloc(gen->data_start, off + size);
>>> +	if (!gen->data_start) {
>>> +		gen->error = -ENOMEM;
>>> +		return -ENOMEM;
>>> +	}
>>> +	gen->data_cur = gen->data_start + off;
>>> +	return 0;
>>> +}
>>> +
>>> +static void bpf_gen__emit(struct bpf_gen *gen, struct bpf_insn insn)
>>> +{
>>> +	if (bpf_gen__realloc_insn_buf(gen, sizeof(insn)))
>>> +		return;
>>> +	memcpy(gen->insn_cur, &insn, sizeof(insn));
>>> +	gen->insn_cur += sizeof(insn);
>>> +}
>>> +
>>> +static void bpf_gen__emit2(struct bpf_gen *gen, struct bpf_insn insn1, struct bpf_insn insn2)
>>> +{
>>> +	bpf_gen__emit(gen, insn1);
>>> +	bpf_gen__emit(gen, insn2);
>>> +}
>>> +
>>> +void bpf_gen__init(struct bpf_gen *gen, int log_level)
>>> +{
>>> +	gen->log_level = log_level;
>>> +	bpf_gen__emit(gen, BPF_MOV64_REG(BPF_REG_6, BPF_REG_1));
>>> +	bpf_gen__emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_10, stack_off(last_attach_btf_obj_fd), 0));
>>
>> Here we initialize last_attach_btf_obj_fd, do we need to initialize
>> last_btf_id?
> 
> Not sure why I inited it. Probably left over. I'll remove it.
> 
>>> +}
>>> +
>>> +static int bpf_gen__add_data(struct bpf_gen *gen, const void *data, __u32 size)
>>> +{
>>> +	void *prev;
>>> +
>>> +	if (bpf_gen__realloc_data_buf(gen, size))
>>> +		return 0;
>>> +	prev = gen->data_cur;
>>> +	memcpy(gen->data_cur, data, size);
>>> +	gen->data_cur += size;
>>> +	return prev - gen->data_start;
>>> +}
>>> +
>>> +static int insn_bytes_to_bpf_size(__u32 sz)
>>> +{
>>> +	switch (sz) {
>>> +	case 8: return BPF_DW;
>>> +	case 4: return BPF_W;
>>> +	case 2: return BPF_H;
>>> +	case 1: return BPF_B;
>>> +	default: return -1;
>>> +	}
>>> +}
>>> +
>> [...]
>>> +
>>> +static void __bpf_gen__debug(struct bpf_gen *gen, int reg1, int reg2, const char *fmt, va_list args)
>>> +{
>>> +	char buf[1024];
>>> +	int addr, len, ret;
>>> +
>>> +	if (!gen->log_level)
>>> +		return;
>>> +	ret = vsnprintf(buf, sizeof(buf), fmt, args);
>>> +	if (ret < 1024 - 7 && reg1 >= 0 && reg2 < 0)
>>> +		strcat(buf, " r=%d");
>>
>> Why only for reg1 >= 0 && reg2 < 0?
> 
> To avoid specifying BPF_REG_7 and adding " r=%%d" to printks explicitly.
> Just to make bpf_gen__debug_ret() short and less verbose.
> 
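I see. So a hypothetical call like

	bpf_gen__debug_ret(gen, "map_create");

would end up emitting a trace_printk with the format "map_create r=%d"
and BPF_REG_7 as the argument, without the caller spelling either out.
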
>>> +	len = strlen(buf) + 1;
>>> +	addr = bpf_gen__add_data(gen, buf, len);
>>> +
>>> +	bpf_gen__emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_1, BPF_PSEUDO_MAP_IDX_VALUE, 0, 0, 0, addr));
>>> +	bpf_gen__emit(gen, BPF_MOV64_IMM(BPF_REG_2, len));
>>> +	if (reg1 >= 0)
>>> +		bpf_gen__emit(gen, BPF_MOV64_REG(BPF_REG_3, reg1));
>>> +	if (reg2 >= 0)
>>> +		bpf_gen__emit(gen, BPF_MOV64_REG(BPF_REG_4, reg2));
>>> +	bpf_gen__emit(gen, BPF_EMIT_CALL(BPF_FUNC_trace_printk));
>>> +}
>>> +
>>> +static void bpf_gen__debug_regs(struct bpf_gen *gen, int reg1, int reg2, const char *fmt, ...)
>>> +{
>>> +	va_list args;
>>> +
>>> +	va_start(args, fmt);
>>> +	__bpf_gen__debug(gen, reg1, reg2, fmt, args);
>>> +	va_end(args);
>>> +}
>>> +
>>> +static void bpf_gen__debug_ret(struct bpf_gen *gen, const char *fmt, ...)
>>> +{
>>> +	va_list args;
>>> +
>>> +	va_start(args, fmt);
>>> +	__bpf_gen__debug(gen, BPF_REG_7, -1, fmt, args);
>>> +	va_end(args);
>>> +}
>>> +
>>> +static void bpf_gen__emit_sys_close(struct bpf_gen *gen, int stack_off)
>>> +{
>>> +	bpf_gen__emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, stack_off));
>>> +	bpf_gen__emit(gen, BPF_JMP_IMM(BPF_JSLE, BPF_REG_1, 0, 2 + (gen->log_level ? 6 : 0)));
>>
>> The number "6" is magic. This refers the number of insns generated below
>> with
>>     bpf_gen__debug_regs(gen, BPF_REG_9, BPF_REG_0, "close(%%d) = %%d");
>> At least some comment will be better.
> 
> good point. will add a comment.
> 
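Thanks. Something along these lines would be enough (just a sketch):

	/* skip the mov + call below and, when log_level is set, the
	 * 6 insns emitted by bpf_gen__debug_regs()
	 */
	bpf_gen__emit(gen, BPF_JMP_IMM(BPF_JSLE, BPF_REG_1, 0,
				       2 + (gen->log_level ? 6 : 0)));
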
>>
>>> +	bpf_gen__emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_1));
>>> +	bpf_gen__emit(gen, BPF_EMIT_CALL(BPF_FUNC_sys_close));
>>> +	bpf_gen__debug_regs(gen, BPF_REG_9, BPF_REG_0, "close(%%d) = %%d");
>>> +}
>>> +
>>> +int bpf_gen__finish(struct bpf_gen *gen)
>>> +{
>>> +	int i;
>>> +
>>> +	bpf_gen__emit_sys_close(gen, stack_off(btf_fd));
>>> +	for (i = 0; i < gen->nr_progs; i++)
>>> +		bpf_gen__move_stack2ctx(gen, offsetof(struct bpf_loader_ctx,
>>> +						      u[gen->nr_maps + i].map_fd), 4,
>>
>> Maybe u[gen->nr_maps + i].prog_fd?
>> u[..] is a union, but prog_fd better reflects what it is.
> 
> ohh. right.
> 
>>> +					stack_off(prog_fd[i]));
>>> +	for (i = 0; i < gen->nr_maps; i++)
>>> +		bpf_gen__move_stack2ctx(gen, offsetof(struct bpf_loader_ctx,
>>> +						      u[i].prog_fd), 4,
>>
>> u[i].map_fd?
> 
> right.
> 
>>> +	/* remember map_fd in the stack, if successful */
>>> +	if (map_idx < 0) {
>>> +		bpf_gen__emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7, stack_off(inner_map_fd)));
>>
>> Some comments here indicating that map_idx < 0 is for inner map creation
>> would help in understanding the code.
> 
> will do.
> 
>>> +	/* store btf_id into insn[insn_idx].imm */
>>> +	insn = (int)(long)&((struct bpf_insn *)(long)insns)[relo->insn_idx].imm;
>>
>> This is really fancy. Maybe something like
>> 	insn = insns + sizeof(struct bpf_insn) * relo->insn_idx +
>> 	       offsetof(struct bpf_insn, imm);
>> Does this sound better?
> 
> yeah. much better.
> 
>>> +static void mark_feat_supported(enum kern_feature_id last_feat)
>>> +{
>>> +	struct kern_feature_desc *feat;
>>> +	int i;
>>> +
>>> +	for (i = 0; i <= last_feat; i++) {
>>> +		feat = &feature_probes[i];
>>> +		WRITE_ONCE(feat->res, FEAT_SUPPORTED);
>>> +	}
>>
>> This assumes all earlier features than FD_IDX are supported. I think this is
>> probably fine although it may not work for some weird backport.
>> Did you see any issues if we don't explicitly set previous features
>> supported?
> 
> This helper is only used as mark_feat_supported(FEAT_FD_IDX)
> to tell libbpf that it shouldn't probe anything.
> Otherwise probing via prog_load screws up gen_trace completely.
> Maybe it should be mark_all_feat_supported(void), but that seems less flexible.

Maybe add some comments here to explain why we mark features as
supported explicitly instead of probing?
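
Something like this above the loop would work for me (sketch):

	/* When generating the loader program we must not probe the
	 * kernel with real syscalls (e.g. prog_load), since that would
	 * mess up gen_trace, so mark all features up to and including
	 * last_feat as supported instead.
	 */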

> 
>>> @@ -9383,7 +9512,13 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, int *btf_obj_fd,
>>>    	}
>>>    	/* kernel/module BTF ID */
>>> -	err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id);
>>> +	if (prog->obj->gen_trace) {
>>> +		bpf_gen__record_find_name(prog->obj->gen_trace, attach_name, attach_type);
>>> +		*btf_obj_fd = 0;
>>> +		*btf_type_id = 1;
>>
>> We have quite some codes like this and may add more to support more
>> features. I am wondering whether we could have some kind of callbacks
>> to make the code more streamlined. But I am not sure how easy it is.
> 
> you mean find_kernel_btf_id() in general?
> This 'find' operation is translated differently for
> prog name as seen in this hunk via bpf_gen__record_find_name()
> and via bpf_gen__record_extern() in another place.
> For libbpf it's all find_kernel_btf_id(), but semantically they are different,
> so they cannot map as-is to a gen trace bpf_gen__find_kernel_btf_id() (if there
> were such a thing), because such a 'generic' callback wouldn't convey what to
> do with the result of the find.

I mean something like calling
     err = obj->ops->find_kernel_btf_id(...)
where gen_trace and normal libbpf each register their own callback
function for find_kernel_btf_id(). A similar idea could be applied in
other places as well. I'm not 100% sure this is the best approach,
just wanted to bring it up for discussion.
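
Roughly what I have in mind (hypothetical sketch, names made up):

	struct bpf_object_ops {
		int (*find_kernel_btf_id)(struct bpf_object *obj,
					  const char *attach_name,
					  enum bpf_attach_type attach_type,
					  int *btf_obj_fd, int *btf_type_id);
		/* ... other hooks that differ between the normal load
		 * path and gen_trace ...
		 */
	};

	/* normal libbpf would set ->find_kernel_btf_id to the existing
	 * find_kernel_btf_id(), gen_trace to a variant that records the
	 * find instead of performing it
	 */
	err = obj->ops->find_kernel_btf_id(obj, attach_name, attach_type,
					   btf_obj_fd, btf_type_id);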

> 
>>> +	} else {
>>> +		err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id);
>>> +	}
>>>    	if (err) {
>>>    		pr_warn("failed to find kernel BTF type ID of '%s': %d\n", attach_name, err);
>>>    		return err;
>>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>>> index b9b29baf1df8..a5dffc0a3369 100644
>>> --- a/tools/lib/bpf/libbpf.map
>>> +++ b/tools/lib/bpf/libbpf.map
>>> @@ -361,4 +361,5 @@ LIBBPF_0.4.0 {
>>>    		bpf_linker__new;
>>>    		bpf_map__inner_map;
>>>    		bpf_object__set_kversion;
>>> +		bpf_load;
>>
>> Based on alphabet ordering, this should move a few places earlier.
>>
>> I will need to go through the patch again for better understanding ...
> 
> Thanks a lot for the review.
> 
> I'll address these comments and those that I got offline and will post v2.
> This gen stuff will look quite different.
> I hope bpf_load will not be a part of uapi anymore.
> And 'struct bpf_gen' will not be exposed to bpftool directly.

Sounds good.
