netdev - Re: [PATCH bpf-next] bpf: enable btf for use in all maps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 10 Aug 2018 15:30:29 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Martin KaFai Lau <kafai@...com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>, ast@...nel.org,
        netdev@...r.kernel.org, yhs@...com
Subject: Re: [PATCH bpf-next] bpf: enable btf for use in all maps

On 08/10/2018 02:54 PM, Martin KaFai Lau wrote:
> On Fri, Aug 10, 2018 at 09:55:35AM +0200, Daniel Borkmann wrote:
>> On 08/10/2018 04:13 AM, Alexei Starovoitov wrote:
>>> On Fri, Aug 10, 2018 at 12:43:20AM +0200, Daniel Borkmann wrote:
>>>> On 08/09/2018 11:44 PM, Alexei Starovoitov wrote:
>>>>> On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
>>>>>> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
>>>>>>> On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
>>>>>>>> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
>>>>>>>> the basic arraymap") enabled support for BTF and dumping via
>>>>>>>> BPF fs for arraymap. However, both can be decoupled from each
>>>>>>>> other such that all BPF maps can be supported for attaching
>>>>>>>> BTF key/value information, while not all maps necessarily
>>>>>>>> need to dump via map_seq_show_elem() callback.
>>>>>>>>
>>>>>>>> The check in array_map_check_btf() can be generalized as
>>>>>>>> ultimatively the key and value size is the only contraint
>>>>>>>> that needs to match for the map. The fact that the key needs
>>>>>>>> to be of type int is optional; it could be any data type as
>>>>>>>> long as it matches the 4 byte key size, just like hash table
>>>>>>>> key or others could be of any data type as well.
>>>>>>>>
>>>>>>>> Minimal example of a hash table dump which then works out
>>>>>>>> of the box for bpftool:
>>>>>>>>
>>>>>>>>   # bpftool map dump id 19
>>>>>>>>   [{
>>>>>>>>           "key": {
>>>>>>>>               "": {
>>>>>>>>                   "vip": 0,
>>>>>>>>                   "vipv6": []
>>>>>>>>               },
>>>>>>>>               "port": 0,
>>>>>>>>               "family": 0,
>>>>>>>>               "proto": 0
>>>>>>>>           },
>>>>>>>>           "value": {
>>>>>>>>               "flags": 0,
>>>>>>>>               "vip_num": 0
>>>>>>>>           }
>>>>>>>>       }
>>>>>>>>   ]
>>>>>>>>
>>>>>>>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
>>>>>>>> Cc: Yonghong Song <yhs@...com>
>>>>>>>> ---
>>>>>>>>  include/linux/bpf.h   |  4 +---
>>>>>>>>  kernel/bpf/arraymap.c | 27 ---------------------------
>>>>>>>>  kernel/bpf/inode.c    |  3 ++-
>>>>>>>>  kernel/bpf/syscall.c  | 24 ++++++++++++++++++++----
>>>>>>>>  4 files changed, 23 insertions(+), 35 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>>>>>> index cd8790d..eb76e8e 100644
>>>>>>>> --- a/include/linux/bpf.h
>>>>>>>> +++ b/include/linux/bpf.h
>>>>>>>> @@ -48,8 +48,6 @@ struct bpf_map_ops {
>>>>>>>>  	u32 (*map_fd_sys_lookup_elem)(void *ptr);
>>>>>>>>  	void (*map_seq_show_elem)(struct bpf_map *map, void *key,
>>>>>>>>  				  struct seq_file *m);
>>>>>>>> -	int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
>>>>>>>> -			     u32 key_type_id, u32 value_type_id);
>>>>>>>>  };
>>>>>>>>  
>>>>>>>>  struct bpf_map {
>>>>>>>> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
>>>>>>>>  
>>>>>>>>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
>>>>>>>>  {
>>>>>>>> -	return map->ops->map_seq_show_elem && map->ops->map_check_btf;
>>>>>>>> +	return map->btf && map->ops->map_seq_show_elem;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  extern const struct bpf_map_ops bpf_map_offload_ops;
>>>>>>>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
>>>>>>>> index 2aa55d030..67f0bdf 100644
>>>>>>>> --- a/kernel/bpf/arraymap.c
>>>>>>>> +++ b/kernel/bpf/arraymap.c
>>>>>>>> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
>>>>>>>>  	rcu_read_unlock();
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
>>>>>>>> -			       u32 btf_key_id, u32 btf_value_id)
>>>>>>>> -{
>>>>>>>> -	const struct btf_type *key_type, *value_type;
>>>>>>>> -	u32 key_size, value_size;
>>>>>>>> -	u32 int_data;
>>>>>>>> -
>>>>>>>> -	key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
>>>>>>>> -	if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
>>>>>>>> -		return -EINVAL;
>>>>>>>> -
>>>>>>>> -	int_data = *(u32 *)(key_type + 1);
>>>>>>>> -	/* bpf array can only take a u32 key.  This check makes
>>>>>>>> -	 * sure that the btf matches the attr used during map_create.
>>>>>>>> -	 */
>>>>>>>> -	if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
>>>>>>>> -	    BTF_INT_OFFSET(int_data))
>>>>>>>> -		return -EINVAL;
>>>>>>>
>>>>>>> I think most of these checks are still necessary for array type.
>>>>>>> Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
>>>>>>> is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't makes sense.
>>>>>>
>>>>>> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
>>>>>> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
>>>>>> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
>>>>>> it ultimately be different from e.g. having 'struct a' versus 'struct b'
>>>>>> where both are of same size and while actual key has 'struct a', the one
>>>>>> who writes the prog resp. loads the BTF into the kernel would lie about
>>>>>> it stating it's of type 'struct b' instead? It's basically trusting the
>>>>>> app that it advertised sane key types which kernel is propagating back.
>>>>>
>>>>> for hash map - yes. the kernel cannot yet catch the lie that
>>>>> key == 'struct a' that user said in BTF is not what program used
>>>>> (which used 'struct b' of the same size).
>>>>> Eventually we will annotate all load/store in the program and will
>>>>> make sure that memory access match what BTF said.
>>>>
>>>> But in that case, would you reject the program? E.g. from prog point of
>>>> view, it's just a buffer of x bytes, so key could be casted to different
>>>> struct/types potentially and used for lookup; similar with value if you
>>>> would go the route to annotate all access into it. I don't think this
>>>> serves as a security feature (since you might as well just load the prog
>>>> without BTF just fine), but it can be used to help verifier to perform
>>>> rewrites like in tracing for implicit bpf_probe_read() based on member
>>>> access. But also in that case, if you might have e.g. stale or wrong BTF
>>>> data e.g. of the whole kernel or some application it would follow/walk
>>>> that one instead. But such user error would be "acceptable" since it serves
>>>> as a hint, roughly similar to (explicitly) walking the data structures
>>>> based on the headers today, you do have better control in terms of header
>>>> mismatches in that you can ship the BTF directly from the build, but there's
>>>> still no guarantee in that sense that you "verified" that these bytes
>>>> originally were mapped to struct foo somewhere in a C program.
>>>
>>> I wouldn't view such key checks as safety feature, but rather
>>> as honesty check. Meaning that user space shouldn't be able to cheat
>>> by passing completely bogus BTF into the kernel that only
>>> statisfies single size check.
>>
>> Ok, meaning, we agree that BTF is passed as a hint to the kernel. It is
>> a 'should not cheat' requirement but not 'cannot cheat', meaning if it
>> was the latter, BTF is core part of the verifier's safety analysis /
>> checks, but as it's optional on top of it (well, due to compatibility it
>> kind of needs to be anyway), it doesn't affect kernel's safety in
>> relation to what the program is doing internally.
>>
>>> Say, BTF has a key that is a 4 byte struct { char a,b,c,d; };
>>> but program operates with it as u32. I very much would like
>>> the verifier to notice and reject such program, since if C code
>>> has struct { char a,b,c,d; }; the compiler won't generate u32 access
>>> to it unless C code type casts, but then llvm will warn. So for u32
>>> to legitimately appear in the generated code the struct should be:
>>> union {
>>>   struct { char a,b,c,d;}
>>>   u32 e;
>>> };
>>> Narrow access (like u8 load/store in the bpf prog form u32 BTF field)
>>> is ok, since that's normal compiler optimization, but any other
>>> field/size mismatch the verifier should reject to prevent cheating.
>>>
>>> In other words even proprietary bpf programs should not be able to cheat.
>>> If they provide BTF to the kernel, it should correspond exactly to the program.
>>> That's the key to _trusted_ introspection.
>>
>> But then wouldn't this artificially force users into programming /
>> thinking style that verifier dictates upon them similar as we have
>> today as opposed to further moving away from it to allow more C-style
>> programs to be accepted?
>>
>> But even if that is the goal, it is still used as a hint today, e.g.
>> even for an array I could tell the compiler that the key is '__u32'
>> for BTF sake while using the underlying key differently (e.g. as struct,
>> enum, etc) in order to pass the array_map_check_btf() check in the
>> kernel. Those type of programs would still need to be accepted due
>> to compatibility reasons. Same if we only test on size match in other
>> maps, it's nothing different. I don't see how verifier would start
>> enforcing programs to be rejected based on access patterns on the
>> types; while it might work for newly added program types that this
>> could be enforced, it cannot for all the many existing program types,
>> but I also don't really think it needs to be. Unless it is declared
>> a safety feature for future program types.
>>
>>> Admin that ssh-es into the box and operates with bpftool should be
>>> certain that the map introspection represent the real situation of
>>> the program. If proprietary prog is paranoid about layout of map
>>> it shouldn't be using BTF, but if it does, BTF should match.
>>> In the future I'd like to enfore availability of BTF for
>>> new program types.
>>
>> Sure, and in vast majority / normal cases it will represent the real
>> situation. Do you mistrust DWARF debugging data that gets shipped with
>> your distro, for example? pahole and other tools on top of it? It is
>> pretty much the same situation. Given gdb example below, does gdb do
>> any verification of the binary in order to analyze load/store patterns
>> and whether it matches with what sits in DWARF as types? It would just
>> as well map the types into memory and dump it as pretty print instead.
>> And that's okay, the one thing that needs to be verified for correctness
>> resp. validity is the debugging format itself so it can be used for
>> that.
> I think decoupling map_seq_show_elem() requirement from BTF support
> during map_create() is ok.  There is plan to do that going forward if
> it ever hits some map types that is difficult to do map_seq_show_elem().
> However, I think most of the maps should have pretty straight
> forward seq_show implementation considering most of the
> heavy lifting has already been done in btf_*_seq_show()
> and bpffs_map_seq_ops.

I think potentially this could also be made generic for majority of maps
in that you do the lookup and then the key/value btf_type_seq_show()
dump from it under RCU.

> Regarding one general map_check_btf() for all maps' purpose,
> I think not all bpf maps are behaving the same.
> 
> There are some property that the kernel implies on a particular
> map type.  Some of them even have kernel generated value in it.
> For example, map-in-map, the lookup_elem() value is a map's id
> which must be an integer generated by kernel's idr_alloc_cyclic().
> Treating it as enum/struct/array...etc seems wrong.
> Hence, I think each map type should be able to provide
> its own map_check_btf().  If a map does not have
> specific key/value requirement, it can use the default one which
> is the min requirement for kernel safety reason.

Yeah, fair point, I think potentially for the in-kernel generated ones at
some point they could provide their own in-kernel BTF description. Ok,
I'll rework to have a default and stricter map specific override if present
which those in-kernel ones will reject then.

Thanks,
Daniel