[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c51e684b-5f86-a6e5-d31d-ed42179bf626@fb.com>
Date: Tue, 12 Nov 2019 04:47:52 +0000
From: Alexei Starovoitov <ast@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: Alexei Starovoitov <ast@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>,
"x86@...nel.org" <x86@...nel.org>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH v3 bpf-next 15/18] bpf: Support attaching tracing BPF
program to other BPF programs
On 11/11/19 8:38 PM, Andrii Nakryiko wrote:
> On Mon, Nov 11, 2019 at 3:04 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>>
>> On Sat, Nov 09, 2019 at 11:17:37PM -0800, Andrii Nakryiko wrote:
>>> On Thu, Nov 7, 2019 at 10:41 PM Alexei Starovoitov <ast@...nel.org> wrote:
>>>>
>>>> Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type
>>>> including their subprograms. This feature allows snooping on input and output
>>>> packets in XDP, TC programs including their return values. In order to do that
>>>> the verifier needs to track types not only of vmlinux, but types of other BPF
>>>> programs as well. The verifier also needs to translate uapi/linux/bpf.h types
>>>> used by networking programs into kernel internal BTF types used by FENTRY/FEXIT
>>>> BPF programs. In some cases LLVM optimizations can remove arguments from BPF
>>>> subprograms without adjusting BTF info that LLVM backend knows. When BTF info
>>>> disagrees with actual types that the verifiers sees the BPF trampoline has to
>>>> fallback to conservative and treat all arguments as u64. The FENTRY/FEXIT
>>>> program can still attach to such subprograms, but won't be able to recognize
>>>> pointer types like 'struct sk_buff *' into won't be able to pass them to
>>>> bpf_skb_output() for dumping to user space.
>>>>
>>>> The BPF_PROG_LOAD command is extended with attach_prog_fd field. When it's set
>>>> to zero the attach_btf_id is one vmlinux BTF type ids. When attach_prog_fd
>>>> points to previously loaded BPF program the attach_btf_id is BTF type id of
>>>> main function or one of its subprograms.
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
>>>> ---
>>>> arch/x86/net/bpf_jit_comp.c | 3 +-
>>>> include/linux/bpf.h | 2 +
>>>> include/linux/btf.h | 1 +
>>>> include/uapi/linux/bpf.h | 1 +
>>>> kernel/bpf/btf.c | 58 +++++++++++++++++++---
>>>> kernel/bpf/core.c | 2 +
>>>> kernel/bpf/syscall.c | 19 +++++--
>>>> kernel/bpf/verifier.c | 98 +++++++++++++++++++++++++++++--------
>>>> kernel/trace/bpf_trace.c | 2 -
>>>> 9 files changed, 151 insertions(+), 35 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>> +
>>>> +static bool btf_translate_to_vmlinux(struct bpf_verifier_log *log,
>>>> + struct btf *btf,
>>>> + const struct btf_type *t,
>>>> + struct bpf_insn_access_aux *info)
>>>> +{
>>>> + const char *tname = __btf_name_by_offset(btf, t->name_off);
>>>> + int btf_id;
>>>> +
>>>> + if (!tname) {
>>>> + bpf_log(log, "Program's type doesn't have a name\n");
>>>> + return false;
>>>> + }
>>>> + if (strcmp(tname, "__sk_buff") == 0) {
>>>
>>> might be a good idea to ensure that t's type is also a struct?
>>>
>>>> + btf_id = btf_resolve_helper_id(log, &bpf_skb_output_proto, 0);
>>>
>>> This is kind of ugly and high-maintenance. Have you considered having
>>> something like this, to do this mapping:
>>>
>>> struct bpf_ctx_mapping {
>>> struct sk_buff *__sk_buff;
>>> struct xdp_buff *xdp_md;
>>> };
>>>
>>> So field name is a name you are trying to match, while field type is
>>> actual type you are mapping to? You won't need to find special
>>> function protos (like bpf_skb_output_proto), it will be easy to
>>> extend, you'll have real vmlinux types automatically captured for you
>>> (you'll just have to initially find bpf_ctx_mapping's btf_id).
>>
>> I was thinking something along these lines.
>> The problem with single struct like above is that it's centralized.
>> convert_ctx_access callbacks are all over the place.
>> So I'm thinking to add macro like this to bpf.h
>> +#define BPF_RECORD_CTX_CONVERSION(user_type, kernel_type) \
>> + ({typedef kernel_type (*bpf_ctx_convert)(user_type); \
>> + (void) (bpf_ctx_convert) (void *) 0;})
>>
>> and then do
>> BPF_RECORD_CTX_CONVERSION(struct bpf_xdp_sock, struct xdp_sock);
>> inside convert_ctx_access functions (like bpf_xdp_sock_convert_ctx_access).
>> There will be several typedefs with 'bpf_ctx_convert' name. The
>> btf_translate_to_vmlinux() will iterate over them. Speed is not criticial here,
>
> I guess that works as well. Please leave a comment explaining the idea
> behind this distributed mapping :)
>
>> but long term we probably need to merge prog's BTF with vmlinux's BTF, so most
>> of the type comparison is done during prog load. It probably should reduce the
>> size of prog's BTF too. Renumbering of prog's BTF will be annoying though.
>> Something to consider long term.
>>
>>>
>>>> + if (btf_id < 0)
>>>> + return false;
>>>> + info->btf_id = btf_id;
>>>> + return true;
>>>> + }
>>>> + return false;
>>>> +}
>>>>
>>>
>>> [...]
>>>
>>>> + if (tgt_prog && conservative) {
>>>> + struct btf_func_model *m = &tr->func.model;
>>>> +
>>>> + /* BTF function prototype doesn't match the verifier types.
>>>> + * Fall back to 5 u64 args.
>>>> + */
>>>> + for (i = 0; i < 5; i++)
>>>> + m->arg_size[i] = 8;
>>>> + m->ret_size = 8;
>>>> + m->nr_args = 5;
>>>> + prog->aux->attach_func_proto = NULL;
>>>> + } else {
>>>> + ret = btf_distill_func_proto(&env->log, btf, t,
>>>> + tname, &tr->func.model);
>>>
>>> there is nothing preventing some parallel thread to modify
>>> tr->func.model in parallel, right? Should these modifications be
>>> either locked or at least WRITE_ONCE, similar to
>>> btf_resolve_helper_id?
>>
>> hmm. Right. There is a race with bpf_trampoline_lookup. One thread could have
>> just created the trampoline and still doing distill, while another thread is
>> trying to use it after getting it from bpf_trampoline_lookup. The fix choices
>> are not pretty. Either to add a mutex to check_attach_btf_id() or do
>> bpf_trampoline_lookup_or_create() with extra callback that does
>> btf_distill_func_proto while bpf_trampoline_lookup_or_create is holding
>> trampoline_mutex or move most of the check_attach_btf_id() logic into
>> bpf_trampoline_lookup_or_create().
>> I tried to keep trampoline as abstract concept, but with callback or move
>> the verifer and btf logic will bleed into trampoline. Hmm.
>
> yeah, that sounds too intrusive. I'd change btf_distill_func_proto to
> accept struct btf_func_model **m, allocate model dynamically, and then
> compare_exchange the final constructed model pointer.
cmpxchg is too ugly and also not covering all other fields that may need
to have serialized access in the future. I went with simpler model of
additional mutex per trampoline. It also helped to avoid global mutex
for link/unlink.
Powered by blists - more mailing lists