Date:   Tue, 31 Jan 2023 14:07:17 -0800
From:   Martin KaFai Lau <martin.lau@...ux.dev>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Joanne Koong <joannelkoong@...il.com>, daniel@...earbox.net,
        andrii@...nel.org, martin.lau@...nel.org, ast@...nel.org,
        netdev@...r.kernel.org, memxor@...il.com, kernel-team@...com,
        bpf@...r.kernel.org, Andrii Nakryiko <andrii.nakryiko@...il.com>
Subject: Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs

On 1/30/23 9:30 PM, Alexei Starovoitov wrote:
>>>>> Also bpf_skb_pull_data is quite heavy. For progs that only want to parse
>>>>> the packet calling that in bpf_dynptr_data is a heavy hammer.
>>>>>
>>>>> It feels that we need to go back to skb_header_pointer-like discussion.
>>>>> Something like:
>>>>> bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len, void *buffer)
>>>>> Whether buffer is a part of dynptr or program provided is tbd.
>>>>
>>>> making it hidden within dynptr would make this approach unreliable
>>>> (memory allocations, which can fail, etc). But if we ask users to pass
>>>> it directly, then it should be relatively easy to use in practice with
>>>> some pre-allocated per-CPU buffer:
> 
> bpf_skb_pull_data() is even more unreliable, since it's a bigger allocation.
> I like preallocated approach more, so we're in agreement here.
> 
>>>>
>>>>
>>>> struct {
>>>>     __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
>>>>     __uint(max_entries, 1);
>>>>     __type(key, int);
>>>>     __type(value, char[4096]);
>>>> } scratch SEC(".maps");
>>>>
>>>>
>>>> ...
>>>>
>>>>
>>>> struct dyn_ptr *dp = bpf_dynptr_from_skb(...);
>>>> void *p, *buf;
>>>> int zero = 0;
>>>>
>>>> buf = bpf_map_lookup_elem(&scratch, &zero);
>>>> if (!buf) return 0; /* can't happen */
>>>>
>>>> p = bpf_dynptr_slice(dp, off, 16, buf);
>>>> if (p == NULL) {
>>>>      /* out of range */
>>>> } else {
>>>>      /* work with p directly */
>>>> }
>>>>
>>>> /* if we wrote something to p and it was copied to buffer, write it back */
>>>> if (p == buf) {
>>>>       bpf_dynptr_write(dp, buf, 16);
>>>> }
>>>>
>>>>
>>>> We'll just need to teach the verifier to make sure that buf is at least 16
>>>> bytes long.
>>>
>>> A fifth __sz arg may do:
>>> bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len, void
>>> *buffer, u32 buffer__sz);
>>
>> We'll need to make sure that buffer__sz is >= len (or preferably not
>> require extra size at all). We can check that at runtime, of course,
>> but rejecting too small buffer at verification time would be a better
>> experience.
> 
> I don't follow. Why two equivalent 'len' args ?
> Just to allow 'len' to be a variable instead of constant ?
> It's unusual for the verifier to have 'len' before 'buffer',
> but this is fixable.

Agreed. One const scalar 'len' should be enough. The buffer should have the same size 
as the requested slice.
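
Something along these lines, just a sketch (final naming and arg order tbd), i.e. 
the 5-arg version above minus buffer__sz:

void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
                        void *buffer);

The verifier would require 'len' to be a constant and check that 'buffer' points to 
at least 'len' bytes, so the runtime check can be skipped.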

> 
> How about adding 'rd_only vs rdwr' flag ?
> Then MEM_RDONLY for ret value of bpf_dynptr_slice can be set by the verifier
> and in run-time bpf_dynptr_slice() wouldn't need to check for skb->cloned.
> if (rd_only) return skb_header_pointer()
> if (rdwr) bpf_try_make_writable(); return skb->data + off;
> and final bpf_dynptr_write() is not needed.
> 
> But that doesn't work for xdp, since there is no pull.
> 
> It's not clear how to deal with BPF_F_RECOMPUTE_CSUM though.
> Expose __skb_postpull_rcsum/__skb_postpush_rcsum as kfuncs?
> But that defeats Andrii's goal to use dynptr as a generic wrapper.
> skb is quite special.
> 
> Maybe something like:
> void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
>                         void *buffer, u32 buffer__sz)
> {
>    if (skb_cloned()) {
>      skb_copy_bits(skb, offset, buffer, len);
>      return buffer;
>    }
>    return skb_header_pointer(...);
> }
> 
> When prog is just parsing the packet it doesn't need to finalize with bpf_dynptr_write.
> The prog can always write into the pointer followed by if (p == buf) bpf_dynptr_write.
> No need for rdonly flag, but extra copy is there in case of cloned which
> could have been avoided with extra rd_only flag.
> 
> In case of xdp it will be:
> void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, u32 offset, u32 len,
>                         void *buffer, u32 buffer__sz)
> {
>     ptr = bpf_xdp_pointer(xdp, offset, len);
>     if (ptr)
>        return ptr;
>     bpf_xdp_copy_buf(xdp, offset, buffer, len, false); /* copy into buf */
>     return buffer;
> }
> 
> bpf_dynptr_write will use bpf_xdp_copy_buf(,true); /* copy into xdp */

My preference would be to make bpf_dynptr_slice() work similarly for skb and xdp, 
so the above bpf_dynptr_slice() skb and xdp logic looks good.

Regarding MEM_RDONLY, it is probably not relevant to xdp. For skb, I am not sure how 
often the 'skb_cloned() && !skb_clone_writable()' case happens. Maybe it can be left 
for a later optimization?
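
If we do it later, the writable-slice side could roughly look like this (just a 
sketch, not tested), i.e. only fall back to the buffer copy (and the final 
bpf_dynptr_write()) when the cloned data is really not writable in place:

	if (skb_cloned(skb) && !skb_clone_writable(skb, offset + len)) {
		/* cannot write in place, hand back a copy */
		skb_copy_bits(skb, offset, buffer, len);
		return buffer;
	}
	return skb_header_pointer(skb, offset, len, buffer);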

Regarding BPF_F_RECOMPUTE_CSUM, I wonder if bpf_csum_diff() is enough to come up 
with the csum. Then the missing kfunc is one to update the skb->csum. Not sure what 
the csum logic will look like in xdp; probably getting the csum from the xdp-hint, 
calculating the csum_diff, and then setting it on the to-be-created skb. All of this 
is likely a kfunc also, e.g. a kfunc to directly allocate an skb during the XDP_PASS 
case. The bpf prog will have to be written differently if it needs to deal with the 
csum, but the header parsing part could at least be shared.
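
From the prog side I was picturing something like the below, reusing dp/off/buf/p 
from the earlier example (rough sketch only; bpf_dynptr_skb_csum_update() is the 
missing kfunc mentioned above, it does not exist today):

	__be32 old_word, new_word = bpf_htonl(0x01020304);
	s64 diff;

	p = bpf_dynptr_slice(dp, off, 4, buf);
	if (!p)
		return 0;
	old_word = *(__be32 *)p;
	*(__be32 *)p = new_word;
	if (p == buf)
		bpf_dynptr_write(dp, buf, 4);

	diff = bpf_csum_diff(&old_word, 4, &new_word, 4, 0);
	if (diff < 0)
		return 0;
	/* hypothetical kfunc, the missing piece to fold the diff into skb->csum */
	bpf_dynptr_skb_csum_update(dp, (__wsum)diff);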
