netdev - Re: [RFC v2 PATCH 1/4] eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <05b25de5-2d34-9e83-8297-703acee20aa7@oracle.com>
Date:   Fri, 29 Jun 2018 17:20:40 -0700
From:   Tushar Dave <tushar.n.dave@...cle.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Daniel Borkmann <borkmann@...earbox.net>, ast@...nel.org,
        davem@...emloft.net, jakub.kicinski@...ronome.com,
        quentin.monnet@...ronome.com, jiong.wang@...ronome.com,
        guro@...com, sandipan@...ux.vnet.ibm.com, john.fastabend@...il.com,
        kafai@...com, rdna@...com, brakmo@...com, netdev@...r.kernel.org,
        acme@...hat.com, sowmini.varadhan@...cle.com
Subject: Re: [RFC v2 PATCH 1/4] eBPF: Add new eBPF prog type
 BPF_PROG_TYPE_SOCKET_SG_FILTER



On 06/29/2018 01:48 AM, Daniel Borkmann wrote:
> On 06/29/2018 09:25 AM, Daniel Borkmann wrote:
>> On 06/19/2018 08:00 PM, Tushar Dave wrote:
>>> Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
>>> existing socket filter infrastructure for bpf program attach and load.
>>> SOCKET_SG_FILTER eBPF program receives struct scatterlist as bpf context
>>> contrast to SOCKET_FILTER which deals with struct skb. This is useful
>>> for kernel entities that don't have skb to represent packet data but
>>> want to run eBPF socket filter on packet data that is in form of struct
>>> scatterlist e.g. IB/RDMA
>>>
>>> Signed-off-by: Tushar Dave <tushar.n.dave@...cle.com>
>>> Acked-by: Sowmini Varadhan <sowmini.varadhan@...cle.com>
> [...]
>>>   static void __bpf_prog_release(struct bpf_prog *prog)
>>>   {
>>> -	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
>>> +	if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER ||
>>> +	    prog->type == BPF_PROG_TYPE_SOCKET_SG_FILTER) {
>>>   		bpf_prog_put(prog);
>>>   	} else {
>>>   		bpf_release_orig_filter(prog);
>>> @@ -1551,10 +1552,16 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>>>   
>>>   static struct bpf_prog *__get_bpf(u32 ufd, struct sock *sk)
>>>   {
>>> +	struct bpf_prog *prog;
>>> +
>>>   	if (sock_flag(sk, SOCK_FILTER_LOCKED))
>>>   		return ERR_PTR(-EPERM);
>>>   
>>> -	return bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
>>> +	prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
>>> +	if (IS_ERR(prog))
>>> +		prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_SG_FILTER);
>>> +
>>> +	return prog;
>>>   }
>>
>> Hmm, I don't think this works: this now means as unpriviledged I can attach a new
>> BPF_PROG_TYPE_SOCKET_SG_FILTER to a non-rds socket e.g. normal tcp/udp through the
>> SO_ATTACH_BPF sockopt, where input context is skb instead of sg list and thus crash
>> my box?

hmm.. I see the problem.

> 
> ... probably best to just make a setsockopt specific to rds here, so the two are fully
> separated.

Yes, it makes sense to make setsockopt specific to RDS that gives us
complete separation (and not to worry about above problem).
btw, to to make setsockopt specific to RDS though we have to export
sk_attach_bpf().

> 
> Also worth exploring whether you can reuse as much as possible from the struct sk_msg_buff
> context and in general the BPF_PROG_TYPE_SK_MSG type that is using this which we already
> have in sockmap today. At least feels like some of the concepts are a bit similar. For
> pulling in more payload you have bpf_msg_pull_data() there which I think might be more
> user-friendly at least in that you have the full payload from start to the 'current' end
> available and don't need to navigate through individual sg entries back/forth which could
> perhaps end up being bit painful for users, though I can see that it's a middle ground
> between some skb_load_bytes()-alike helper that would copy the pieces out of the sg entries
> vs needing to linearize. What are the requirements here, would it make sense to offer both
> as an option or is this impractical based on what you've measured?

Yes, sockmap also deal with struct scatterlist so from that prospective
I certainly try to reuse code wherever I can e.g. most likely get rid of
struct bpf_scatterlist and use struct sk_msg_buff.

Form use-case prospective, we want to look at complete payload (that
includes RDS header as well) and based on that take actions like pass
,drop or forward. So, I agree that it makes sense to not iterate over sg
element back/forth [1]. I guess bpf_msg_pull_data() would do the work.
If there is need we may add sg_load_bytes()-like helper.

Thanks for taking time reviewing. I will work on the suggested changes.

-Tushar

[1]
btw, bpf_sg_next() was added to just showcase an example and if there is
no real need of it I will get rid of it.