netdev - Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <80c4b9fb-7938-782f-6b5f-012db04c41e6@gmail.com>
Date:   Wed, 17 Jan 2018 12:32:03 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     borkmann@...earbox.net, ast@...nel.org, netdev@...r.kernel.org,
        kafai@...com
Subject: Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to
 monitor socket TX/RX data

On 01/16/2018 10:20 PM, Alexei Starovoitov wrote:
> On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote:
>>
>>> but this program will see only first SG ?
>>
>> Correct, to read further into the msg we would need to have a helper
>> or some other way to catch reads/writes past the first 4k and read
>> the next sg. We have the same limitation in cls_bpf.
>>
>> I have started a patch on top of this series with the current working
>> title msg_apply_bytes(int bytes). This would let us apply a verdict to
>> a set number of bytes instead of the entire msg. By calling
>> msg_apply_bytes(data_end - data) a user who needs to read an entire msg
>> could do this in 4k chunks until the entire msg is passed through the
>> bpf prog.
> 
> good idea.
> I think would be good to add this helper as part of this patch set
> to make sure there is a way for user to look through the whole
> tcp stream if the program really wants to.

Sure I'll add it to this series.

> I understand that program cannot examine every byte anyway
> due to lack of loops and helpers, but this part of sockmap api
> should still provide an interface from day one.
> One example would be the program parsing http2 or similar
> where in the header it sees length. Then it can do
> msg_apply_bytes(length)
> to skip the bytes it processed, but still continue within
> the same 64Kbyte chunk when 0 < length < 64k
> 

Yep, this is my use case.

>>> and it's typically going to be one page only ?
>>
>> yep
>>
>>> then what's the value of waiting for MAX_SKB_FRAGS ?
>>>
>> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS
>> pages in one call if possible. It seems better to me to run this loop
>> over as much data as we can.
> 
> agree on trying to do MAX_SKB_FRAGS as a 'processing unit',
> but program should still be able to skip or redirect 
> parts of the bytes and not the whole 64k chunk.
> From program point of view it should never see or worry about
> SG list boundaries whereas right now it seems that below code
> is dealing with them (though program doesn't know where sg ends):
> 

The apply_bytes() helper allows for skip and redirect of
parts of the bytes and not the whole 64k chunk.

>> +
>> +		switch (eval) {
>> +		case __SK_PASS:
>> +			sg_mark_end(sg + sg_num - 1);

All this does is tell the TCP sendpage call that after this sg entry
we don't have anymore data so go ahead and flush it.

>> +			err = bpf_tcp_push(sk, sg, &sg_curr, flags, true);
>> +			if (unlikely(err)) {
>> +				copied -= free_sg(sk, sg, sg_curr, sg_num);
>> +				goto out_err;
>> +			}
>> +			break;
>> +		case __SK_REDIRECT:
>> +			sg_mark_end(sg + sg_num - 1);
>> +			goto do_redir;
> ...
>>>> +static int bpf_tcp_ulp_register(void)
>>>> +{
>>>> +	tcp_bpf_proto = tcp_prot;
>>>> +	tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
>>>> +	tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
>>>> +	return tcp_register_ulp(&bpf_tcp_ulp_ops);
>>>
>>> I don't see corresponding tcp_unregister_ulp().
>>>
>>
>> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of
>> available ULPs and never removes it. To remove it we would have to
>> keep a ref count on the reg/unreg calls. This would require a couple
>> more patches to the ULP infra and not sure it hurts to leave the ULP
>> reference around...
>>
>> Maybe a follow on patch? Or else it could be a patch in front of this
>> patch.
> 
> I see. I'm ok with leaving that for latter.
> It doesn't hurt to keep it registered. Please add a comment though.
> 

OK will add comment.