[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180117062013.upgi7q5qia6tuawc@ast-mbp>
Date: Tue, 16 Jan 2018 22:20:15 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: borkmann@...earbox.net, ast@...nel.org, netdev@...r.kernel.org,
kafai@...com
Subject: Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to
monitor socket TX/RX data
On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote:
>
> > but this program will see only first SG ?
>
> Correct, to read further into the msg we would need to have a helper
> or some other way to catch reads/writes past the first 4k and read
> the next sg. We have the same limitation in cls_bpf.
>
> I have started a patch on top of this series with the current working
> title msg_apply_bytes(int bytes). This would let us apply a verdict to
> a set number of bytes instead of the entire msg. By calling
> msg_apply_bytes(data_end - data) a user who needs to read an entire msg
> could do this in 4k chunks until the entire msg is passed through the
> bpf prog.
good idea.
I think would be good to add this helper as part of this patch set
to make sure there is a way for user to look through the whole
tcp stream if the program really wants to.
I understand that program cannot examine every byte anyway
due to lack of loops and helpers, but this part of sockmap api
should still provide an interface from day one.
One example would be the program parsing http2 or similar
where in the header it sees length. Then it can do
msg_apply_bytes(length)
to skip the bytes it processed, but still continue within
the same 64Kbyte chunk when 0 < length < 64k
> > and it's typically going to be one page only ?
>
> yep
>
> > then what's the value of waiting for MAX_SKB_FRAGS ?
> >
> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS
> pages in one call if possible. It seems better to me to run this loop
> over as much data as we can.
agree on trying to do MAX_SKB_FRAGS as a 'processing unit',
but program should still be able to skip or redirect
parts of the bytes and not the whole 64k chunk.
>From program point of view it should never see or worry about
SG list boundaries whereas right now it seems that below code
is dealing with them (though program doesn't know where sg ends):
> +
> + switch (eval) {
> + case __SK_PASS:
> + sg_mark_end(sg + sg_num - 1);
> + err = bpf_tcp_push(sk, sg, &sg_curr, flags, true);
> + if (unlikely(err)) {
> + copied -= free_sg(sk, sg, sg_curr, sg_num);
> + goto out_err;
> + }
> + break;
> + case __SK_REDIRECT:
> + sg_mark_end(sg + sg_num - 1);
> + goto do_redir;
...
> >> +static int bpf_tcp_ulp_register(void)
> >> +{
> >> + tcp_bpf_proto = tcp_prot;
> >> + tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
> >> + tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
> >> + return tcp_register_ulp(&bpf_tcp_ulp_ops);
> >
> > I don't see corresponding tcp_unregister_ulp().
> >
>
> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of
> available ULPs and never removes it. To remove it we would have to
> keep a ref count on the reg/unreg calls. This would require a couple
> more patches to the ULP infra and not sure it hurts to leave the ULP
> reference around...
>
> Maybe a follow on patch? Or else it could be a patch in front of this
> patch.
I see. I'm ok with leaving that for latter.
It doesn't hurt to keep it registered. Please add a comment though.
Powered by blists - more mailing lists