Message-ID: <b3d504e8-2fc2-e520-f6ce-bbaa72c35037@gmail.com>
Date: Mon, 19 Mar 2018 22:54:28 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: davejwatson@...com, davem@...emloft.net, daniel@...earbox.net,
ast@...nel.org, netdev@...r.kernel.org
Subject: Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper
bpf_sk_msg_pull_data
On 03/19/2018 01:24 PM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
>> Currently, if a bpf sk msg program is run the program
>> can only parse data that the (start,end) pointers already
>> consumed. For sendmsg hooks this is likely the first
>> scatterlist element. For sendpage this will be the range
>> (0,0) because the data is shared with userspace and by
>> default we want to avoid allowing userspace to modify
>> data while (or after) BPF verdict is being decided.
>>
>> To support pulling in additional bytes for parsing use
>> a new helper bpf_sk_msg_pull(start, end, flags) which
>> works similar to cls tc logic. This helper will attempt
>> to point the data start pointer at 'start' bytes offset
>> into msg and data end pointer at 'end' bytes offset into
>> message.
>>
>> After basic sanity checks to ensure 'start' <= 'end' and
>> 'end' <= msg_length there are a few cases we need to
>> handle.
>>
>> First the sendmsg hook has already copied the data from
>> userspace and has exclusive access to it. Therefore, it
>> is not necessary to copy the data. However, it may
>> be required. After finding the scatterlist element with
>> 'start' offset byte in it there are two cases. One the
>> range (start,end) is entirely contained in the sg element
>> and is already linear. All that is needed is to update the
>> data pointers, no allocate/copy is needed. The other case
>> is (start, end) crosses sg element boundaries. In this
>> case we allocate a block of size 'end - start' and copy
>> the data to linearize it.
>>
>> Next sendpage hook has not copied any data in initial
>> state so that data pointers are (0,0). In this case we
>> handle it similar to the above sendmsg case except the
>> allocation/copy must always happen. Then when sending
>> the data we have possibly three memory regions that
>> need to be sent, (0, start - 1), (start, end), and
>> (end + 1, msg_length). This is required to ensure any
>> writes by the BPF program are correctly transmitted.
>>
>> Lastly this operation will invalidate any previous
>> data checks so BPF programs will have to revalidate
>> pointers after making this BPF call.
>>
>> Signed-off-by: John Fastabend <john.fastabend@...il.com>
> ..
>> +
>> + page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>> + if (unlikely(!page))
>> + return -ENOMEM;
>
> I think that's fine. Just curious what order do you see in practice?
At the moment I'm mostly reading headers so this only
happens when a header is split across multiple scatterlist
elements. In these cases a copy size of less than 4k is good
enough.

Some of the nginx configurations I have use a max sendfile
size of 128kb. So these are larger, but unless we look
at the payload we can avoid reading/writing this. If
it becomes commonplace we could look at optimizing it.
Should be doable without changing the user facing API.
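
To put rough numbers on the order question, here is a minimal userspace
sketch of the rounding that get_order() performs on the copy size. It
assumes 4 KiB pages; sketch_get_order() and SKETCH_PAGE_SHIFT are
hypothetical names for illustration only, not the kernel implementation.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12). Other configs differ. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical userspace stand-in for get_order(): returns the smallest
 * order such that (page_size << order) >= size. A size of 0 is treated
 * as order 0 here purely for convenience.
 */
static int sketch_get_order(size_t size)
{
	size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
	/* Round the byte count up to a whole number of pages. */
	size_t pages = (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
	int order = 0;

	/* Grow the order until 2^order pages cover the request. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

Under that assumption, a split header needing a copy of less than 4k
gives order 0, while pulling a full 128kb range (128 * 1024 bytes, 32
pages) would give order 5.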
>
> Acked-by: Alexei Starovoitov <ast@...nel.org>
>