[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <98d4852e-c490-6062-4a85-45331aeb89c4@oracle.com>
Date: Fri, 31 Aug 2018 01:37:21 -0700
From: Tushar Dave <tushar.n.dave@...cle.com>
To: Daniel Borkmann <daniel@...earbox.net>, john.fastabend@...il.com,
ast@...nel.org, davem@...emloft.net, netdev@...r.kernel.org
Cc: sowmini.varadhan@...cle.com
Subject: Re: [PATCH net] ebpf: fix bpf_msg_pull_data
On 08/30/2018 12:20 AM, Daniel Borkmann wrote:
> On 08/30/2018 02:21 AM, Tushar Dave wrote:
>> On 08/29/2018 05:07 PM, Tushar Dave wrote:
>>> While doing some preliminary testing it is found that bpf helper
>>> bpf_msg_pull_data does not calculate the data and data_end offset
>>> correctly. Fix it!
>>>
>>> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
>>> Signed-off-by: Tushar Dave <tushar.n.dave@...cle.com>
>>> Acked-by: Sowmini Varadhan <sowmini.varadhan@...cle.com>
>>> ---
>>> net/core/filter.c | 38 +++++++++++++++++++++++++-------------
>>> 1 file changed, 25 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index c25eb36..3eeb3d6 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -2285,7 +2285,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> BPF_CALL_4(bpf_msg_pull_data,
>>> struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
>>> {
>>> - unsigned int len = 0, offset = 0, copy = 0;
>>> + unsigned int len = 0, offset = 0, copy = 0, off = 0;
>>> struct scatterlist *sg = msg->sg_data;
>>> int first_sg, last_sg, i, shift;
>>> unsigned char *p, *to, *from;
>>> @@ -2299,22 +2299,30 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> i = msg->sg_start;
>>> do {
>>> len = sg[i].length;
>>> - offset += len;
>>> if (start < offset + len)
>>> break;
>>> + offset += len;
>>> i++;
>>> if (i == MAX_SKB_FRAGS)
>>> i = 0;
>>> - } while (i != msg->sg_end);
>>> + } while (i <= msg->sg_end);
>
> I don't think this condition is correct, msg operates as a scatterlist ring,
> so sg_end may very well be < current i when there's a wrap-around in the
> traversal ... more below.
I'm wondering then how this is suppose to work in case sg list is not
ring! For RDS, We have sg list that is not a ring. More below.
>
>>> + /* return error if start is out of range */
>>> if (unlikely(start >= offset + len))
>>> return -EINVAL;
>>> - if (!msg->sg_copy[i] && bytes <= len)
>>> - goto out;
>>> + /* return error if i is last entry in sglist and end is out of range */
>>> + if (msg->sg_copy[i] && end > offset + len)
>>> + return -EINVAL;
>>> first_sg = i;
>>> + /* if i is not last entry in sg list and end (i.e start + bytes) is
>>> + * within this sg[i] then goto out and calculate data and data_end
>>> + */
>>> + if (!msg->sg_copy[i] && end <= offset + len)
>>> + goto out;
>>> +
>>> /* At this point we need to linearize multiple scatterlist
>>> * elements or a single shared page. Either way we need to
>>> * copy into a linear buffer exclusively owned by BPF. Then
>>> @@ -2330,9 +2338,14 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> i++;
>>> if (i == MAX_SKB_FRAGS)
>>> i = 0;
>>> - if (bytes < copy)
>>> + if (end < copy)
>>> break;
>>> - } while (i != msg->sg_end);
>>> + } while (i <= msg->sg_end);
>>> +
>>> + /* return error if i is last entry in sglist and end is out of range */
>>> + if (i > msg->sg_end && end > offset + copy)
>>> + return -EINVAL;
>>> +
>>> last_sg = i;
>>> if (unlikely(copy < end - start))
>>> @@ -2342,23 +2355,22 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> if (unlikely(!page))
>>> return -ENOMEM;
>>> p = page_address(page);
>>> - offset = 0;
>>> i = first_sg;
>>> do {
>>> from = sg_virt(&sg[i]);
>>> len = sg[i].length;
>>> - to = p + offset;
>>> + to = p + off;
>>> memcpy(to, from, len);
>>> - offset += len;
>>> + off += len;
>>> sg[i].length = 0;
>>> put_page(sg_page(&sg[i]));
>>> i++;
>>> if (i == MAX_SKB_FRAGS)
>>> i = 0;
>>> - } while (i != last_sg);
>>> + } while (i < last_sg);
>>> sg[first_sg].length = copy;
>>> sg_set_page(&sg[first_sg], page, copy, 0);
>>> @@ -2380,7 +2392,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> else
>>> move_from = i + shift;
>>> - if (move_from == msg->sg_end)
>>> + if (move_from > msg->sg_end)
>>> break;
>>> sg[i] = sg[move_from];
>>> @@ -2396,7 +2408,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>> if (msg->sg_end < 0)
>>> msg->sg_end += MAX_SKB_FRAGS;
>>> out:
>>> - msg->data = sg_virt(&sg[i]) + start - offset;
>>> + msg->data = sg_virt(&sg[first_sg]) + start - offset;
>>> msg->data_end = msg->data + bytes;
>>> return 0;
>>>
>>
>> Please discard this patch. I just noticed that Daniel Borkmann sent some similar fixes for bpf_msg_pull_data.
>
> Yeah I've been looking at these recently as well, please make sure you test
> with the below fixes included to see if there's anything left:
I tested the latest net tree which has all the fixes you mentioned and I
am still seeing issues.
As I already mentioned before on RFC v3 thread, we need to be careful
reusing 'offset' while linearizing multiple scatterlist
elements.
Variable 'offset' is used to calculate the 'msg->data'
i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"
We should use different variable name (e.g. off) while linearizing
multiple scatterlist elements or a single shared page so that we don't
overwrite 'offset' that has correct value already been calculated.
A code change for this would look like:
diff --git a/net/core/filter.c b/net/core/filter.c
index 60a29ca..076ca09 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2326,7 +2326,7 @@ struct sock *do_msg_redirect_map(struct
sk_msg_buff *msg)
BPF_CALL_4(bpf_msg_pull_data,
struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
{
- unsigned int len = 0, offset = 0, copy = 0;
+ unsigned int len = 0, offset = 0, copy = 0, off;
int bytes = end - start, bytes_sg_total;
struct scatterlist *sg = msg->sg_data;
int first_sg, last_sg, i, shift;
@@ -2354,6 +2354,8 @@ struct sock *do_msg_redirect_map(struct
sk_msg_buff *msg)
* account for the headroom.
*/
bytes_sg_total = start - offset + bytes;
if (!msg->sg_copy[i] && bytes_sg_total <= len)
goto out;
@@ -2382,18 +2384,18 @@ struct sock *do_msg_redirect_map(struct
sk_msg_buff *msg)
if (unlikely(!page))
return -ENOMEM;
p = page_address(page);
- offset = 0;
+ off = 0;
i = first_sg;
do {
from = sg_virt(&sg[i]);
len = sg[i].length;
- to = p + offset;
+ to = p + off;
memcpy(to, from, len);
- offset += len;
+ off += len;
sg[i].length = 0;
put_page(sg_page(&sg[i]));
sk_msg_iter_var(i);
} while (i != last_sg);
Apart from above bug fix, I see issues in below 2 cases:
case 1:
=======
For things like RDS, where sg list is not a ring how to know end of packet?
For example I have RDS packet of length 8192 Bytes which is in
scatterlist like
sge[0].lenghth = 1400
sge[1].length = 1448
sge[2].length = 1448
sge[3].length = 1448
sge[4].length = 1448
sge[5].length = 1000
Now when I execute "bpf_msg_pull_data(msg, 8190, 8196, 0)" on above RDS
packet, I expect bpf_msg_pull_data to return error because end is out of
bound e.g. 8196 > 8192. Instead current code wraps it to sge index 0!
That is if I dump first 6 bytes starting with 'start' I get byte value
at offset 8190, 8191 from sge[5] and value at offset 8192, 8193, 8194
8195 bytes from sge[0] from offset 0 ,1 ,2 ,3 respectively - which is
not expected!
case 2:
=======
Same example of RDS packet as above, when I execute
"bpf_msg_pull_data(msg, 7191, 8192, 0)"
I get -EINVAL which is *wrong*.
Debugging this, I found, right before we enter below do while loop the
values of variables are:
i = 4
first_sg = 4
last_sg= 5
start=7191
bytes_sg_total=2448
offset=5744
copy=0
len=1448
do {
copy += sg[i].length;
sk_msg_iter_var(i);
if (bytes_sg_total <= copy)
break;
} while (i != msg->sg_end);
last_sg = i;
However, above loop execute only one time because after one execution,
i=5 and msg->sg_end=5. That makes condition "while (i != msg->sg_end)"
false. So when loop exit, the values are:
i = 5
first_sg=4
last_sg= 5
start=7191
bytes_sg_total=2448 <<<<<<
offset=5744
copy=1448 <<<<<<
len=1448
And that makes below condition true
if (unlikely(bytes_sg_total > copy))
return -EINVAL;
There may be one or more similar corner scenarios like case 1 and 2
which can be fixed however I am not sure how can we fix so that we can
get it right for sg ring and sg list?
Thanks.
-Tushar
>
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=5b24109b0563d45094c470684c1f8cea1af269f8
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=0e06b227c5221dd51b5569de93f3b9f532be4a32
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=2e43f95dd8ee62bc8bf57f2afac37fbd70c8d565
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=a8cf76a9023bc6709b1361d06bb2fae5227b9d68
>
> Thanks,
> Daniel
>
Powered by blists - more mailing lists