lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 31 Aug 2018 01:37:21 -0700
From:   Tushar Dave <tushar.n.dave@...cle.com>
To:     Daniel Borkmann <daniel@...earbox.net>, john.fastabend@...il.com,
        ast@...nel.org, davem@...emloft.net, netdev@...r.kernel.org
Cc:     sowmini.varadhan@...cle.com
Subject: Re: [PATCH net] ebpf: fix bpf_msg_pull_data



On 08/30/2018 12:20 AM, Daniel Borkmann wrote:
> On 08/30/2018 02:21 AM, Tushar Dave wrote:
>> On 08/29/2018 05:07 PM, Tushar Dave wrote:
>>> While doing some preliminary testing it is found that bpf helper
>>> bpf_msg_pull_data does not calculate the data and data_end offset
>>> correctly. Fix it!
>>>
>>> Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
>>> Signed-off-by: Tushar Dave <tushar.n.dave@...cle.com>
>>> Acked-by: Sowmini Varadhan <sowmini.varadhan@...cle.com>
>>> ---
>>>    net/core/filter.c | 38 +++++++++++++++++++++++++-------------
>>>    1 file changed, 25 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index c25eb36..3eeb3d6 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -2285,7 +2285,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>    BPF_CALL_4(bpf_msg_pull_data,
>>>           struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
>>>    {
>>> -    unsigned int len = 0, offset = 0, copy = 0;
>>> +    unsigned int len = 0, offset = 0, copy = 0, off = 0;
>>>        struct scatterlist *sg = msg->sg_data;
>>>        int first_sg, last_sg, i, shift;
>>>        unsigned char *p, *to, *from;
>>> @@ -2299,22 +2299,30 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>        i = msg->sg_start;
>>>        do {
>>>            len = sg[i].length;
>>> -        offset += len;
>>>            if (start < offset + len)
>>>                break;
>>> +        offset += len;
>>>            i++;
>>>            if (i == MAX_SKB_FRAGS)
>>>                i = 0;
>>> -    } while (i != msg->sg_end);
>>> +    } while (i <= msg->sg_end);
> 
> I don't think this condition is correct, msg operates as a scatterlist ring,
> so sg_end may very well be < current i when there's a wrap-around in the
> traversal ... more below.

I'm wondering then how this is suppose to work in case sg list is not
ring! For RDS, We have sg list that is not a ring. More below.

> 
>>>    +    /* return error if start is out of range */
>>>        if (unlikely(start >= offset + len))
>>>            return -EINVAL;
>>>    -    if (!msg->sg_copy[i] && bytes <= len)
>>> -        goto out;
>>> +    /* return error if i is last entry in sglist and end is out of range */
>>> +    if (msg->sg_copy[i] && end > offset + len)
>>> +        return -EINVAL;
>>>          first_sg = i;
>>>    +    /* if i is not last entry in sg list and end (i.e start + bytes) is
>>> +     * within this sg[i] then goto out and calculate data and data_end
>>> +     */
>>> +    if (!msg->sg_copy[i] && end <= offset + len)
>>> +        goto out;
>>> +
>>>        /* At this point we need to linearize multiple scatterlist
>>>         * elements or a single shared page. Either way we need to
>>>         * copy into a linear buffer exclusively owned by BPF. Then
>>> @@ -2330,9 +2338,14 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>            i++;
>>>            if (i == MAX_SKB_FRAGS)
>>>                i = 0;
>>> -        if (bytes < copy)
>>> +        if (end < copy)
>>>                break;
>>> -    } while (i != msg->sg_end);
>>> +    } while (i <= msg->sg_end);
>>> +
>>> +    /* return error if i is last entry in sglist and end is out of range */
>>> +    if (i > msg->sg_end && end > offset + copy)
>>> +        return -EINVAL;
>>> +
>>>        last_sg = i;
>>>          if (unlikely(copy < end - start))
>>> @@ -2342,23 +2355,22 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>        if (unlikely(!page))
>>>            return -ENOMEM;
>>>        p = page_address(page);
>>> -    offset = 0;
>>>          i = first_sg;
>>>        do {
>>>            from = sg_virt(&sg[i]);
>>>            len = sg[i].length;
>>> -        to = p + offset;
>>> +        to = p + off;
>>>              memcpy(to, from, len);
>>> -        offset += len;
>>> +        off += len;
>>>            sg[i].length = 0;
>>>            put_page(sg_page(&sg[i]));
>>>              i++;
>>>            if (i == MAX_SKB_FRAGS)
>>>                i = 0;
>>> -    } while (i != last_sg);
>>> +    } while (i < last_sg);
>>>          sg[first_sg].length = copy;
>>>        sg_set_page(&sg[first_sg], page, copy, 0);
>>> @@ -2380,7 +2392,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>            else
>>>                move_from = i + shift;
>>>    -        if (move_from == msg->sg_end)
>>> +        if (move_from > msg->sg_end)
>>>                break;
>>>              sg[i] = sg[move_from];
>>> @@ -2396,7 +2408,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>        if (msg->sg_end < 0)
>>>            msg->sg_end += MAX_SKB_FRAGS;
>>>    out:
>>> -    msg->data = sg_virt(&sg[i]) + start - offset;
>>> +    msg->data = sg_virt(&sg[first_sg]) + start - offset;
>>>        msg->data_end = msg->data + bytes;
>>>          return 0;
>>>
>>
>> Please discard this patch. I just noticed that Daniel Borkmann sent some similar fixes for bpf_msg_pull_data.
> 
> Yeah I've been looking at these recently as well, please make sure you test
> with the below fixes included to see if there's anything left:

I tested the latest net tree which has all the fixes you mentioned and I
am still seeing issues.

As I already mentioned before on RFC v3 thread, we need to be careful
reusing 'offset' while linearizing multiple scatterlist
elements.
Variable 'offset' is used to calculate the 'msg->data'
i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"

We should use different variable name (e.g. off) while linearizing
multiple scatterlist elements or a single shared page so that we don't
overwrite 'offset' that has correct value already been calculated.

A code change for this would look like:

diff --git a/net/core/filter.c b/net/core/filter.c
index 60a29ca..076ca09 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2326,7 +2326,7 @@ struct sock *do_msg_redirect_map(struct 
sk_msg_buff *msg)
  BPF_CALL_4(bpf_msg_pull_data,
            struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
  {
-       unsigned int len = 0, offset = 0, copy = 0;
+       unsigned int len = 0, offset = 0, copy = 0, off;
         int bytes = end - start, bytes_sg_total;
         struct scatterlist *sg = msg->sg_data;
         int first_sg, last_sg, i, shift;
@@ -2354,6 +2354,8 @@ struct sock *do_msg_redirect_map(struct 
sk_msg_buff *msg)
          * account for the headroom.
          */
         bytes_sg_total = start - offset + bytes;

         if (!msg->sg_copy[i] && bytes_sg_total <= len)
                 goto out;

@@ -2382,18 +2384,18 @@ struct sock *do_msg_redirect_map(struct 
sk_msg_buff *msg)
         if (unlikely(!page))
                 return -ENOMEM;
         p = page_address(page);
-       offset = 0;
+       off = 0;

         i = first_sg;
         do {
                 from = sg_virt(&sg[i]);
                 len = sg[i].length;
-               to = p + offset;
+               to = p + off;

                 memcpy(to, from, len);
-               offset += len;
+               off += len;
                 sg[i].length = 0;
                 put_page(sg_page(&sg[i]));

                 sk_msg_iter_var(i);
         } while (i != last_sg);


Apart from above bug fix, I see issues in below 2 cases:

case 1:
=======
For things like RDS, where sg list is not a ring how to know end of packet?

For example I have RDS packet of length 8192 Bytes which is in
scatterlist like
sge[0].lenghth = 1400
sge[1].length = 1448
sge[2].length = 1448
sge[3].length = 1448
sge[4].length = 1448
sge[5].length = 1000

Now when I execute "bpf_msg_pull_data(msg, 8190, 8196, 0)" on above RDS
packet, I expect bpf_msg_pull_data to return error because end is out of
bound e.g. 8196 > 8192. Instead current code wraps it to sge index 0!
That is if I dump first 6 bytes starting with 'start' I get byte value
at offset 8190, 8191 from sge[5] and value at offset 8192, 8193, 8194
8195 bytes from sge[0] from offset 0 ,1 ,2 ,3 respectively - which is
not expected!


case 2:
=======
Same example of RDS packet as above, when I execute
"bpf_msg_pull_data(msg, 7191, 8192, 0)"
I get -EINVAL which is *wrong*.

Debugging this, I found, right before we enter below do while loop the 
values of variables are:
i = 4
first_sg = 4
last_sg= 5
start=7191
bytes_sg_total=2448
offset=5744
copy=0
len=1448

      do {
                 copy += sg[i].length;
                 sk_msg_iter_var(i);
                 if (bytes_sg_total <= copy)
                         break;
         } while (i != msg->sg_end);
         last_sg = i;

However, above loop execute only one time because after one execution,
i=5 and msg->sg_end=5. That makes condition "while (i != msg->sg_end)"
false. So when loop exit, the values are:
i = 5
first_sg=4
last_sg= 5
start=7191
bytes_sg_total=2448 <<<<<<
offset=5744
copy=1448 <<<<<<
len=1448

And that makes below condition true
         if (unlikely(bytes_sg_total > copy))
                 return -EINVAL;


There may be one or more similar corner scenarios like case 1 and 2
which can be fixed however I am not sure how can we fix so that we can
get it right for sg ring and sg list?

Thanks.
-Tushar
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=5b24109b0563d45094c470684c1f8cea1af269f8
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=0e06b227c5221dd51b5569de93f3b9f532be4a32
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=2e43f95dd8ee62bc8bf57f2afac37fbd70c8d565
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/core/filter.c?id=a8cf76a9023bc6709b1361d06bb2fae5227b9d68
> 
> Thanks,
> Daniel
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ