[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eddca040-c4b7-79c4-d303-08e0b3ae3242@blackwall.org>
Date: Sat, 23 Apr 2022 18:10:06 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: kuba@...nel.org, davem@...emloft.net, stable@...r.kernel.org,
Jason Wang <jasowang@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
"Michael S. Tsirkin" <mst@...hat.com>,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [PATCH net] virtio_net: fix wrong buf address calculation when
using xdp
On 23/04/2022 17:46, Nikolay Aleksandrov wrote:
> On 23/04/2022 17:30, Nikolay Aleksandrov wrote:
>> On 23/04/2022 17:16, Nikolay Aleksandrov wrote:
>>> On 23/04/2022 16:31, Xuan Zhuo wrote:
>>>> On Sat, 23 Apr 2022 14:26:12 +0300, Nikolay Aleksandrov <razor@...ckwall.org> wrote:
>>>>> We received a report[1] of kernel crashes when Cilium is used in XDP
>>>>> mode with virtio_net after updating to newer kernels. After
>>>>> investigating the reason it turned out that when using mergeable bufs
>>>>> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
>>>>> calculates the build_skb address wrong because the offset can become less
>>>>> than the headroom so it gets the address of the previous page (-X bytes
>>>>> depending on how lower offset is):
>>>>> page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
>>>>>
>>>>> This is a pr_err() I added in the beginning of page_to_skb which clearly
>>>>> shows offset that is less than headroom by adding 4 bytes of metadata
>>>>> via an xdp prog. The calculations done are:
>>>>> receive_mergeable():
>>>>> headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
>>>>> offset = xdp.data - page_address(xdp_page) -
>>>>> vi->hdr_len - metasize;
>>>>>
>>>>> page_to_skb():
>>>>> p = page_address(page) + offset;
>>>>> ...
>>>>> buf = p - headroom;
>>>>>
>>>>> Now buf goes -4 bytes from the page's starting address as can be seen
>>>>> above which is set as skb->head and skb->data by build_skb later. Depending
>>>>> on what's done with the skb (when it's freed most often) we get all kinds
>>>>> of corruptions and BUG_ON() triggers in mm[2]. The story of the faulty
>>>>> commit is interesting because the patch was sent and applied twice (it
>>>>> seems the first one got lost during merge back in 5.13 window). The
>>>>> first version of the patch that was applied as:
>>>>> commit 7bf64460e3b2 ("virtio-net: get build_skb() buf by data ptr")
>>>>> was actually correct because it calculated the page starting address
>>>>> without relying on offset or headroom, but then the second version that
>>>>> was applied as:
>>>>> commit 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
>>>>> was wrong and added the above calculation.
>>>>> An example xdp prog[3] is below.
>>>>>
>>>>> [1] https://github.com/cilium/cilium/issues/19453
>>>>>
>>>>> [2] Two of the many traces:
> [snip]
>>>>> drivers/net/virtio_net.c | 8 ++++++--
>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index 87838cbe38cf..0687dd88e97f 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -434,9 +434,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>> * Buffers with headroom use PAGE_SIZE as alloc size, see
>>>>> * add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>> */
>>>>> - truesize = headroom ? PAGE_SIZE : truesize;
>>>>> + if (headroom) {
>>>>> + truesize = PAGE_SIZE;
>>>>> + buf = (char *)((unsigned long)p & PAGE_MASK);
>>>>
>>>> The reason for not doing this is that buf and p may not be on the same page, and
>>>> buf is probably not page-aligned.
>>>>
>>>> The implementation of virtio-net merge is add_recvbuf_mergeable(), which
>>>> allocates a large block of memory at one time, and allocates from it each time.
>>>> Although in xdp mode, each allocation is page_size, it does not guarantee that
>>>> each allocation is page-aligned .
>>>>
>>>> The problem here is that the value of headroom is wrong, the package is
>>>> structured like this:
>>>>
>>>> from device | headroom | virtio-net hdr | data |
>>>> after xdp | headroom | virtio-net hdr | meta | data |
>>>
>>> You're free to push data back (not necessarily through meta).
>>> You don't have virtio-net hdr for the xdp case (hdr_valid is false there).
>>>
>>>>
>>>> The page_address(page) + offset we pass to page_to_skb() points to the
>>>> virtio-net hdr.
>>>>
>>>> So I think it might be better to change it this way.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 87838cbe38cf..086ae835ec86 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -1012,7 +1012,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>> head_skb = page_to_skb(vi, rq, xdp_page, offset,
>>>> len, PAGE_SIZE, false,
>>>> metasize,
>>>> - VIRTIO_XDP_HEADROOM);
>>>> + VIRTIO_XDP_HEADROOM - metazie);
>>>> return head_skb;
>>>> }
>>>> break;
>>>
>>> That patch doesn't fix it, as I said with xdp you can move both data and data_meta.
>>> So just doing that would take care of the meta, but won't take care of moving data.
>>>
>>
>> Also it doesn't take care of the case where page_to_skb() is called with the original page
>> i.e. when we already have headroom, so we hit the next/standard page_to_skb() call (xdp_page == page).
>>
>> The above change guarantees that buf and p will be in the same page and the skb_reserve() call will
>> make skb->data point to p - buf, i.e. to the beginning of the valid data in that page.
>> Unfortunately the new headroom will not be correct if it is a frag, it will be longer.
>>
>>
>
> Completely untested alternative could be based on the offset size, that is if it has
> eaten into the headroom and is smaller then we swap them (that means we start at page
> boundary since we have headroom guaranteed space):
> buf = page_address(page) + (offset > headroom ? offset - headroom : 0);
>
> or perhaps in current code terms:
> buf = p - (offset > headroom ? headroom : offset);
>
Actually looking at add_recvbuf_mergeable() I take that back. We should look into a
different solution. That seems wrong as well.
If headroom can reside in 2 pages it is more difficult to get the correct address.
> That means offset is somewhere inside the headroom of the buf and, the buf itself
> starts at page boundary (when offset < headroom). I think this preserves the correct
> headroom for the new skb. WDYT?
>
> Cheers,
> Nik
>
>
>
Powered by blists - more mailing lists