lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:   Tue, 1 Jun 2021 14:17:27 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr


在 2021/6/1 下午1:59, Xuan Zhuo 写道:
> On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> 在 2021/6/1 上午11:08, Xuan Zhuo 写道:
>>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道:
>>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道:
>>>>>>> In the case of merge, the page passed into page_to_skb() may be a head
>>>>>>> page, not the page where the current data is located.
>>>>>> I don't get how this can happen?
>>>>>>
>>>>>> Maybe you can explain a little bit more?
>>>>>>
>>>>>> receive_mergeable() call page_to_skb() in two places:
>>>>>>
>>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page
>>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data
>>>>> The offset may be greater than PAGE_SIZE, because page is obtained by
>>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset
>>>>> of buf relative to page.
>>>>>
>>>>> 	tailroom = truesize - len - offset;
>>>>>
>>>>> In this case, the tailroom must be less than 0. Although there may be enough
>>>>> content on this page to save skb_shared_info.
>>>> Interesting, I think we don't use compound pages for virtio-net. (We
>>>> don't define SKB_FRAG_PAGE_ORDER).
>>>>
>>>> Am I wrong?
>>> It seems to me that it seems to be a fixed setting, not for us to configure
>>> independently
>>
>> Looks like you are right.
>>
>> See comments below.
>>
>>
>>> Thanks.
>>>
>>> ==========================================
>>>
>>> net/sock.c
>>>
>>> #define SKB_FRAG_PAGE_ORDER	get_order(32768)
>>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>>>
>>> /**
>>>    * skb_page_frag_refill - check that a page_frag contains enough room
>>>    * @sz: minimum size of the fragment we want to get
>>>    * @pfrag: pointer to page_frag
>>>    * @gfp: priority for memory allocation
>>>    *
>>>    * Note: While this allocator tries to use high order pages, there is
>>>    * no guarantee that allocations succeed. Therefore, @sz MUST be
>>>    * less or equal than PAGE_SIZE.
>>>    */
>>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>> {
>>> 	if (pfrag->page) {
>>> 		if (page_ref_count(pfrag->page) == 1) {
>>> 			pfrag->offset = 0;
>>> 			return true;
>>> 		}
>>> 		if (pfrag->offset + sz <= pfrag->size)
>>> 			return true;
>>> 		put_page(pfrag->page);
>>> 	}
>>>
>>> 	pfrag->offset = 0;
>>> 	if (SKB_FRAG_PAGE_ORDER &&
>>> 	    !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
>>> 		/* Avoid direct reclaim but allow kswapd to wake */
>>> 		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>>> 					  __GFP_COMP | __GFP_NOWARN |
>>> 					  __GFP_NORETRY,
>>> 					  SKB_FRAG_PAGE_ORDER);
>>> 		if (likely(pfrag->page)) {
>>> 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>>> 			return true;
>>> 		}
>>> 	}
>>> 	pfrag->page = alloc_page(gfp);
>>> 	if (likely(pfrag->page)) {
>>> 		pfrag->size = PAGE_SIZE;
>>> 		return true;
>>> 	}
>>> 	return false;
>>> }
>>> EXPORT_SYMBOL(skb_page_frag_refill);
>>>
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks.
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> So when trying to
>>>>>>> get the buf where the data is located, you should directly use the
>>>>>>> pointer(p) to get the address corresponding to the page.
>>>>>>>
>>>>>>> At the same time, the offset of the data in the page should also be
>>>>>>> obtained using offset_in_page().
>>>>>>>
>>>>>>> This patch solves this problem. But if you don’t use this patch, the
>>>>>>> original code can also run, because if the page is not the page of the
>>>>>>> current data, the calculated tailroom will be less than 0, and will not
>>>>>>> enter the logic of build_skb() . The significance of this patch is to
>>>>>>> modify this logical problem, allowing more situations to use
>>>>>>> build_skb().
>>>>>>>
>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>>>>>> Acked-by: Michael S. Tsirkin <mst@...hat.com>
>>>>>>> ---
>>>>>>>      drivers/net/virtio_net.c | 8 ++++++--
>>>>>>>      1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 3e46c12dde08..073fec4c0df1 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>>>      		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>>>      		 */
>>>>>>>      		truesize = PAGE_SIZE;
>>>>>>> -		tailroom = truesize - len - offset;
>>>>>>> -		buf = page_address(page);
>>>>>>> +
>>>>>>> +		/* page maybe head page, so we should get the buf by p, not the
>>>>>>> +		 * page
>>>>>>> +		 */
>>>>>>> +		tailroom = truesize - len - offset_in_page(p);
>>
>> I wonder why offset_in_page(p) is correct? I guess it should be:
>>
>> tailroom = truesize - len - headroom;
>>
>> The reason is that the buffer is not necessarily allocated at the page
>> boundary.
> In my understanding, offset_in_page(p) is the offset of p in the page where it
> is located. In this case, the two should be equal.


I think not, if the frag is not page aligned. offset_in_page(p) doesn't 
equal to headroom.

Consider the case that the frag start from page offset 1500.


>   This has nothing to do with
> which page is allocated.
>
> Of course I think using headroom is a good idea, and it is semantically better.
>
> Thanks.


Thanks


>
>> Thanks
>>
>>
>>>>>>> +		buf = (char *)((unsigned long)p & PAGE_MASK);
>>>>>>>      	} else {
>>>>>>>      		tailroom = truesize - len;
>>>>>>>      		buf = p;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ