lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <93f79b8b-62c3-60f0-b401-0908e3e6325f@redhat.com>
Date:   Tue, 1 Jun 2021 11:27:12 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr


在 2021/6/1 上午11:08, Xuan Zhuo 写道:
> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道:
>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道:
>>>>> In the case of merge, the page passed into page_to_skb() may be a head
>>>>> page, not the page where the current data is located.
>>>> I don't get how this can happen?
>>>>
>>>> Maybe you can explain a little bit more?
>>>>
>>>> receive_mergeable() call page_to_skb() in two places:
>>>>
>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page
>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data
>>> The offset may be greater than PAGE_SIZE, because page is obtained by
>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset
>>> of buf relative to page.
>>>
>>> 	tailroom = truesize - len - offset;
>>>
>>> In this case, the tailroom must be less than 0. Although there may be enough
>>> content on this page to save skb_shared_info.
>>
>> Interesting, I think we don't use compound pages for virtio-net. (We
>> don't define SKB_FRAG_PAGE_ORDER).
>>
>> Am I wrong?
>
> It seems to me that it seems to be a fixed setting, not for us to configure
> independently


Looks like you are right.

See comments below.


>
> Thanks.
>
> ==========================================
>
> net/sock.c
>
> #define SKB_FRAG_PAGE_ORDER	get_order(32768)
> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>
> /**
>   * skb_page_frag_refill - check that a page_frag contains enough room
>   * @sz: minimum size of the fragment we want to get
>   * @pfrag: pointer to page_frag
>   * @gfp: priority for memory allocation
>   *
>   * Note: While this allocator tries to use high order pages, there is
>   * no guarantee that allocations succeed. Therefore, @sz MUST be
>   * less or equal than PAGE_SIZE.
>   */
> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> {
> 	if (pfrag->page) {
> 		if (page_ref_count(pfrag->page) == 1) {
> 			pfrag->offset = 0;
> 			return true;
> 		}
> 		if (pfrag->offset + sz <= pfrag->size)
> 			return true;
> 		put_page(pfrag->page);
> 	}
>
> 	pfrag->offset = 0;
> 	if (SKB_FRAG_PAGE_ORDER &&
> 	    !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> 		/* Avoid direct reclaim but allow kswapd to wake */
> 		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> 					  __GFP_COMP | __GFP_NOWARN |
> 					  __GFP_NORETRY,
> 					  SKB_FRAG_PAGE_ORDER);
> 		if (likely(pfrag->page)) {
> 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> 			return true;
> 		}
> 	}
> 	pfrag->page = alloc_page(gfp);
> 	if (likely(pfrag->page)) {
> 		pfrag->size = PAGE_SIZE;
> 		return true;
> 	}
> 	return false;
> }
> EXPORT_SYMBOL(skb_page_frag_refill);
>
>
>> Thanks
>>
>>
>>> Thanks.
>>>
>>>> Thanks
>>>>
>>>>
>>>>> So when trying to
>>>>> get the buf where the data is located, you should directly use the
>>>>> pointer(p) to get the address corresponding to the page.
>>>>>
>>>>> At the same time, the offset of the data in the page should also be
>>>>> obtained using offset_in_page().
>>>>>
>>>>> This patch solves this problem. But if you don’t use this patch, the
>>>>> original code can also run, because if the page is not the page of the
>>>>> current data, the calculated tailroom will be less than 0, and will not
>>>>> enter the logic of build_skb() . The significance of this patch is to
>>>>> modify this logical problem, allowing more situations to use
>>>>> build_skb().
>>>>>
>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>>>> Acked-by: Michael S. Tsirkin <mst@...hat.com>
>>>>> ---
>>>>>     drivers/net/virtio_net.c | 8 ++++++--
>>>>>     1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index 3e46c12dde08..073fec4c0df1 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>     		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>     		 */
>>>>>     		truesize = PAGE_SIZE;
>>>>> -		tailroom = truesize - len - offset;
>>>>> -		buf = page_address(page);
>>>>> +
>>>>> +		/* page maybe head page, so we should get the buf by p, not the
>>>>> +		 * page
>>>>> +		 */
>>>>> +		tailroom = truesize - len - offset_in_page(p);


I wonder why offset_in_page(p) is correct? I guess it should be:

tailroom = truesize - len - headroom;

The reason is that the buffer is not necessarily allocated at the page 
boundary.

Thanks


>>>>> +		buf = (char *)((unsigned long)p & PAGE_MASK);
>>>>>     	} else {
>>>>>     		tailroom = truesize - len;
>>>>>     		buf = p;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ