[<prev] [next>] [day] [month] [year] [list]
Message-ID: <24aee726-83a9-9741-23bb-1f8be0e775e6@redhat.com>
Date: Tue, 1 Jun 2021 14:17:27 +0800
From: Jason Wang <jasowang@...hat.com>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr
在 2021/6/1 下午1:59, Xuan Zhuo 写道:
> On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> 在 2021/6/1 上午11:08, Xuan Zhuo 写道:
>>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道:
>>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@...hat.com> wrote:
>>>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道:
>>>>>>> In the case of merge, the page passed into page_to_skb() may be a head
>>>>>>> page, not the page where the current data is located.
>>>>>> I don't get how this can happen?
>>>>>>
>>>>>> Maybe you can explain a little bit more?
>>>>>>
>>>>>> receive_mergeable() call page_to_skb() in two places:
>>>>>>
>>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page
>>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data
>>>>> The offset may be greater than PAGE_SIZE, because page is obtained by
>>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset
>>>>> of buf relative to page.
>>>>>
>>>>> tailroom = truesize - len - offset;
>>>>>
>>>>> In this case, the tailroom must be less than 0. Although there may be enough
>>>>> content on this page to save skb_shared_info.
>>>> Interesting, I think we don't use compound pages for virtio-net. (We
>>>> don't define SKB_FRAG_PAGE_ORDER).
>>>>
>>>> Am I wrong?
>>> It seems to me that it seems to be a fixed setting, not for us to configure
>>> independently
>>
>> Looks like you are right.
>>
>> See comments below.
>>
>>
>>> Thanks.
>>>
>>> ==========================================
>>>
>>> net/sock.c
>>>
>>> #define SKB_FRAG_PAGE_ORDER get_order(32768)
>>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>>>
>>> /**
>>> * skb_page_frag_refill - check that a page_frag contains enough room
>>> * @sz: minimum size of the fragment we want to get
>>> * @pfrag: pointer to page_frag
>>> * @gfp: priority for memory allocation
>>> *
>>> * Note: While this allocator tries to use high order pages, there is
>>> * no guarantee that allocations succeed. Therefore, @sz MUST be
>>> * less or equal than PAGE_SIZE.
>>> */
>>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>> {
>>> if (pfrag->page) {
>>> if (page_ref_count(pfrag->page) == 1) {
>>> pfrag->offset = 0;
>>> return true;
>>> }
>>> if (pfrag->offset + sz <= pfrag->size)
>>> return true;
>>> put_page(pfrag->page);
>>> }
>>>
>>> pfrag->offset = 0;
>>> if (SKB_FRAG_PAGE_ORDER &&
>>> !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
>>> /* Avoid direct reclaim but allow kswapd to wake */
>>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>>> __GFP_COMP | __GFP_NOWARN |
>>> __GFP_NORETRY,
>>> SKB_FRAG_PAGE_ORDER);
>>> if (likely(pfrag->page)) {
>>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>>> return true;
>>> }
>>> }
>>> pfrag->page = alloc_page(gfp);
>>> if (likely(pfrag->page)) {
>>> pfrag->size = PAGE_SIZE;
>>> return true;
>>> }
>>> return false;
>>> }
>>> EXPORT_SYMBOL(skb_page_frag_refill);
>>>
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks.
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> So when trying to
>>>>>>> get the buf where the data is located, you should directly use the
>>>>>>> pointer(p) to get the address corresponding to the page.
>>>>>>>
>>>>>>> At the same time, the offset of the data in the page should also be
>>>>>>> obtained using offset_in_page().
>>>>>>>
>>>>>>> This patch solves this problem. But if you don’t use this patch, the
>>>>>>> original code can also run, because if the page is not the page of the
>>>>>>> current data, the calculated tailroom will be less than 0, and will not
>>>>>>> enter the logic of build_skb() . The significance of this patch is to
>>>>>>> modify this logical problem, allowing more situations to use
>>>>>>> build_skb().
>>>>>>>
>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>>>>>> Acked-by: Michael S. Tsirkin <mst@...hat.com>
>>>>>>> ---
>>>>>>> drivers/net/virtio_net.c | 8 ++++++--
>>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 3e46c12dde08..073fec4c0df1 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>>> * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>>> */
>>>>>>> truesize = PAGE_SIZE;
>>>>>>> - tailroom = truesize - len - offset;
>>>>>>> - buf = page_address(page);
>>>>>>> +
>>>>>>> + /* page maybe head page, so we should get the buf by p, not the
>>>>>>> + * page
>>>>>>> + */
>>>>>>> + tailroom = truesize - len - offset_in_page(p);
>>
>> I wonder why offset_in_page(p) is correct? I guess it should be:
>>
>> tailroom = truesize - len - headroom;
>>
>> The reason is that the buffer is not necessarily allocated at the page
>> boundary.
> In my understanding, offset_in_page(p) is the offset of p in the page where it
> is located. In this case, the two should be equal.
I think not, if the frag is not page aligned. offset_in_page(p) doesn't
equal to headroom.
Consider the case that the frag start from page offset 1500.
> This has nothing to do with
> which page is allocated.
>
> Of course I think using headroom is a good idea, and it is semantically better.
>
> Thanks.
Thanks
>
>> Thanks
>>
>>
>>>>>>> + buf = (char *)((unsigned long)p & PAGE_MASK);
>>>>>>> } else {
>>>>>>> tailroom = truesize - len;
>>>>>>> buf = p;
Powered by blists - more mailing lists