[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5905bad4-8a98-4f9d-9bd6-b9764e299ac7@huawei.com>
Date: Fri, 16 Aug 2024 20:01:18 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Andrew Morton
<akpm@...ux-foundation.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH net-next v13 11/14] mm: page_frag: introduce
prepare/probe/commit API
On 2024/8/15 23:25, Alexander Duyck wrote:
> On Wed, Aug 14, 2024 at 8:05 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> On 2024/8/15 5:00, Alexander H Duyck wrote:
>
> ...
>
>>>> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc,
>>>> + unsigned int fragsz)
>>>> +{
>>>> + nc->pagecnt_bias++;
>>>> + nc->remaining += fragsz;
>>>> +}
>>>> +
>>>
>>> This doesn't add up. Why would you need abort if you have commit? Isn't
>>> this more of a revert? I wouldn't think that would be valid as it is
>>> possible you took some sort of action that might have resulted in this
>>> memory already being shared. We shouldn't allow rewinding the offset
>>> pointer without knowing that there are no other entities sharing the
>>> page.
>>
>> This is used for __tun_build_skb() in drivers/net/tun.c as below, mainly
>> used to avoid performance penalty for XDP drop case:
>
> Yeah, I reviewed that patch. As I said there, rewinding the offset
> should be avoided unless you can verify you are the only owner of the
> page as you have no guarantees that somebody else didn't take an
> access to the page/data to send it off somewhere else. Once you expose
> the page to any other entity it should be written off or committed in
> your case and you should move on to the next block.
Yes, the expectation is that somebody else didn't take an access to the
page/data to send it off somewhere else between page_frag_alloc_va()
and page_frag_alloc_abort(), did you see expectation was broken in that
patch? If yes, we should fix that by using page_frag_free_va() related
API instead of using page_frag_alloc_abort().
>
>
>>
>>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc,
>>>> + gfp_t gfp_mask)
>>>> {
>>>> + struct page *page;
>>>> +
>>>> if (likely(nc->encoded_va)) {
>>>> - if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
>>>> + page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias);
>>>> + if (page)
>>>> goto out;
>>>> }
>>>>
>>>> - if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
>>>> - return false;
>>>> + page = __page_frag_cache_refill(nc, gfp_mask);
>>>> + if (unlikely(!page))
>>>> + return NULL;
>>>>
>>>> out:
>>>> /* reset page count bias and remaining to start of new frag */
>>>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> nc->remaining = page_frag_cache_page_size(nc->encoded_va);
>>>> - return true;
>>>> + return page;
>>>> +}
>>>> +
>>>
>>> None of the functions above need to be returning page.
>>
>> Are you still suggesting to always use virt_to_page() even when it is
>> not really necessary? why not return the page here to avoid the
>> virt_to_page()?
>
> Yes. The likelihood of you needing to pass this out as a page should
> be low as most cases will just be you using the virtual address
> anyway. You are essentially trading off branching for not having to
> use virt_to_page. It is unnecessary optimization.
As my understanding, I am not trading off branching for not having to
use virt_to_page, the branching is already needed no matter we utilize
it to avoid calling virt_to_page() or not, please be more specific about
which branching is traded off for not having to use virt_to_page() here.
>
>
>>
>>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
>>>> + unsigned int *offset, unsigned int fragsz,
>>>> + gfp_t gfp)
>>>> +{
>>>> + unsigned int remaining = nc->remaining;
>>>> + struct page *page;
>>>> +
>>>> + VM_BUG_ON(!fragsz);
>>>> + if (likely(remaining >= fragsz)) {
>>>> + unsigned long encoded_va = nc->encoded_va;
>>>> +
>>>> + *offset = page_frag_cache_page_size(encoded_va) -
>>>> + remaining;
>>>> +
>>>> + return virt_to_page((void *)encoded_va);
>>>> + }
>>>> +
>>>> + if (unlikely(fragsz > PAGE_SIZE))
>>>> + return NULL;
>>>> +
>>>> + page = __page_frag_cache_reload(nc, gfp);
>>>> + if (unlikely(!page))
>>>> + return NULL;
>>>> +
>>>> + *offset = 0;
>>>> + nc->remaining = remaining - fragsz;
>>>> + nc->pagecnt_bias--;
>>>> +
>>>> + return page;
>>>> }
>>>> +EXPORT_SYMBOL(page_frag_alloc_pg);
>>>
>>> Again, this isn't returning a page. It is essentially returning a
>>> bio_vec without calling it as such. You might as well pass the bio_vec
>>> pointer as an argument and just have it populate it directly.
>>
>> I really don't think your bio_vec suggestion make much sense for now as
>> the reason mentioned in below:
>>
>> "Through a quick look, there seems to be at least three structs which have
>> similar values: struct bio_vec & struct skb_frag & struct page_frag.
>>
>> As your above agrument about using bio_vec, it seems it is ok to use any
>> one of them as each one of them seems to have almost all the values we
>> are using?
>>
>> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag'
>>> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match
>> the page_frag API, 'struct skb_frag' is the second preference because we
>> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last
>> preference because it just happen to have almost all the values needed.
>
> That is why I said I would be okay with us passing page_frag in patch
> 12 after looking closer at the code. The fact is it should make the
> review of that patch set much easier if you essentially just pass the
> page_frag back out of the call. Then it could be used in exactly the
> same way it was before and should reduce the total number of lines of
> code that need to be changed.
So the your suggestion changed to something like below?
int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag);
The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better
one in your mind?
>
>> Is there any specific reason other than the above "almost all the values you
>> are using are exposed by that structure already " that you prefer bio_vec?"
>>
>> 1. https://lore.kernel.org/all/ca6be29e-ab53-4673-9624-90d41616a154@huawei.com/
>
> My reason for preferring bio_vec is that of the 3 it is the most setup
> to be used as a local variable versus something stored in a struct
> such as page_frag or used for some specialty user case such as
> skb_frag_t. In addition it already has a set of helpers for converting
> it to a virtual address or copying data to and from it which would
> make it easier to get rid of a bunch of duplicate code.
Powered by blists - more mailing lists