Message-ID: <5a0e12c1-0e98-426a-ab4d-50de2b09f36f@huawei.com>
Date: Tue, 23 Jul 2024 21:19:46 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Andrew Morton
<akpm@...ux-foundation.org>, <linux-mm@...ck.org>
Subject: Re: [RFC v11 08/14] mm: page_frag: some minor refactoring before
adding new API
On 2024/7/22 23:32, Alexander Duyck wrote:
> On Mon, Jul 22, 2024 at 5:55 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> On 2024/7/22 7:40, Alexander H Duyck wrote:
>>> On Fri, 2024-07-19 at 17:33 +0800, Yunsheng Lin wrote:
>>>> Refactor common code from __page_frag_alloc_va_align()
>>>> to __page_frag_cache_refill(), so that the new API can
>>>> make use of it.
>>>>
>>>> CC: Alexander Duyck <alexander.duyck@...il.com>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>>>> ---
>>>> include/linux/page_frag_cache.h | 2 +-
>>>> mm/page_frag_cache.c | 93 +++++++++++++++++----------------
>>>> 2 files changed, 49 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
>>>> index 12a16f8e8ad0..5aa45de7a9a5 100644
>>>> --- a/include/linux/page_frag_cache.h
>>>> +++ b/include/linux/page_frag_cache.h
>>>> @@ -50,7 +50,7 @@ static inline void *encoded_page_address(unsigned long encoded_va)
>>>>
>>>> static inline void page_frag_cache_init(struct page_frag_cache *nc)
>>>> {
>>>> - nc->encoded_va = 0;
>>>> + memset(nc, 0, sizeof(*nc));
>>>> }
>>>>
>>>
>>> I do not like requiring the entire structure to be reset as a part of
>>> init. If encoded_va is 0 then we have reset the page and the flags.
>>> There shouldn't be anything else we need to reset as remaining and bias
>>> will be reset when we reallocate.
>>
>> The argument is about avoiding one check in the fast path by doing the
>> memset in the slow path, which you might already know according to your
>> comment on the previous version.
>>
>> It is just sometimes hard to understand your preference for maintainability
>> over performance here, as sometimes your comments seem to prefer performance
>> over maintainability, like the LEA trick you mentioned and the offset
>> count-down before this patchset. It would be good to be more consistent
>> about this; otherwise it is sometimes confusing when doing the refactoring.
>
> The use of a negative offset is, in my mind, arguably more maintainable
> rather than being a performance trick. Essentially, if you use the
> negative value you can just mask off the upper bits and it is the
> offset in the page. As such, it is actually easier for me to read
> versus "remaining", which is an offset from the end of the page,
> assuming you read the offset in hex anyway.
Reading the above doesn't seem maintainable to me :(
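For reference, my reading of the trick being discussed is roughly the
following (a sketch only, not the actual kernel code; the struct and
field names here are illustrative):

	/*
	 * Sketch of the negative "remaining" idea: if remaining is kept
	 * as a negative count, masking off the upper bits yields the
	 * current offset into the page directly, because the page base
	 * address is size-aligned.
	 */
	struct frag_cache_sketch {
		void *page_va;	/* page-aligned base address */
		long remaining;	/* runs from -size up towards 0 */
	};

	static void *frag_current_va(struct frag_cache_sketch *c,
				     unsigned long size)
	{
		/* e.g. size 0x8000, remaining -0x7f00 -> offset 0x100 */
		unsigned long offset = (unsigned long)c->remaining &
				       (size - 1);

		return c->page_va + offset;
	}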
>
>>>
>>>> static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
>>>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>>>> index 7928e5d50711..d9c9cad17af7 100644
>>>> --- a/mm/page_frag_cache.c
>>>> +++ b/mm/page_frag_cache.c
>>>> @@ -19,6 +19,28 @@
>>>> #include <linux/page_frag_cache.h>
>>>> #include "internal.h"
>>>>
>>>> +static struct page *__page_frag_cache_recharge(struct page_frag_cache *nc)
>>>> +{
>>>> + unsigned long encoded_va = nc->encoded_va;
>>>> + struct page *page;
>>>> +
>>>> + page = virt_to_page((void *)encoded_va);
>>>> + if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>>>> + return NULL;
>>>> +
>>>> + if (unlikely(encoded_page_pfmemalloc(encoded_va))) {
>>>> + VM_BUG_ON(compound_order(page) !=
>>>> + encoded_page_order(encoded_va));
>>>> + free_unref_page(page, encoded_page_order(encoded_va));
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + /* OK, page count is 0, we can safely set it */
>>>> + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>>> +
>>>> + return page;
>>>> +}
>>>> +
>>>> static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>> gfp_t gfp_mask)
>>>> {
>>>> @@ -26,6 +48,14 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>> struct page *page = NULL;
>>>> gfp_t gfp = gfp_mask;
>>>>
>>>> + if (likely(nc->encoded_va)) {
>>>> + page = __page_frag_cache_recharge(nc);
>>>> + if (page) {
>>>> + order = encoded_page_order(nc->encoded_va);
>>>> + goto out;
>>>> + }
>>>> + }
>>>> +
>>>
>>> This code has no business here. This is refill; you just dropped
>>> recharge in here, which will make a complete mess of the ordering and be
>>> confusing, to say the least.
>>>
>>> The expectation was that if we are calling this function it is going to
>>> overwrite the virtual address to NULL on failure, so we discard the old
>>> page if there is one present. This changes that behaviour. What you
>>> effectively did is make __page_frag_cache_refill into the recharge
>>> function.
>>
>> The idea is to reuse the below for both __page_frag_cache_refill() and
>> __page_frag_cache_recharge(), which seems to be about maintainability,
>> i.e. not having duplicated code. If there is a better idea that avoids
>> the duplicated code while keeping the old behaviour, I am happy to
>> change it.
>>
>> /* reset page count bias and remaining to start of new frag */
>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> nc->remaining = PAGE_SIZE << order;
>>
>
> The only piece that is really reused here is the pagecnt_bias
> assignment. What is obfuscated away is that the order is gotten
> through one of two paths. Really order isn't order here it is size.
> Which should have been fetched already. What you end up doing with
> this change is duplicating a bunch of code throughout the function.
> You end up having to fetch size multiple times multiple ways. here you
> are generating it with order. Then you have to turn around and get it
> again at the start of the function, and again after calling this
> function in order to pull it back out.
I am assuming you would like to preserve the old behavior, as below?
if (!encoded_va) {
refill:
	__page_frag_cache_refill();
}

if (remaining < fragsz) {
	if (!__page_frag_cache_recharge())
		goto refill;
}
As we are adding new APIs, are we expecting the new APIs to also
duplicate the above pattern?
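If the concern is only the duplicated reset, one alternative sketch
(reusing the field names from this patch, but the helper name is
hypothetical) would be to factor just those two assignments out and
call the helper from both refill and recharge, keeping the old
ordering intact:

	/* reset page count bias and remaining to start of new frag */
	static void page_frag_cache_reset(struct page_frag_cache *nc,
					  unsigned int order)
	{
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
		nc->remaining = PAGE_SIZE << order;
	}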
>
>>>
>>>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>>>> __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>>>> @@ -35,7 +65,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>> if (unlikely(!page)) {
>>>> page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>>>> if (unlikely(!page)) {
>>>> - nc->encoded_va = 0;
>>>> + memset(nc, 0, sizeof(*nc));
>>>> return NULL;
>>>> }
>>>>
>>>
>>> The memset will take a few more instructions than the existing code
>>> did. I would prefer to keep this as is if at all possible.
>>
>> It will not take more instructions on arm64, as it has the 'stp'
>> instruction when __HAVE_ARCH_MEMSET is set.
>> Is there something similar for x64?
>
> The x64 does not, last I knew, without getting into the SSE/AVX type
> stuff. This becomes two separate 8B store instructions.
I can check later when I get hold of an x64 server.
But doesn't it make sense to have one extra 8B store instruction in the
slow path to avoid a check in the fast path?
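To make the trade-off concrete, a sketch of my understanding (assuming
struct page_frag_cache in this patchset is 16 bytes: encoded_va plus
the remaining/pagecnt_bias fields and padding):

	static void reset_full(struct page_frag_cache *nc)
	{
		/* arm64: a single stp of two zero registers;
		 * x86-64: two 8-byte stores
		 */
		memset(nc, 0, sizeof(*nc));
	}

	static void reset_va_only(struct page_frag_cache *nc)
	{
		/* the old code: one 8-byte store, but the fast path
		 * then needs a separate !encoded_va check, since
		 * remaining is left stale
		 */
		nc->encoded_va = 0;
	}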
>
>>>
>>>> @@ -45,6 +75,16 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>>> nc->encoded_va = encode_aligned_va(page_address(page), order,
>>>> page_is_pfmemalloc(page));
>>>>
>>>> + /* Even if we own the page, we do not use atomic_set().
>>>> + * This would break get_page_unless_zero() users.
>>>> + */
>>>> + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
>>>> +
>>>> +out:
>>>> + /* reset page count bias and remaining to start of new frag */
>>>> + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> + nc->remaining = PAGE_SIZE << order;
>>>> +
>>>> return page;
>>>> }
>>>>
>>>
>>> Why bother returning a page at all? It doesn't seem like you use
>>> it anymore. It looks like the use cases you have for it in patches 11/12
>>> all appear to be broken from what I can tell, as you are adding page as
>>> a variable when we don't need to be passing internal details to the
>>> callers of the function; a simple error return code would do.
>>
>> It would be good to be more specific about the 'broken' part here.
>
> We are passing internals to the caller. Basically, this is generally
> frowned upon in many implementations, as the general idea is that the
> internal page we are using should be a pseudo-private value.
It is an implementation detail, and it is about avoiding calls to
virt_to_page() as mentioned below. I am not sure why it is referred to
as 'broken'; it would be better to provide more documentation about why
it is a bad idea here, as the 'pseudo-private' wording doesn't seem to
justify the 'broken' part.
> I understand that you have one or two callers that need it for the use
> cases you have in patches 11/12, but it also seems like you are just
> passing it regardless. For example, I noticed in a few cases you added
> the page pointer in patch 12 to handle the return value, but then just
> used it to check for NULL. My thought would be that rather than
> returning the page here, you would be better off just returning 0 or an
> error and then doing the virt_to_page() translation for the cases where
> the page is actually needed, since you have to go that route for a
> cached page anyway.
Yes, it is about avoiding calls to virt_to_page() as much as possible.
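For clarity, my understanding of the suggestion is roughly the
following sketch (the *_sketch names are hypothetical, and refill
returning bool is an assumption for illustration):

	static bool __page_frag_cache_refill_sketch(struct page_frag_cache *nc,
						    gfp_t gfp_mask);

	static struct page *frag_page_sketch(struct page_frag_cache *nc,
					     gfp_t gfp_mask)
	{
		if (!nc->encoded_va &&
		    !__page_frag_cache_refill_sketch(nc, gfp_mask))
			return NULL;

		/* pay for the translation only in callers that
		 * actually need the struct page
		 */
		return virt_to_page(encoded_page_address(nc->encoded_va));
	}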