linux-kernel - Re: [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e55694ce-7562-4e73-920a-8c03eb37c3c2@redhat.com>
Date: Fri, 3 Jan 2025 09:46:56 -0500
From: Luiz Capitulino <luizcap@...hat.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
 Yunsheng Lin <linyunsheng@...wei.com>
Cc: linux-mm@...ck.org, willy@...radead.org, david@...hat.com,
 linux-kernel@...r.kernel.org, lcapitulino@...il.com
Subject: Re: [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page_list
 argument

On 2025-01-03 09:27, Mel Gorman wrote:
> On Fri, Jan 03, 2025 at 07:29:30PM +0800, Yunsheng Lin wrote:
>> On 2025/1/3 4:00, Mel Gorman wrote:
>>> On Wed, Dec 25, 2024 at 08:36:04PM +0800, Yunsheng Lin wrote:
>>>> On 2024/12/24 6:00, Luiz Capitulino wrote:
>>>>
>>>>>   /*
>>>>> - * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
>>>>> + * __alloc_pages_bulk - Allocate a number of order-0 pages to an array
>>>>>    * @gfp: GFP flags for the allocation
>>>>>    * @preferred_nid: The preferred NUMA node ID to allocate from
>>>>>    * @nodemask: Set of nodes to allocate from, may be NULL
>>>>> - * @nr_pages: The number of pages desired on the list or array
>>>>> - * @page_list: Optional list to store the allocated pages
>>>>> - * @page_array: Optional array to store the pages
>>>>> + * @nr_pages: The number of pages desired in the array
>>>>> + * @page_array: Array to store the pages
>>>>>    *
>>>>>    * This is a batched version of the page allocator that attempts to
>>>>> - * allocate nr_pages quickly. Pages are added to page_list if page_list
>>>>> - * is not NULL, otherwise it is assumed that the page_array is valid.
>>>>> + * allocate nr_pages quickly. Pages are added to the page_array.
>>>>>    *
>>>>> - * For lists, nr_pages is the number of pages that should be allocated.
>>>>> - *
>>>>> - * For arrays, only NULL elements are populated with pages and nr_pages
>>>>> + * Note that only NULL elements are populated with pages and nr_pages
>>>>
>>>> It is not really related to this patch, but while we are at this, the above
>>>> seems like an odd behavior. By roughly looking at all the callers of that
>>>> API, it seems like only the below callers rely on that?
>>>> fs/erofs/zutil.c: z_erofs_gbuf_growsize()
>>>> fs/xfs/xfs_buf.c: xfs_buf_alloc_pages()
>>>>
>>>> It seems it is quite straight forward to change the above callers to not
>>>> rely on the above behavior, and we might be able to avoid more checking
>>>> by removing the above behavior?
>>>>
>>>
>>> It was implemented that way for an early user, net/sunrpc/svc_xprt.c.
>>> The behaviour removes a burden from the caller to track the number of
>>> populated elements and then pass the exact number of pages that must be
>>> allocated. If the API does not handle that detail, each caller needs
>>> similar state tracking implementations. As the overhead is going to be
>>> the same whether the API implements it once or each caller implements
>>> there own, it is simplier if there is just one implementation.
>>
>> It seems it is quite straight forward to change the above use case to
>> not rely on that by something like below?
>>
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 43c57124de52..52800bfddc86 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -670,19 +670,21 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
>>                  pages = RPCSVC_MAXPAGES;
>>          }
>>
>> -       for (filled = 0; filled < pages; filled = ret) {
>> -               ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
>> -                                            rqstp->rq_pages);
>> -               if (ret > filled)
>> +       for (filled = 0; filled < pages;) {
>> +               ret = alloc_pages_bulk_array(GFP_KERNEL, pages - filled,
>> +                                            rqstp->rq_pages + filled);
>> +               if (ret) {
>> +                       filled += ret;
>>                          /* Made progress, don't sleep yet */
>>                          continue;
>> +               }
>>
>>                  set_current_state(TASK_IDLE);
>>                  if (svc_thread_should_stop(rqstp)) {
>>                          set_current_state(TASK_RUNNING);
>>                          return false;
>>                  }
>> -               trace_svc_alloc_arg_err(pages, ret);
>> +               trace_svc_alloc_arg_err(pages, filled);
>>                  memalloc_retry_wait(GFP_KERNEL);
>>          }
>>          rqstp->rq_page_end = &rqstp->rq_pages[pages];
>>
> 
> The API implementation would also need to change to make this work as the
> return value is a number of pages that are on the array, not the number of
> new pages allocated. Even if fixed, it still moves cost and complexity to
> the caller and the API is harder to use and easier to make mistakes. That
> shift in responsibility and the maintenance burden would need to be
> justified. While it is possible to use wrappers to allow callers to decide
> whether to manage the tracking or let the API handle it, the requirement
> then is to show that there is a performance gain for a common use case.

I agree with all your points, but I think there's value in simplifying
alloc_pages_bulk_noprof() if the wrappers turn out not be complex or
difficult to use.

If all users of this case always fill the array sequentially, then
maybe we could just have a wrapper that scans the array first (this
is the first loop in alloc_pages_bulk_noprof()) or maybe callers
could keep track of 'filled' (via a pointer or macro usage). But
this is just brainstorming, we'd need to see the resulting code to
know if it works and makes sense.

> This is outside the scope of a serise that removes the page_list
> argument. Even if a series was proposed to shift responsibility to the
> caller, I would not expect it to be submitted with a page_list removal.

Yes, absolutely.