linux-kernel - Re: [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page

Message-ID: <rat5ihytbh4r7w476ezwxrzjzloilqygpb3dgjg7ewwmm7og2s@6osaxcvgd7ys>
Date: Fri, 3 Jan 2025 14:27:02 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: Luiz Capitulino <luizcap@...hat.com>, linux-mm@...ck.org, 
	willy@...radead.org, david@...hat.com, linux-kernel@...r.kernel.org, 
	lcapitulino@...il.com
Subject: Re: [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page_list
 argument

On Fri, Jan 03, 2025 at 07:29:30PM +0800, Yunsheng Lin wrote:
> On 2025/1/3 4:00, Mel Gorman wrote:
> > On Wed, Dec 25, 2024 at 08:36:04PM +0800, Yunsheng Lin wrote:
> >> On 2024/12/24 6:00, Luiz Capitulino wrote:
> >>
> >>>  /*
> >>> - * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
> >>> + * __alloc_pages_bulk - Allocate a number of order-0 pages to an array
> >>>   * @gfp: GFP flags for the allocation
> >>>   * @preferred_nid: The preferred NUMA node ID to allocate from
> >>>   * @nodemask: Set of nodes to allocate from, may be NULL
> >>> - * @nr_pages: The number of pages desired on the list or array
> >>> - * @page_list: Optional list to store the allocated pages
> >>> - * @page_array: Optional array to store the pages
> >>> + * @nr_pages: The number of pages desired in the array
> >>> + * @page_array: Array to store the pages
> >>>   *
> >>>   * This is a batched version of the page allocator that attempts to
> >>> - * allocate nr_pages quickly. Pages are added to page_list if page_list
> >>> - * is not NULL, otherwise it is assumed that the page_array is valid.
> >>> + * allocate nr_pages quickly. Pages are added to the page_array.
> >>>   *
> >>> - * For lists, nr_pages is the number of pages that should be allocated.
> >>> - *
> >>> - * For arrays, only NULL elements are populated with pages and nr_pages
> >>> + * Note that only NULL elements are populated with pages and nr_pages
> >>
> >> It is not really related to this patch, but while we are at this, the above
> >> seems like an odd behavior. By roughly looking at all the callers of that
> >> API, it seems like only the below callers rely on that?
> >> fs/erofs/zutil.c: z_erofs_gbuf_growsize()
> >> fs/xfs/xfs_buf.c: xfs_buf_alloc_pages()
> >>
> >> It seems it is quite straightforward to change the above callers to not
> >> rely on the above behavior, and we might be able to avoid more checking
> >> by removing the above behavior?
> >>
> > 
> > It was implemented that way for an early user, net/sunrpc/svc_xprt.c.
> > The behaviour removes a burden from the caller to track the number of
> > populated elements and then pass the exact number of pages that must be
> > allocated. If the API does not handle that detail, each caller needs
> > similar state tracking implementations. As the overhead is going to be
> > the same whether the API implements it once or each caller implements
> > their own, it is simpler if there is just one implementation.
> 
> It seems it is quite straightforward to change the above use case to
> not rely on that by something like below?
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 43c57124de52..52800bfddc86 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -670,19 +670,21 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
>                 pages = RPCSVC_MAXPAGES;
>         }
> 
> -       for (filled = 0; filled < pages; filled = ret) {
> -               ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
> -                                            rqstp->rq_pages);
> -               if (ret > filled)
> +       for (filled = 0; filled < pages;) {
> +               ret = alloc_pages_bulk_array(GFP_KERNEL, pages - filled,
> +                                            rqstp->rq_pages + filled);
> +               if (ret) {
> +                       filled += ret;
>                         /* Made progress, don't sleep yet */
>                         continue;
> +               }
> 
>                 set_current_state(TASK_IDLE);
>                 if (svc_thread_should_stop(rqstp)) {
>                         set_current_state(TASK_RUNNING);
>                         return false;
>                 }
> -               trace_svc_alloc_arg_err(pages, ret);
> +               trace_svc_alloc_arg_err(pages, filled);
>                 memalloc_retry_wait(GFP_KERNEL);
>         }
>         rqstp->rq_page_end = &rqstp->rq_pages[pages];
> 

The API implementation would also need to change to make this work, as the
return value is the number of pages that are in the array, not the number
of new pages allocated. Even if that were fixed, it would still move cost
and complexity to the caller, making the API harder to use and easier to
misuse. That shift in responsibility and the added maintenance burden would
need to be justified. While wrappers could let callers decide whether to
manage the tracking themselves or let the API handle it, the requirement
then is to show that there is a performance gain for a common use case.
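
To illustrate the return-value point, here is a minimal userspace sketch
of the array semantics. It is not the kernel implementation:
mock_alloc_pages_bulk(), mock_fill() and mock_budget_per_call are
hypothetical names, malloc() stands in for a page allocation, and the
per-call budget only exists to simulate partial success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: cap on new pages per call, to simulate partial success. */
static size_t mock_budget_per_call = 2;

/*
 * Userspace mock of the array semantics (an assumption-laden sketch, not
 * kernel code): only NULL slots are filled, and the return value is the
 * number of slots populated from the start of the array, including slots
 * that were already populated before the call.
 */
static size_t mock_alloc_pages_bulk(size_t nr_pages, void **page_array)
{
	size_t budget = mock_budget_per_call;
	size_t i;

	assert(page_array != NULL);

	for (i = 0; i < nr_pages; i++) {
		if (page_array[i])
			continue;	/* already populated, nothing to do */
		if (budget == 0)
			break;		/* simulated allocation failure */
		page_array[i] = malloc(1);	/* stand-in for a page */
		if (!page_array[i])
			break;
		budget--;
	}
	return i;
}

/*
 * Caller loop in the style of the existing svc_alloc_arg(): the caller
 * passes the whole array every time and keeps no offset of its own.
 * Returns the number of calls that were needed.
 */
static unsigned int mock_fill(size_t nr_pages, void **page_array)
{
	unsigned int calls = 0;
	size_t filled = 0;

	while (filled < nr_pages) {
		filled = mock_alloc_pages_bulk(nr_pages, page_array);
		calls++;
	}
	return calls;
}
```

With an array whose first slot is already populated, a single call that
allocates two new pages returns 3 in this mock, which is why a caller
cannot treat the return value as "new pages allocated" without the
semantics changing.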

This is outside the scope of a series that removes the page_list
argument. Even if a series were proposed to shift responsibility to the
caller, I would not expect it to be submitted with a page_list removal.

-- 
Mel Gorman
SUSE Labs