[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <74827bc7-ec6e-4e3a-9d19-61c4a9ba6b2c@huawei.com>
Date: Tue, 4 Mar 2025 20:04:03 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Chuck Lever <chuck.lever@...cle.com>, Yishai Hadas <yishaih@...dia.com>,
Jason Gunthorpe <jgg@...pe.ca>, Shameer Kolothum
<shameerali.kolothum.thodi@...wei.com>, Kevin Tian <kevin.tian@...el.com>,
Alex Williamson <alex.williamson@...hat.com>, Chris Mason <clm@...com>, Josef
Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>, Gao Xiang
<xiang@...nel.org>, Chao Yu <chao@...nel.org>, Yue Hu <zbestahu@...il.com>,
Jeffle Xu <jefflexu@...ux.alibaba.com>, Sandeep Dhavale <dhavale@...gle.com>,
Carlos Maiolino <cem@...nel.org>, "Darrick J. Wong" <djwong@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Jesper Dangaard Brouer
<hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>, "David S.
Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, Trond Myklebust <trondmy@...nel.org>, Anna Schumaker
<anna@...nel.org>, Jeff Layton <jlayton@...nel.org>, Neil Brown
<neilb@...e.de>, Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo
<Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>
CC: Luiz Capitulino <luizcap@...hat.com>, Mel Gorman
<mgorman@...hsingularity.net>, Dave Chinner <david@...morbit.com>,
<kvm@...r.kernel.org>, <virtualization@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <linux-btrfs@...r.kernel.org>,
<linux-erofs@...ts.ozlabs.org>, <linux-xfs@...r.kernel.org>,
<linux-mm@...ck.org>, <netdev@...r.kernel.org>, <linux-nfs@...r.kernel.org>,
Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating
only NULL elements
On 2025/3/4 6:13, Chuck Lever wrote:
> On 2/28/25 4:44 AM, Yunsheng Lin wrote:
>> As mentioned in [1], it seems odd to check NULL elements in
>> the middle of page bulk allocating, and it seems caller can
>> do a better job of bulk allocating pages into a whole array
>> sequentially without checking NULL elements first before
>> doing the page bulk allocation for most of existing users.
>
> Sorry, but this still makes a claim without providing any data
> to back it up. Why can callers "do a better job"?
What I meant "do a better job" is that callers are already keeping
track of how many pages have been allocated, and it seems convenient
to just pass 'page_array + nr_allocated' and 'nr_pages - nr_allocated'
when calling subsequent page bulk alloc API so that NULL checking
can be avoided, which seems to be the pattern I see in
alloc_pages_bulk_interleave().
>
>
>> Through analyzing of bulk allocation API used in fs, it
>> seems that the callers are depending on the assumption of
>> populating only NULL elements in fs/btrfs/extent_io.c and
>> net/sunrpc/svc_xprt.c while erofs and btrfs don't, see:
>> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
>> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
>> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
>> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>
>> Change SUNRPC and btrfs to not depend on the assumption.
>> Other existing callers seems to be passing all NULL elements
>> via memset, kzalloc, etc.
>>
>> Remove assumption of populating only NULL elements and treat
>> page_array as output parameter like kmem_cache_alloc_bulk().
>> Remove the above assumption also enable the caller to not
>> zero the array before calling the page bulk allocating API,
>> which has about 1~2 ns performance improvement for the test
>> case of time_bench_page_pool03_slow() for page_pool in a
>> x86 vm system, this reduces some performance impact of
>> fixing the DMA API misuse problem in [2], performance
>> improves from 87.886 ns to 86.429 ns.
>>
>> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@huawei.com/
>> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@huawei.com/
>> CC: Jesper Dangaard Brouer <hawk@...nel.org>
>> CC: Luiz Capitulino <luizcap@...hat.com>
>> CC: Mel Gorman <mgorman@...hsingularity.net>
>> CC: Dave Chinner <david@...morbit.com>
>> CC: Chuck Lever <chuck.lever@...cle.com>
>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>> Acked-by: Jeff Layton <jlayton@...nel.org>
>> ---
>> V2:
>> 1. Drop RFC tag and rebased on latest linux-next.
>> 2. Fix a compile error for xfs.
>> 3. Defragmemt the page_array for SUNRPC and btrfs.
>> ---
>> drivers/vfio/pci/virtio/migrate.c | 2 --
>> fs/btrfs/extent_io.c | 23 +++++++++++++++++-----
>> fs/erofs/zutil.c | 12 ++++++------
>> fs/xfs/xfs_buf.c | 9 +++++----
>> mm/page_alloc.c | 32 +++++--------------------------
>> net/core/page_pool.c | 3 ---
>> net/sunrpc/svc_xprt.c | 22 +++++++++++++++++----
>> 7 files changed, 52 insertions(+), 51 deletions(-)
>
> 52:51 is not an improvement. 1-2 ns is barely worth mentioning. The
> sunrpc and btrfs callers are more complex and carry duplicated code.
Yes, the hard part is to find common file to place the common function
as something as below:
void defragment_pointer_array(void** array, int size) {
int slow = 0;
for (int fast = 0; fast < size; fast++) {
if (array[fast] != NULL) {
swap(&array[fast], &array[slow]);
slow++;
}
}
}
Or introduce a function as something like alloc_pages_refill_array()
for the usecase of sunrpc and xfs and do the array defragment in
alloc_pages_refill_array() before calling alloc_pages_bulk()?
Any suggestion?
>
> Not an outright objection from me, but it's hard to justify this change.
The above should reduce the number to something like 40:51.
Also, I looked more closely at other callers calling alloc_pages_bulk(),
it seems vm_module_tags_populate() doesn't clear next_page[i] to NULL after
__free_page() when doing 'Clean up and error out', I am not sure if
vm_module_tags_populate() will be called multi-times as vm_module_tags->pages
seems to be only set to all NULL once, see alloc_tag_init() -> alloc_mod_tags_mem().
Cc Suren to see if there is some clarifying for the above.
Powered by blists - more mailing lists