Message-ID: <ffeb7915-a028-40d8-94d0-4c647ee8e184@uls.co.za>
Date: Mon, 31 Mar 2025 22:43:42 +0200
From: Jaco Kroon <jaco@....co.za>
To: Joanne Koong <joannelkoong@...il.com>
Cc: bernd.schubert@...tmail.fm, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, miklos@...redi.hu, rdunlap@...radead.org,
trapexit@...wn.link
Subject: Re: [PATCH 2/2] fuse: Adjust readdir() buffer to requesting buffer
size.
Hi,
On 2025/03/31 18:41, Joanne Koong wrote:
> On Fri, Mar 14, 2025 at 3:39 PM Jaco Kroon <jaco@....co.za> wrote:
>> Clamp to min 1 page (4KB) and max 128 pages (512KB).
>>
>> Glusterfs trial using strace ls -l.
>>
>> Before:
>>
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 616
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 608
>> getdents64(3, 0x7f2d7d7a7040 /* 1 entries */, 131072) = 24
>> getdents64(3, 0x7f2d7d7a7040 /* 0 entries */, 131072) = 0
>>
>> After:
>>
>> getdents64(3, 0x7ffae8eed040 /* 276 entries */, 131072) = 6696
>> getdents64(3, 0x7ffae8eed040 /* 0 entries */, 131072) = 0
>>
>> Signed-off-by: Jaco Kroon <jaco@....co.za>
>> ---
>> fs/fuse/readdir.c | 22 ++++++++++++++++++----
>> 1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
>> index 17ce9636a2b1..a0ccbc84b000 100644
>> --- a/fs/fuse/readdir.c
>> +++ b/fs/fuse/readdir.c
>> @@ -337,11 +337,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
>>  	struct fuse_mount *fm = get_fuse_mount(inode);
>>  	struct fuse_io_args ia = {};
>>  	struct fuse_args_pages *ap = &ia.ap;
>> -	struct fuse_folio_desc desc = { .length = PAGE_SIZE };
>> +	struct fuse_folio_desc desc = { .length = ctx->count };
>>  	u64 attr_version = 0, evict_ctr = 0;
>>  	bool locked;
>> +	int order;
>>
>> -	folio = folio_alloc(GFP_KERNEL, 0);
>> +	if (desc.length < PAGE_SIZE)
>> +		desc.length = PAGE_SIZE;
>> +	else if (desc.length > (PAGE_SIZE << 7)) /* 128 pages, typically 512KB */
>> +		desc.length = PAGE_SIZE << 7;
>> +
> Just wondering, how did 128 pages get decided as the upper bound? It
> seems to me to make more sense if the upper bound is fc->max_pages.
Honest answer: a random guess at something that seemed sensible.
> Also btw, I think you can just use the clamp() helper from
> <linux/minmax.h> to do the clamping
Thanks. I'm not a regular contributor to the kernel; it's not often
that I've got an itch that needs scratching here :).
So something like this then:
	desc.length = clamp(desc.length, PAGE_SIZE,
			    fm->fc->max_pages << CONFIG_PAGE_SHIFT);
	order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);
Note: I can't feed ctx->count into clamp() directly there due to it
being signed, whereas desc.length is unsigned.
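If we did want to feed ctx->count straight in, clamp_t() from the same
header should sidestep the type check by casting all three arguments
first. An untested sketch on my side, assuming size_t is an appropriate
type for desc.length:

	/* Untested sketch: clamp_t() casts val/lo/hi to the named type,
	 * avoiding the signed (ctx->count) vs unsigned (desc.length)
	 * mismatch that plain clamp() rejects at compile time. */
	desc.length = clamp_t(size_t, ctx->count, PAGE_SIZE,
			      fm->fc->max_pages << CONFIG_PAGE_SHIFT);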
I'm *assuming* get_count_order() will round up, so if max_pages is 7
(if a non-power-of-two value is even possible there) we will really get
8 pages here?
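For my own sanity I modelled it in userspace, assuming bitops.h still
defines get_count_order() as fls() of count minus one, and it does
appear to round up:

	/* Userspace model of the kernel's get_count_order() from
	 * <linux/bitops.h> (assumed definition); fls() is open-coded
	 * here since glibc doesn't provide it. */
	#include <stdio.h>

	static int fls(unsigned int x)
	{
		return x ? 32 - __builtin_clz(x) : 0;
	}

	static int get_count_order(unsigned int count)
	{
		if (count == 0)
			return -1;
		return fls(--count);
	}

	int main(void)
	{
		printf("%d\n", get_count_order(7)); /* 3, i.e. 8 pages */
		printf("%d\n", get_count_order(8)); /* 3, i.e. 8 pages */
		printf("%d\n", get_count_order(9)); /* 4, i.e. 16 pages */
		return 0;
	}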
Compile tested only. Will perform basic run-time test before re-submit.
>> +	order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);
>> +
>> +	do {
>> +		folio = folio_alloc(GFP_KERNEL, order);
> With your change, folios for readdir requests can now be larger than
> one page, but I don't believe the current page copying code in fuse
> supports this yet. For example, I think the kmapping will be
> insufficient in fuse_copy_page(), where the current code kmaps only
> the first page in the folio. I sent a patchset for supporting page
> copying of large folios [1] and am trying to get it merged, but
> haven't heard back about it yet. In your local tests that used
> multiple pages for the readdir request, did you run into any issues,
> or did it work fine?
My tests boiled down to running strace as per the above, plus some basic
time trials using find /path/to/mount/point, with and without the patch,
over a fairly large structure containing about 170m inodes. No problems
observed. That said, I've done similar work before and introduced a
major memory leak that under load consumed 100GB of RAM in minutes,
which is why I'm looking for a few eyeballs on this before going to
production (what we have works, it's just on an older kernel).

If further improvements are possible that would be great, but based on
testing this is already at least a 10x improvement in readdir()
performance.
> [1]https://lore.kernel.org/linux-fsdevel/20250123012448.2479372-2-joannelkoong@gmail.com/
Took a quick look; wish I could provide you some feedback, but
eyeballing that is beyond my kernel skill set.

Looks like you're primarily getting rid of the code that references the
pages inside the folios and operating on the folios directly instead? A
side effect of which (your goal) is to enable larger copies rather than
small per-page ones?
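If so, my (possibly wrong) mental model of what the copy path needs for
a multi-page folio is something along these lines; a naive sketch on my
part, not your actual code:

	/* Naive illustrative sketch, not the actual patch: copying out
	 * of a possibly-large folio has to remap per page, because
	 * kmap_local_folio() only maps the single page containing the
	 * given offset. */
	static void copy_from_large_folio(void *dst, struct folio *folio,
					  size_t offset, size_t len)
	{
		size_t copied = 0;

		while (copied < len) {
			size_t pos = offset + copied;
			size_t chunk = min(len - copied,
					   PAGE_SIZE - offset_in_page(pos));
			void *src = kmap_local_folio(folio, pos);

			memcpy(dst + copied, src, chunk);
			kunmap_local(src);
			copied += chunk;
		}
	}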
Thank you,
Jaco