Message-ID: <ffeb7915-a028-40d8-94d0-4c647ee8e184@uls.co.za>
Date: Mon, 31 Mar 2025 22:43:42 +0200
From: Jaco Kroon <jaco@....co.za>
To: Joanne Koong <joannelkoong@...il.com>
Cc: bernd.schubert@...tmail.fm, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, miklos@...redi.hu, rdunlap@...radead.org,
 trapexit@...wn.link
Subject: Re: [PATCH 2/2] fuse: Adjust readdir() buffer to requesting buffer
 size.

Hi,

On 2025/03/31 18:41, Joanne Koong wrote:
> On Fri, Mar 14, 2025 at 3:39 PM Jaco Kroon<jaco@....co.za> wrote:
>> Clamp to min 1 page (4KB) and max 128 pages (512KB).
>>
>> Glusterfs trial using strace ls -l.
>>
>> Before:
>>
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 616
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
>> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 608
>> getdents64(3, 0x7f2d7d7a7040 /* 1 entries */, 131072) = 24
>> getdents64(3, 0x7f2d7d7a7040 /* 0 entries */, 131072) = 0
>>
>> After:
>>
>> getdents64(3, 0x7ffae8eed040 /* 276 entries */, 131072) = 6696
>> getdents64(3, 0x7ffae8eed040 /* 0 entries */, 131072) = 0
>>
>> Signed-off-by: Jaco Kroon<jaco@....co.za>
>> ---
>>   fs/fuse/readdir.c | 22 ++++++++++++++++++----
>>   1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
>> index 17ce9636a2b1..a0ccbc84b000 100644
>> --- a/fs/fuse/readdir.c
>> +++ b/fs/fuse/readdir.c
>> @@ -337,11 +337,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
>>          struct fuse_mount *fm = get_fuse_mount(inode);
>>          struct fuse_io_args ia = {};
>>          struct fuse_args_pages *ap = &ia.ap;
>> -       struct fuse_folio_desc desc = { .length = PAGE_SIZE };
>> +       struct fuse_folio_desc desc = { .length = ctx->count };
>>          u64 attr_version = 0, evict_ctr = 0;
>>          bool locked;
>> +       int order;
>>
>> -       folio = folio_alloc(GFP_KERNEL, 0);
>> +       if (desc.length < PAGE_SIZE)
>> +               desc.length = PAGE_SIZE;
>> +       else if (desc.length > (PAGE_SIZE << 7)) /* 128 pages, typically 512KB */
>> +               desc.length = PAGE_SIZE << 7;
>> +
> Just wondering, how did 128 pages get decided as the upper bound? It
> seems to me to make more sense if the upper bound is fc->max_pages.

Best answer ... random/guess at something which may be sensible.

> Also btw, I think you can just use the clamp() helper from
> <linux/minmax.h> to do the clamping

Thanks.  I'm not a regular contributor to the kernel; it's not often that 
I've got an itch that needs scratching here :).

So something like this then:

	desc.length = clamp(desc.length, PAGE_SIZE,
			    fm->fc->max_pages << CONFIG_PAGE_SHIFT);
	order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);

Note:  Can't use ctx->count in clamp() directly here due to it being 
signed, where desc.length is unsigned.

I'm *assuming* get_count_order will round up, so if max_pages is 7 (if a 
non-power-of-two is even possible) we will really get 8 pages here?

Compile tested only.  Will perform basic run-time test before re-submit.

>> +       order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);
>> +
>> +       do {
>> +               folio = folio_alloc(GFP_KERNEL, order);
> Folios can now be larger than one page size for readdir requests with
> your change but I don't believe the current page copying code in fuse
> supports this yet. For example, I think the kmapping will be
> insufficient in fuse_copy_page() where in the current code we kmap
> only the first page in the folio. I sent a patch for supporting large
> folios page copying [1] and am trying to get this merged in but
> haven't heard back about this patchset yet. In your local tests that
> used multiple pages for the readdir request, did you run into any
> issues or it worked fine?

My tests boiled down to running strace as per above, and then some basic 
time trials using find /path/to/mount/point with and without the patch 
over a fairly large structure containing about 170m inodes.  No problems 
observed.  That said ... I've done similar before, and then introduced a 
major memory leak that under load destroyed 100GB of RAM in minutes.  
That's why I'm looking for a few eyeballs on this before going to 
production (what we have works, it's just on an older kernel).

If further improvements are possible that would be great, but based on 
testing this is already at least a 10x improvement on readdir() performance.

> [1] https://lore.kernel.org/linux-fsdevel/20250123012448.2479372-2-joannelkoong@gmail.com/
Took a quick look, wish I could provide you some feedback but that's 
beyond my kernel skill set to just eyeball.

Looks like you're primarily getting rid of the code that references the 
pages inside the folios and just operating on the folios directly? A 
side effect of which (your goal) is to enable larger copies rather than 
small ones?

Thank you,
Jaco
