lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJnrk1bKkzqVv1HkRFVTWJOp-w=HMb6tvY5r=9RHtKj39EBRRQ@mail.gmail.com>
Date: Mon, 31 Mar 2025 16:01:30 -0700
From: Joanne Koong <joannelkoong@...il.com>
To: Jaco Kroon <jaco@....co.za>
Cc: bernd.schubert@...tmail.fm, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, miklos@...redi.hu, rdunlap@...radead.org, 
	trapexit@...wn.link
Subject: Re: [PATCH 2/2] fuse: Adjust readdir() buffer to requesting buffer size.

On Mon, Mar 31, 2025 at 2:48 PM Joanne Koong <joannelkoong@...il.com> wrote:
>
> On Mon, Mar 31, 2025 at 1:43 PM Jaco Kroon <jaco@....co.za> wrote:
> >
> > Hi,
> >
> > On 2025/03/31 18:41, Joanne Koong wrote:
> > > On Fri, Mar 14, 2025 at 3:39 PM Jaco Kroon<jaco@....co.za> wrote:
> > >> Clamp to min 1 page (4KB) and max 128 pages (512KB).
> > >>
> > >> Glusterfs trial using strace ls -l.
> > >>
> > >> Before:
> > >>
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 616
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 624
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 600
> > >> getdents64(3, 0x7f2d7d7a7040 /* 25 entries */, 131072) = 608
> > >> getdents64(3, 0x7f2d7d7a7040 /* 1 entries */, 131072) = 24
> > >> getdents64(3, 0x7f2d7d7a7040 /* 0 entries */, 131072) = 0
> > >>
> > >> After:
> > >>
> > >> getdents64(3, 0x7ffae8eed040 /* 276 entries */, 131072) = 6696
> > >> getdents64(3, 0x7ffae8eed040 /* 0 entries */, 131072) = 0
> > >>
> > >> Signed-off-by: Jaco Kroon<jaco@....co.za>
> > >> ---
> > >>   fs/fuse/readdir.c | 22 ++++++++++++++++++----
> > >>   1 file changed, 18 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> > >> index 17ce9636a2b1..a0ccbc84b000 100644
> > >> --- a/fs/fuse/readdir.c
> > >> +++ b/fs/fuse/readdir.c
> > >> @@ -337,11 +337,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
> > >>          struct fuse_mount *fm = get_fuse_mount(inode);
> > >>          struct fuse_io_args ia = {};
> > >>          struct fuse_args_pages *ap = &ia.ap;
> > >> -       struct fuse_folio_desc desc = { .length = PAGE_SIZE };
> > >> +       struct fuse_folio_desc desc = { .length = ctx->count };
> > >>          u64 attr_version = 0, evict_ctr = 0;
> > >>          bool locked;
> > >> +       int order;
> > >>
> > >> -       folio = folio_alloc(GFP_KERNEL, 0);
> > >> +       if (desc.length < PAGE_SIZE)
> > >> +               desc.length = PAGE_SIZE;
> > >> +       else if (desc.length > (PAGE_SIZE << 7)) /* 128 pages, typically 512KB */
> > >> +               desc.length = PAGE_SIZE << 7;
> > >> +
> > > Just wondering, how did 128 pages get decided as the upper bound? It
> > > seems to me to make more sense if the upper bound is fc->max_pages.
> >
> > Best answer ... random/guess at something which may be sensible.
> >
> > > Also btw, I think you can just use the clamp() helper from
> > > <linux/minmax.h> to do the clamping
> >
> > Thanks.  Not a regular contributor to the kernel, not often that I've
> > got an itch that needs scratching here :).
> >
> > So something like this then:
> >
> > 345
> > 346     desc.length = clamp(desc.length, PAGE_SIZE, fm->fc->max_pages <<
> > CONFIG_PAGE_SHIFT);
> > 347     order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);
> > 348
> >
>
> You can just use PAGE_SHIFT here instead of CONFIG_PAGE_SHIFT
>
> > Note:  Can use ctx->count here in clamp directly due to it being signed,
> > where desc.length is unsigned.
> >
> > I'm *assuming* get_count_order will round-up, so if max_pages is 7 (if
> > non-power of two is even possible) we will really get 8 pages here?
>
> Yes, if you have a max_pages of 7, this will round up and return to
> you an order of 3
>
> >
> > Compile tested only.  Will perform basic run-time test before re-submit.
> >
> > >> +       order = get_count_order(desc.length >> CONFIG_PAGE_SHIFT);
> > >> +
> > >> +       do {
> > >> +               folio = folio_alloc(GFP_KERNEL, order);
> > > Folios can now be larger than one page size for readdir requests with
> > > your change but I don't believe the current page copying code in fuse
> > > supports this yet. For example, I think the kmapping will be
> > > insufficient in fuse_copy_page() where in the current code we kmap
> > > only the first page in the folio. I sent a patch for supporting large
> > > folios page copying [1] and am trying to get this merged in but
> > > haven't heard back about this patchset yet. In your local tests that
> > > used multiple pages for the readdir request, did you run into any
> > > issues or it worked fine?
> >
> > My tests boiled down to running strace as per above, and then some basic
> > time trials using find /path/to/mount/point with and without the patch
> > over a fairly large structure containing about 170m inodes.  No problems
> > observed.  That said ... I've done similar before, and then introduced a
> > major memory leak that under load destroyed 100GB of RAM in minutes.
> > Thus why I'm looking for a few eyeballs on this before going to
> > production (what we have works, it's just on an older kernel).
> >
> > If further improvements are possible that would be great, but based on
> > testing this is already at least a 10x improvement on readdir() performance.
> >
>
> I think you need the patch I linked to or this could cause crashes.
> The patch adds support for copying folios larger than 1 page size in
> fuse. Maybe you're not running into the crash because it's going
> through splice which will circumvent copying, but in the non-splice
> case, I believe the kmap is insufficient when you go to do the actual
> copying. IOW, I think that patch is a dependency for this one.
>

Update: I looked more into this and I was wrong, you don't need that
patch as a dependency. In fuse_copy_do(), the number of bytes copied
is still done page by page regardless of how large the folio is (eg
see ncpy = min(*size, cs->len)). So please ignore my earlier comment
about this potentially accessing memory that hasn't been kmapped
properly.


Thanks,
Joanne

> Thanks,
> Joanne
>
> > > [1]https://lore.kernel.org/linux-fsdevel/20250123012448.2479372-2-joannelkoong@gmail.com/
> > Took a quick look, wish I could provide you some feedback but that's
> > beyond my kernel skill set to just eyeball.
> >
> > Looks like you're primarily getting rid of the code that references the
> > pages inside the folio's and just operating on the folio's directly? A
> > side effect of which (your goal) is to enable larger copies rather than
> > small ones?
> >
> > Thank you,
> > Jaco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ