Message-ID: <15fad0eb-b161-b87d-9964-e77a7193de48@fastmail.fm>
Date: Thu, 27 Jul 2023 21:16:53 +0200
From: Bernd Schubert <bernd.schubert@...tmail.fm>
To: Miklos Szeredi <miklos@...redi.hu>, Jaco Kroon <jaco@....co.za>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Randy Dunlap <rdunlap@...radead.org>,
Antonio SJ Musumeci <trapexit@...wn.link>
Subject: Re: [PATCH] fuse: enable larger read buffers for readdir [v2].
On 7/27/23 17:35, Miklos Szeredi wrote:
> On Thu, 27 Jul 2023 at 10:13, Jaco Kroon <jaco@....co.za> wrote:
>>
>> This patch does not mess with the caching infrastructure like the
>> previous one, which we believe caused excessive CPU usage and broke directory
>> listings in some cases.
>>
>> This version only affects the uncached read, which during parsing then copies
>> one entry at a time into the cached structures; as such, we believe this
>> should be sufficient.
>>
>> We're still seeing cases where getdents64 takes ~10s (this was already the
>> case without this patch; the difference now is that we get ~500 entries in
>> that time rather than the 14-18 we got previously). We believe that this
>> latency is introduced on the glusterfs side, and it is under separate
>> discussion with the glusterfs developers.
>>
>> This is still a compile-time option, but a working one compared to the
>> previous patch. For now this works, but it's not recommended for merge
>> (as per the email discussion).
>>
>> This still uses alloc_pages rather than kvmalloc/kvfree.
>>
>> Signed-off-by: Jaco Kroon <jaco@....co.za>
>> ---
>> fs/fuse/Kconfig | 16 ++++++++++++++++
>> fs/fuse/readdir.c | 18 ++++++++++++------
>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
>> index 038ed0b9aaa5..0783f9ee5cd3 100644
>> --- a/fs/fuse/Kconfig
>> +++ b/fs/fuse/Kconfig
>> @@ -18,6 +18,22 @@ config FUSE_FS
>> If you want to develop a userspace FS, or if you want to use
>> a filesystem based on FUSE, answer Y or M.
>>
>> +config FUSE_READDIR_ORDER
>> + int
>> + range 0 5
>> + default 5
>> + help
>> + readdir performance varies greatly depending on the size of the read.
>> + Larger buffers result in larger reads, thus fewer reads and higher
>> + performance in return.
>> +
>> + You may want to reduce this value on seriously memory-constrained
>> + systems where 128KiB (assuming 4KiB pages) of cache pages is not ideal.
>> +
>> + This value represents the order of the number of pages to allocate (i.e.,
>> + the shift value). A value of 0 is thus 1 page (4KiB) and 5 is 32
>> + pages (128KiB).
>> +
>> config CUSE
>> tristate "Character device in Userspace support"
>> depends on FUSE_FS
>> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
>> index dc603479b30e..47cea4d91228 100644
>> --- a/fs/fuse/readdir.c
>> +++ b/fs/fuse/readdir.c
>> @@ -13,6 +13,12 @@
>> #include <linux/pagemap.h>
>> #include <linux/highmem.h>
>>
>> +#define READDIR_PAGES_ORDER CONFIG_FUSE_READDIR_ORDER
>> +#define READDIR_PAGES (1 << READDIR_PAGES_ORDER)
>> +#define READDIR_PAGES_SIZE (PAGE_SIZE << READDIR_PAGES_ORDER)
>> +#define READDIR_PAGES_MASK (READDIR_PAGES_SIZE - 1)
>> +#define READDIR_PAGES_SHIFT (PAGE_SHIFT + READDIR_PAGES_ORDER)
>> +
>> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
>> {
>> struct fuse_conn *fc = get_fuse_conn(dir);
>> @@ -328,25 +334,25 @@ static int fuse_readdir_uncached(struct file *file, struct dir_context *ctx)
>> struct fuse_mount *fm = get_fuse_mount(inode);
>> struct fuse_io_args ia = {};
>> struct fuse_args_pages *ap = &ia.ap;
>> - struct fuse_page_desc desc = { .length = PAGE_SIZE };
>> + struct fuse_page_desc desc = { .length = READDIR_PAGES_SIZE };
>
> Does this really work? I would've thought we are relying on single
> page lengths somewhere.
>
>> u64 attr_version = 0;
>> bool locked;
>>
>> - page = alloc_page(GFP_KERNEL);
>> + page = alloc_pages(GFP_KERNEL, READDIR_PAGES_ORDER);
>> if (!page)
>> return -ENOMEM;
>>
>> plus = fuse_use_readdirplus(inode, ctx);
>> ap->args.out_pages = true;
>> - ap->num_pages = 1;
>> + ap->num_pages = READDIR_PAGES;
>
> No. This is the array length, which is 1. This is the hack, I guess,
> which makes the above trick work.
>
> Better use kvmalloc, which might perform slightly worse than a large
> page, but definitely no worse than the current single page.
>
> If we want to optimize the overhead of kvmalloc (and it's a big if)
> then the parse_dir*file() functions would need to be converted to
> using a page array instead of a plain kernel pointer, which would add
> some complexity for sure.

One simple possibility might be to use a small, single-page buffer for
pos=0 and only switch to a larger buffer once pos is set; that way small
directories don't pay the overhead of the large allocation.
Although, following your idea of using the getdents buffer size, this is
something libc could already start with.

Cheers,
Bernd