lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <27875beb-bd1c-0087-ac4c-420a9d92a5a9@uls.co.za>
Date:   Wed, 26 Jul 2023 20:25:48 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     Antonio SJ Musumeci <trapexit@...wn.link>,
        Bernd Schubert <bernd.schubert@...tmail.fm>,
        Miklos Szeredi <miklos@...redi.hu>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fuse: enable larger read buffers for readdir.

Hi,

On 2023/07/26 19:23, Antonio SJ Musumeci wrote:
> On 7/26/23 10:45, Bernd Schubert wrote:
>> On 7/26/23 17:26, Jaco Kroon wrote:
>>> Hi,
>>>
>>> On 2023/07/26 15:53, Bernd Schubert wrote:
>>>> On 7/26/23 12:59, Jaco Kroon wrote:
>>>>> Signed-off-by: Jaco Kroon <jaco@....co.za>
>>>>> ---
>>>>>     fs/fuse/Kconfig   | 16 ++++++++++++++++
>>>>>     fs/fuse/readdir.c | 42 ++++++++++++++++++++++++------------------
>>>>>     2 files changed, 40 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
>>>>> index 038ed0b9aaa5..0783f9ee5cd3 100644
>>>>> --- a/fs/fuse/Kconfig
>>>>> +++ b/fs/fuse/Kconfig
>>>>> @@ -18,6 +18,22 @@ config FUSE_FS
>>>>>           If you want to develop a userspace FS, or if you want to use
>>>>>           a filesystem based on FUSE, answer Y or M.
>>>>>     +config FUSE_READDIR_ORDER
>>>>> +    int
>>>>> +    range 0 5
>>>>> +    default 5
>>>>> +    help
>>>>> +        readdir performance varies greatly depending on the size of
>>>>> the read.
>>>>> +        Larger buffers results in larger reads, thus fewer reads and
>>>>> higher
>>>>> +        performance in return.
>>>>> +
>>>>> +        You may want to reduce this value on seriously constrained
>>>>> memory
>>>>> +        systems where 128KiB (assuming 4KiB pages) cache pages is
>>>>> not ideal.
>>>>> +
>>>>> +        This value reprents the order of the number of pages to
>>>>> allocate (ie,
>>>>> +        the shift value).  A value of 0 is thus 1 page (4KiB) where
>>>>> 5 is 32
>>>>> +        pages (128KiB).
>>>>> +
>>>> I like the idea of a larger readdir size, but shouldn't that be a
>>>> server/daemon/library decision which size to use, instead of kernel
>>>> compile time? So should be part of FUSE_INIT negotiation?
>>> Yes sure, but there still needs to be a default.  And one page at a time
>>> doesn't cut it.
>> With FUSE_INIT userspace would make that decision, based on what kernel
>> fuse suggests? process_init_reply() already handles other limits - I
>> don't see why readdir max has to be compile time option. Maybe a module
>> option to set the limit?
>>
>> Thanks,
>> Bernd
> I had similar question / comment. This seems to me to be more
> appropriately handed by the server via FUSE_INIT.
>
> And wouldn't "max" more easily be FUSE_MAX_MAX_PAGES? Is there a reason
> not to allow upwards of 256 pages sized readdir buffer?

Will look into FUSE_INIT.  The FUSE_INIT as I understand from what I've 
read has some expansion constraints or the structure is somehow 
negotiated.  Older clients in other words that's not aware of the option 
will follow some default.  For backwards compatibility that default 
should probably be 1 page.  For performance reasons it makes sense that 
this limit be larger.

glibc uses a 128KiB buffer for getdents64, so I'm not sure >128KiB here 
makes sense.  Or if these two buffers are even directly related.

Default to fc->max_pages (which defaults to 32 or 128KiB) if the 
user-space side doesn't understand the max_readdir_pages limit aspect of 
things?  Assuming these limits should be set separately.  I'm thinking 
piggy backing on fc->max_pages is just fine to be honest.

For the sake of avoiding division and modulo operations in the cache, 
I'm thinking round-down max_pages to the closest power of two for the 
sake of sticking to binary operators rather than divisions and mods?

Current patch introduces a definite memory leak either way.  Tore 
through about 12GB of RAM in a matter of 20 minutes or so.  Just going 
to patch it that way first, and then based on responses above will look 
into an alternative patch that does not depend on a compile-time 
option.  Guessing __free_page should be a multi-page variant now.

Thanks for all the feedback so far.

Kind regards,
Jaco

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ