[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8fd07d19-b7eb-4ae6-becc-08e6e1502fc8@linux.alibaba.com>
Date: Fri, 5 Jul 2024 20:00:19 +0800
From: Jingbo Xu <jefflexu@...ux.alibaba.com>
To: Bernd Schubert <bernd.schubert@...tmail.fm>, miklos@...redi.hu,
linux-fsdevel@...r.kernel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fuse: make foffset alignment opt-in for optimum backend
performance
Hi Bernd,
Thanks for the comment.
On 7/5/24 7:50 PM, Bernd Schubert wrote:
>
>
> On 7/5/24 12:04, Jingbo Xu wrote:
>> Sometimes the file offset alignment needs to be opt-in to achieve the
>> optimum performance at the backend store.
>>
>> For example when ErasureCode [1] is used at the backend store, the
>> optimum write performance is achieved when the WRITE request is aligned
>> with the stripe size of ErasureCode. Otherwise a non-aligned WRITE
>> request needs to be split at the stripe size boundary. It is quite
>> costly to handle these split partial requests, as firstly the whole
>> stripe to which the split partial request belongs needs to be read out,
>> then overwrite the read stripe buffer with the request, and finally write
>> the whole stripe back to the persistent storage.
>>
>> Thus the backend store can suffer severe performance degradation when
>> WRITE requests can not fit into one stripe exactly. The write performance
>> can be 10x slower when the request is 256KB in size given 4MB stripe size.
>> Also there can be 50% performance degradation in theory if the request
>> is not stripe boundary aligned.
>>
>> Besides, the conveyed test indicates that, the non-alignment issue
>> becomes more severe when decreasing fuse's max_ratio, maybe partly
>> because the background writeback now is more likely to run parallelly
>> with the dirtier.
>>
>> fuse's max_ratio ratio of aligned WRITE requests
>> ---------------- -------------------------------
>> 70 99.9%
>> 40 74%
>> 20 45%
>> 10 20%
>>
>> With the patched version, which makes the alignment constraint opt-in
>> when constructing WRITE requests, the ratio of aligned WRITE requests
>> increases to 98% (previously 20%) when fuse's max_ratio is 10.
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#m9bce469998ea6e4f911555c6f7be1e077ce3d8b4
>> Signed-off-by: Jingbo Xu <jefflexu@...ux.alibaba.com>
>>
>> Signed-off-by: Jingbo Xu <jefflexu@...ux.alibaba.com>
>> ---
>> fs/fuse/file.c | 4 ++++
>> fs/fuse/fuse_i.h | 6 ++++++
>> fs/fuse/inode.c | 9 +++++++++
>> include/uapi/linux/fuse.h | 9 ++++++++-
>> 4 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f39456c65ed7..f9b477016c2e 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -2246,6 +2246,10 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
>> if (ap->num_pages == data->max_pages && !fuse_pages_realloc(data))
>> return true;
>>
>> + /* Reached alignment */
>> + if (fc->opt_alignment && !(page->index % fc->opt_alignment_pages))
>> + return true;
>
> I link the idea, but couldn't it just do that with
> fc->max_pages? I'm asking because fc->opt_alignment_pages
> takes another uint32_t in fuse_init_out and there is not so much
> space left anymore.
I'm okay with resuing max_pages as the alignment constraint. They are
the same in our internal scenarios. But I'm not sure if it is the case
in other scenarios.
>> /*
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 99e44ea7d875..9266b22cce8e 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -1331,6 +1331,15 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>> }
>> if (flags & FUSE_NO_EXPORT_SUPPORT)
>> fm->sb->s_export_op = &fuse_export_fid_operations;
>> +
>> + /* fallback to default if opt_alignment <= PAGE_SHIFT */
>> + if (flags & FUSE_OPT_ALIGNMENT) {
>> + if (arg->opt_alignment > PAGE_SHIFT) {
>> + fc->opt_alignment = 1;
>> + fc->opt_alignment_pages = 1 <<
>> + (arg->opt_alignment - PAGE_SHIFT);
>
> opt_alignment is the number of bits required for alignment? Not
> very user friendly, from my point of view would be better to have
> this in a byte or kb unit.
>
Actually I referred to fuse_init_out.map_alignment, which is also
log2(byte alignment). Anyway I'm okay making it a more understandable name.
--
Thanks,
Jingbo
Powered by blists - more mailing lists