Message-ID: <d1955ffb-77ce-97a6-fcf2-b25960d389aa@fastmail.fm>
Date: Fri, 22 Apr 2022 17:46:14 +0200
From: Bernd Schubert <bernd.schubert@...tmail.fm>
To: Miklos Szeredi <miklos@...redi.hu>,
Bernd Schubert <bschubert@....com>
Cc: Linux-FSDevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dharmendra Singh <dsingh@....com>
Subject: Re: RFC fuse waitq latency
[I removed the failing netapp/zufs CCs]
On 4/22/22 14:25, Miklos Szeredi wrote:
> On Mon, 28 Mar 2022 at 15:21, Bernd Schubert <bschubert@....com> wrote:
>>
>> I would like to discuss the user thread wake up latency in
>> fuse_dev_do_read(). Profiling fuse shows there is room for improvement
>> regarding memory copies and splice. The basic profiling with flame graphs
>> didn't reveal, though, why fuse is so much
>> slower (with an overlay file system) than just accessing the underlying
>> file system directly and also didn't reveal why a single-threaded fuse
>> server uses less than 100% CPU, with the application on top of fuse also
>> using less than 100% CPU (simple bonnie++ runs with 1B files).
>> So I started to suspect the wait queues and indeed, keeping the thread
>> that reads the fuse device for work running for some time gives quite
>> some improvements.
>
> Might be related: I experimented with wake_up_sync() that didn't meet
> my expectations. See this thread:
>
> https://lore.kernel.org/all/1638780405-38026-1-git-send-email-quic_pragalla@quicinc.com/#r
>
> Possibly fuse needs some wake up tweaks due to its special scheduling
> requirements.
Thanks, I will look at that as well. I have an almost complete patch that
adds spinning and avoids thread wake-ups. In my (still limited) testing it
takes hardly any additional CPU and improves metadata / bonnie++
performance by a factor of between ~1.9 and 3, depending on which
performance mode the CPU is in.
https://github.com/aakefbs/linux/commits/v5.17-fuse-scheduling3
Still missing are another option for a wake-queue-size trigger and the
handling of signals. It should be ready once I'm done with my other work.
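The patch itself is only linked, not shown here. As a rough userspace illustration of the general idea (not the code in that branch), the sketch below models a request queue where the consuming thread polls for a bounded number of iterations before falling back to sleeping on a condition variable, so a request that arrives during the spin window is picked up without a sleep/wake-up round trip. The names `struct req_queue`, `rq_post()`, `rq_wait()` and the `spin_iters` budget are hypothetical; the real change works on the fuse device's wait queue inside the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request queue: only a count of pending requests is modelled. */
struct req_queue {
    atomic_int      pending;   /* number of queued, not yet taken requests */
    pthread_mutex_t lock;      /* protects the sleep/wake-up handshake */
    pthread_cond_t  cond;      /* where the consumer sleeps once spinning gives up */
};

void rq_init(struct req_queue *q)
{
    atomic_init(&q->pending, 0);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
}

/* Producer side (the kernel in this model): queue one request and wake a
 * sleeper. The counter is bumped under the lock so a consumer that has just
 * checked the queue and is about to sleep cannot miss the signal. */
void rq_post(struct req_queue *q)
{
    pthread_mutex_lock(&q->lock);
    atomic_fetch_add(&q->pending, 1);
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Take one request if one is available, without blocking. */
static bool rq_try_take(struct req_queue *q)
{
    int n = atomic_load(&q->pending);
    while (n > 0) {
        /* On failure n is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&q->pending, &n, n - 1))
            return true;
    }
    return false;
}

/* Consumer side (a fuse server thread in this model). Poll up to spin_iters
 * times before taking the slow path. Returns true if a request was taken
 * during the spin phase, false if it was taken on the slow path (which may
 * have slept on the condition variable and needed a wake-up). */
bool rq_wait(struct req_queue *q, long spin_iters)
{
    assert(spin_iters >= 0);
    for (long i = 0; i < spin_iters; i++) {
        if (rq_try_take(q))
            return true;
    }
    pthread_mutex_lock(&q->lock);
    while (!rq_try_take(q))
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return false;
}
```

The spin budget is exactly the trade-off the numbers above are about: too small and most requests still pay the wake-up latency, too large and an idle server thread burns CPU.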
That being said, in the meantime I believe a better approach would be
SQ/CQ-like, similar to NVMe or io_uring. In principle exactly like
io_uring, just the other way around: the kernel fills the SQ, and user
space consumes it and fills the CQ. We also looked into zufs and your
fuse2 branch and were almost ready to start porting it to a recent
kernel, but it is still all system-call based and uses waitqs, which is
probably much slower than what could be achieved with queue pairs.
Assuming userspace would not want a polling thread but would want a
notification similar to io_uring_enter(), a thread would still need to
be woken up; maybe that is where wake_up_sync() would help.
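A minimal sketch of the queue-pair direction described above, assuming single-producer/single-consumer rings with free-running head/tail indices as in io_uring: the kernel side pushes request entries into the SQ, and the fuse server pops them and pushes one completion per request into the CQ. Everything here is hypothetical: the entry layouts, `RING_SIZE`, the `DEFINE_SPSC_RING` macro and `serve_once()`. In practice both rings would live in memory shared between the kernel and the server (for example mmap()ed from the fuse device, again an assumption), whereas here they are ordinary structs in one process, and no notification or wake-up path (the io_uring_enter()-like part) is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical; must be a power of two */

/* Single-producer/single-consumer ring. The producer advances tail, the
 * consumer advances head; both indices run freely and are masked on access. */
#define DEFINE_SPSC_RING(name, type)                                       \
    struct name {                                                          \
        _Atomic uint32_t head, tail;                                       \
        type slot[RING_SIZE];                                              \
    };                                                                     \
    static void name##_init(struct name *r)                                \
    {                                                                      \
        atomic_init(&r->head, 0);                                          \
        atomic_init(&r->tail, 0);                                          \
    }                                                                      \
    static bool name##_push(struct name *r, type v)                        \
    {                                                                      \
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_relaxed); \
        uint32_t h = atomic_load_explicit(&r->head, memory_order_acquire); \
        if (t - h == RING_SIZE)                                            \
            return false; /* full */                                       \
        r->slot[t & (RING_SIZE - 1)] = v;                                  \
        atomic_store_explicit(&r->tail, t + 1, memory_order_release);      \
        return true;                                                       \
    }                                                                      \
    static bool name##_pop(struct name *r, type *out)                      \
    {                                                                      \
        uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed); \
        uint32_t t = atomic_load_explicit(&r->tail, memory_order_acquire); \
        if (h == t)                                                        \
            return false; /* empty */                                      \
        *out = r->slot[h & (RING_SIZE - 1)];                               \
        atomic_store_explicit(&r->head, h + 1, memory_order_release);      \
        return true;                                                       \
    }

/* Hypothetical entry layouts; real fuse requests carry far more. */
struct sqe { uint64_t unique; uint32_t opcode; }; /* kernel -> server */
struct cqe { uint64_t unique; int32_t  error;  }; /* server -> kernel */

DEFINE_SPSC_RING(sq_ring, struct sqe)
DEFINE_SPSC_RING(cq_ring, struct cqe)

/* Server side: drain the SQ and post one completion per request.
 * Assumes the kernel never has more than RING_SIZE requests outstanding,
 * so the CQ cannot overflow. Returns the number of requests completed. */
static int serve_once(struct sq_ring *sq, struct cq_ring *cq)
{
    struct sqe req;
    int done = 0;
    while (sq_ring_pop(sq, &req)) {
        /* A real server would dispatch on req.opcode here. */
        struct cqe c = { .unique = req.unique, .error = 0 };
        bool ok = cq_ring_push(cq, c);
        assert(ok);
        (void)ok;
        done++;
    }
    return done;
}
```

The acquire/release pairs on head and tail are what let each side read the other's index without a lock or a system call; only the "queue was empty, go to sleep" case would still need a wake-up.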
Btw, the optional kernel polling thread in io_uring (IORING_SETUP_SQPOLL) also spins...
Bernd