linux-kernel - Re: [HELP] FUSE writeback performance bottleneck

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <67cdcde3-1095-41cc-9d99-a0b97274d7be@linux.alibaba.com>
Date: Fri, 13 Sep 2024 09:25:13 +0800
From: Jingbo Xu <jefflexu@...ux.alibaba.com>
To: Joanne Koong <joannelkoong@...il.com>
Cc: Miklos Szeredi <miklos@...redi.hu>,
 Bernd Schubert <bernd.schubert@...tmail.fm>,
 "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 lege.wang@...uarmicro.com, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [HELP] FUSE writeback performance bottleneck



On 9/13/24 8:00 AM, Joanne Koong wrote:
> On Thu, Aug 22, 2024 at 8:34 PM Jingbo Xu <jefflexu@...ux.alibaba.com> wrote:
>>
>> On 6/4/24 6:02 PM, Miklos Szeredi wrote:
>>> On Tue, 4 Jun 2024 at 11:32, Bernd Schubert <bernd.schubert@...tmail.fm> wrote:
>>>
>>>> Back to the background for the copy, so it copies pages to avoid
>>>> blocking on memory reclaim. With that allocation it in fact increases
>>>> memory pressure even more. Isn't the right solution to mark those pages
>>>> as not reclaimable and to avoid blocking on it? Which is what the tmp
>>>> pages do, just not in beautiful way.
>>>
>>> Copying to the tmp page is the same as marking the pages as
>>> non-reclaimable and non-syncable.
>>>
>>> Conceptually it would be nice to only copy when there's something
>>> actually waiting for writeback on the page.
>>>
>>> Note: normally the WRITE request would be copied to userspace along
>>> with the contents of the pages very soon after starting writeback.
>>> After this the contents of the page no longer matter, and we can just
>>> clear writeback without doing the copy.
>>
>> OK this really deviates from my previous understanding of the deadlock
>> issue.  Previously I thought *after* the server has received the WRITE
>> request, i.e. has copied the request and page content to userspace, the
>> server needs to allocate some memory to handle the WRITE request, e.g.
>> make the data persistent on disk, or send the data to the remote
>> storage.  It is the memory allocation at this point that actually
>> triggers a memory direct reclaim (on the FUSE dirty page) and causes a
>> deadlock.  It seems that I misunderstand it.
> 
> I think your previous understanding is correct (or if not, then my
> understanding of this is incorrect too lol).
> The first write request makes it to userspace and when the server is
> in the middle of handling it, a memory reclaim is triggered where
> pages need to be written back. This leads to a SECOND write request
> (eg writing back the pages that are reclaimed) but this second write
> request will never be copied out to userspace because the server is
> stuck handling the first write request and waiting for the page
> reclaim bits of the reclaimed pages to be unset, but those reclaim
> bits can only be unset when the pages have been copied out to
> userspace, which only happens when the server reads /dev/fuse for the
> next request.

Right, that's true.

> 
>>
>> If that's true, we can clear PF_writeback as long as the whole request
>> along with the page content has already been copied to userspace, and
>> thus eliminate the tmp page copying.
>>
> 
> I think the problem is that on a single-threaded server,  the pages
> will not be copied out to userspace for the second request (aka
> writing back the dirty reclaimed pages) since the server is stuck on
> the first request.

Agreed.


-- 
Thanks,
Jingbo