

Message-ID: <1d43599f-fed1-b37e-a411-2b0f31583991@gmail.com>
Date:   Wed, 19 May 2021 09:27:56 +0800
From:   Wang Jianchao <jianchao.wan9@...il.com>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ext4: get discard out of jbd2 commit kthread



On 2021/5/18 10:57 PM, Theodore Y. Ts'o wrote:
> On Tue, May 18, 2021 at 09:19:13AM +0800, Wang Jianchao wrote:
>>> That way we don't need to move all of this to a kworker context.
>>
>> The submit_bio() call also needs to move out of the jbd2 commit kthread, as
>> it may block due to blk-wbt or a shortage of request tags. ;)
> 
> Actually, there's a bigger deal that I hadn't realized, about why we
> are currently using submit_bio_wait().  We *must* wait until
> discard has completed before we call ext4_free_data_in_buddy(), which
> is what allows those blocks to be reused by the block allocator.
> 
> If the discard happens after we reallocate the block, there is a good
> chance that we will end up corrupting a data or metadata block,
> leading to user data loss.

Yes
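
To make the ordering constraint concrete, here is a minimal userspace sketch.
It is a toy model, not the ext4 code: the toy_* names and the block struct are
hypothetical, toy_discard_wait() only stands in for submit_bio_wait(), and
toy_release_to_allocator() only stands in for ext4_free_data_in_buddy().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these types exist in the kernel. */
enum toy_state { TOY_IN_USE, TOY_FREED_PENDING_DISCARD, TOY_FREE };

struct toy_block {
    enum toy_state state;
    bool discard_in_flight;
    int data;
};

/* Stand-in for submit_bio_wait(): returns only after the discard is done. */
static void toy_discard_wait(struct toy_block *b)
{
    b->discard_in_flight = true;
    b->data = 0;                    /* the device drops the old contents */
    b->discard_in_flight = false;
}

/* Stand-in for ext4_free_data_in_buddy(): block becomes allocatable. */
static void toy_release_to_allocator(struct toy_block *b)
{
    /* Releasing with a discard still in flight is the bug to avoid. */
    assert(!b->discard_in_flight);
    b->state = TOY_FREE;
}

/* Commit-time free: discard first, wait, then hand back to the allocator. */
static void toy_commit_free(struct toy_block *b)
{
    b->state = TOY_FREED_PENDING_DISCARD;
    toy_discard_wait(b);
    toy_release_to_allocator(b);
}

/* The allocator refuses any block that has not been released yet. */
static bool toy_alloc(struct toy_block *b, int data)
{
    if (b->state != TOY_FREE)
        return false;
    b->state = TOY_IN_USE;
    b->data = data;
    return true;
}
```

Because toy_alloc() only accepts blocks in TOY_FREE, a late discard can never
wipe data written by a new owner of the block.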

> 
> There's another corollary to this; if you use blk-wbt, and you are
> doing lots of deletes, and we move this all to a writeback thread,
> this *significantly* increases the chance that the user will see
> ENOSPC errors in the case where they are running with a very full (close to
> 100% used) file system.

In this patch we flush the work item that is doing the discard.
That is done in ext4_should_retry_alloc().
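
Roughly, the idea looks like the userspace sketch below. This is only an
illustration under assumptions, not the patch itself: the toy_fs struct and
all toy_* helpers are hypothetical, and toy_flush_discard_work() merely stands
in for flushing the discard work before the allocation is retried.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 4

/* Hypothetical toy filesystem: a free map plus blocks awaiting discard. */
struct toy_fs {
    bool free_map[TOY_NBLOCKS];         /* true: allocatable */
    bool pending_discard[TOY_NBLOCKS];  /* true: freed, discard not done */
};

/* Return the index of an allocated block, or -1 for "no space". */
static int toy_try_alloc(struct toy_fs *fs)
{
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (fs->free_map[i]) {
            fs->free_map[i] = false;
            return i;
        }
    }
    return -1;
}

/* A delete queues the block for deferred discard; it is not yet free. */
static void toy_queue_discard(struct toy_fs *fs, int blk)
{
    fs->pending_discard[blk] = true;
}

/* Stand-in for flushing the discard work: finish every queued discard,
 * then release those blocks. Returns how many blocks were released. */
static int toy_flush_discard_work(struct toy_fs *fs)
{
    int released = 0;

    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (fs->pending_discard[i]) {
            fs->pending_discard[i] = false;
            fs->free_map[i] = true;
            released++;
        }
    }
    return released;
}

/* On failure, flush pending discards and retry once before giving up. */
static int toy_alloc_with_retry(struct toy_fs *fs)
{
    int blk = toy_try_alloc(fs);

    if (blk >= 0)
        return blk;
    if (toy_flush_discard_work(fs) == 0)
        return -1;                      /* genuinely out of space */
    return toy_try_alloc(fs);
}
```

So an allocation only reports no space after the pending discards have been
flushed and the retry has also failed.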

> 
> I'd argue that this is a *really* good reason why using mount -o
> discard is Just A Bad Idea if you are running with blk-wbt.  If
> discards are slow, using fstrim is a much better choice.  It's also
> the case that for most SSD's and workloads, doing frequent discards
> doesn't actually help that much.  The write endurance of the device is
> not compromised that much if you only run fstrim and discard unused
> blocks once a day, or even once a week --- I only recommend use of
> mount -o discard in cases where the discard operation is effectively
> free.  (e.g., in cases where the FTL is implemented on the Host OS, or
> you are running with super-fast flash which is PCIe or NVMe attached.)

We're running ext4 with discard on an nbd device whose backend is a storage
cluster. The discard helps return unused space to the storage pool.

Sometimes an application deletes a lot of data and the discards flood the
device. Then we see the jbd2 commit kthread blocked for a long time. Even
with the discard moved out of jbd2, we still see the write IO of the jbd2
log getting blocked. blk-wbt can help relieve this. In the end the delay
is shifted to the allocation path, but that is better than blocking the page
fault path, which holds mm->mmap_sem for read.

Best regards
Jianchao

> 
> Cheers,
> 
> 					- Ted
>