Message-Id: <DEAB3741-AF2F-4CD3-B715-EBC3AB9B394E@coly.li>
Date: Sun, 27 Apr 2025 14:47:59 +0800
From: Coly Li <i@...y.li>
To: Zhou Jifeng <zhoujifeng@...inos.com.cn>
Cc: Coly Li <colyli@...nel.org>,
 "kent.overstreet" <kent.overstreet@...ux.dev>,
 linux-bcache <linux-bcache@...r.kernel.org>,
 linux-kernel <linux-kernel@...r.kernel.org>,
 夏华 <xiahua@...inos.com.cn>,
 邓旺波 <dengwangbo@...inos.com.cn>
Subject: Re: [PATCH] bcache: add the deferred_flush IO processing path in the
 writeback mode



> On 25 Apr 2025, at 16:18, Zhou Jifeng <zhoujifeng@...inos.com.cn> wrote:
> 
> Hi Coly Li,
> Thank you for your reply and your question.
> 
> On Fri, 25 Apr 2025 at 13:46, Coly Li <colyli@...nel.org> wrote:
>> 
>> Hi Jifeng,
>> 
>> Thanks for posting the patch.
>> 
>> On Fri, Apr 25, 2025 at 11:50:21AM +0800, Zhou Jifeng wrote:
>>> In some scenarios with high requirements for both data reliability and
>>> write performance, the various cache modes of the current bcache cannot
>> 
>> Could you provide the detailed workload or circumstance which requires both
>> data reliability and write performance that the current bcache cannot serve?
> 
> For example, some database applications have relatively high data security
> requirements. They write frequently, so flush is called often and dsync write
> performance matters a great deal. The current bcache cache modes behave as
> follows in such scenarios (a sketch of the write pattern follows this list):
> none: the cache is not used and does not help; performance equals that of the
> backing device and cannot meet the requirements.
> writearound and writethrough: they do not help write performance; writes run
> at the speed of the backing device and cannot meet the write performance
> requirements.
> writeback: dirty data is written back with bios marked only REQ_OP_WRITE, so
> there is a risk of data loss on power failure. In addition, a flush request
> must be sent to the backing device whenever a request carries the flush flag,
> which hurts write performance.
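> 
> For illustration only, here is a minimal user-space sketch of the dsync-heavy
> write pattern mentioned above; the file path and I/O size are only assumptions,
> not taken from any real deployment:
> 
>     #include <fcntl.h>
>     #include <string.h>
>     #include <unistd.h>
> 
>     int main(void)
>     {
>         char buf[4096];
>         int fd, i;
> 
>         memset(buf, 0xab, sizeof(buf));
>         /* O_DSYNC: every write must be durable before the call returns */
>         /* the mount point below is a hypothetical example */
>         fd = open("/mnt/bcache0/db.dat", O_WRONLY | O_CREAT | O_DSYNC, 0644);
>         if (fd < 0)
>             return 1;
>         for (i = 0; i < 100000; i++)
>             if (pwrite(fd, buf, sizeof(buf), (off_t)i * sizeof(buf)) < 0)
>                 break;
>         close(fd);
>         return 0;
>     }
> 
> With cache mode none, writearound or writethrough, every such write waits for
> the backing device; only writeback can absorb it in the cache.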
> 
>>> fully match the requirements. deferred_flush aims to increase the
>>> reliability of the writeback write-back path, and to reduce the number of
>>> PREFLUSH requests sent to the backing device, improving data safety and
>>> dsync write performance in writeback mode.
>> 
>> I'd like to see a detailed description of how deferred flush is defined and
>> how it works, and why deferred flush may provide better data reliability and
>> performance than the current bcache code.
> 
> deferred flush: when data goes through the writeback path, the code determines
> whether a PREFLUSH needs to be forwarded to the backing device. The criterion is
> whether any write request has previously reached the backing device via bypass or
> writethrough; if not, forwarding is unnecessary. The PREFLUSH semantics are moved
> into the dirty data write-back stage to guarantee that write-back is reliable. By
> sending fewer PREFLUSH requests to the backing device, the latency the backing
> device spends handling PREFLUSH is reduced, which improves the performance of dsync
> write requests while cache space is abundant. During the dirty data write-back
> stage, FUA writes are used so that dirty data cannot be lost due to events such as
> power failure.
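> 
> For clarity, a rough sketch (not the actual patch code) of the two decisions just
> described; need_backing_flush is a hypothetical flag assumed to be set whenever a
> write reaches the backing device via bypass or writethrough:
> 
>     #include <linux/atomic.h>
>     #include <linux/bio.h>
> 
>     static atomic_t need_backing_flush = ATOMIC_INIT(0);
> 
>     /* on the front-end path, for a bio carrying REQ_PREFLUSH */
>     static void maybe_drop_preflush(struct bio *bio)
>     {
>         /*
>          * If nothing was written directly to the backing device since the
>          * last flush, the PREFLUSH does not need to be forwarded; durability
>          * of dirty data is guaranteed by the FUA write-back below.
>          */
>         if (!atomic_xchg(&need_backing_flush, 0))
>             bio->bi_opf &= ~REQ_PREFLUSH;
>     }
> 
>     /* when submitting a dirty-data write-back bio to the backing device */
>     static void mark_writeback_fua(struct bio *bio)
>     {
>         /* FUA: the data is on stable media when the bio completes */
>         bio->bi_opf |= REQ_FUA;
>     }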
> 
>> I haven't looked into the patch yet, but just from intuition the overall
>> performance won't be well optimized by setting FUA on writeback I/Os.
> 
> Using FUA to write back dirty data does indeed slow the write-back: in a 4K random
> write test, the write-back rate was roughly half that of the non-FUA method. However,
> since data is usually not written at full intensity continuously, there is still some
> buffer time for writing back dirty data. In the extreme case where the effective cache
> space becomes tight, write efficiency is still no lower than the performance of the
> backing device. Therefore, enabling deferred_flush is useful in low-cost deployments
> that rely on an SSD to accelerate dsync performance.

I am not sure whether the situation you describe is acceptable to most users.

I hope to see more testing data.

> 
>> And the cache mode can be switched arbitrarily at run time. If the cache mode was
>> none or writethrough and is then switched to writeback, I don't see how your patch
>> handles that situation.
> 
> When switching from other cache modes to writeback and simultaneously enabling 
> deferred_flush, a REQ_PREFLUSH request will be sent to the backing device.
> Code location in the patch:
> +   if (attr == &sysfs_deferred_flush) {
> +       v = __sysfs_match_string(bch_deferred_flush, -1, buf);
> +       if (v < 0)
> +           return v;
> +
> +       if ((unsigned int) v != BDEV_DEFERRED_FLUSH(&dc->sb)) {
> +           if (v && (BDEV_CACHE_MODE(&dc->sb) != CACHE_MODE_WRITEBACK)) {
> +               pr_err("deferred_flush can only be enabled in writeback mode.\n");
> +               return -EINVAL;
> +           }
> +
> +           SET_BDEV_DEFERRED_FLUSH(&dc->sb, v);
> +           bch_write_bdev_super(dc, NULL);
> +           if (v) {
> +               bio_init(&flush, dc->bdev, NULL, 0, REQ_OP_WRITE | REQ_PREFLUSH);
> +               /* I/O request sent to backing device */
> +               submit_bio_wait(&flush);
> +           }
> +       }
> +   }
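> 
> For reference, a small user-space sketch of how the new attribute might be toggled,
> assuming it appears under the usual bcache sysfs directory for the backing device;
> the exact path and the accepted value strings depend on bch_deferred_flush[] in the
> patch and are only assumptions here:
> 
>     #include <stdio.h>
> 
>     int main(void)
>     {
>         /* hypothetical sysfs path for the backing device bcache0 */
>         FILE *f = fopen("/sys/block/bcache0/bcache/deferred_flush", "w");
> 
>         if (!f)
>             return 1;
>         /* placeholder value; valid strings are defined by bch_deferred_flush[] */
>         fputs("on", f);
>         fclose(f);
>         return 0;
>     }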

And when read/write congestion happens, some read/write requests will go directly to/from the hard drive and bypass the cache device.

Anyway, long-running high-load test results are necessary. I assume this patch will decrease general write-back throughput and put more pressure on the cache device garbage collection load triggered by the allocator.
Maybe I am wrong; I'd like to learn from your benchmark results.

Thanks.

Coly Li
