Message-ID: <20250804153130.igwkb6baf3vtjhzu@P16.>
Date: Mon, 4 Aug 2025 23:31:30 +0800
From: Coly Li <colyli@...nel.org>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Zhou Jifeng <zhoujifeng@...inos.com.cn>,
	linux-bcache <linux-bcache@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache: enhancing the security of dirty data writeback

On Mon, Aug 04, 2025 at 12:17:28AM -0400, Kent Overstreet wrote:
> On Mon, Aug 04, 2025 at 11:47:57AM +0800, Zhou Jifeng wrote:
> > On Sun, 3 Aug 2025 at 01:30, Coly Li <colyli@...nel.org> wrote:
> > >
> > > On Fri, Aug 01, 2025 at 02:10:12PM +0800, Zhou Jifeng wrote:
> > > > On Fri, 1 Aug 2025 at 11:42, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
> > > > >
> > > > > On Fri, Aug 01, 2025 at 11:30:43AM +0800, Zhou Jifeng wrote:
> > > > > > On Fri, 1 Aug 2025 at 10:37, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 01, 2025 at 10:27:21AM +0800, Zhou Jifeng wrote:
> > > > > > > > In writeback mode, the current bcache code uses the
> > > > > > > > REQ_OP_WRITE operation to handle dirty data, and clears the
> > > > > > > > bkey's dirty flag in the btree in the bio completion callback.
> > > > > > > > I think there might be a potential risk: in the event of an
> > > > > > > > unexpected power outage, data in the HDD's hardware cache may
> > > > > > > > not have had time to be persisted, so that pending data may
> > > > > > > > be lost. Since the bkey's dirty flag in the btree has already
> > > > > > > > been cleared by then, the data state recorded by the bkey no
> > > > > > > > longer matches the actual state of the SSD and HDD.
> > > > > > > > Am I understanding this correctly?
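
For illustration, a rough sketch of the ordering in question
(blkdev_issue_flush() is the real kernel primitive; the function and
clear_bkey_dirty() are invented placeholders, not actual bcache code):

#include <linux/blkdev.h>

/*
 * Sketch only: the dirty data was submitted with a plain REQ_OP_WRITE,
 * so at bio completion it may still sit in the HDD's volatile write
 * cache.  Draining that cache before clearing the dirty flag closes
 * the window described above.
 */
static void writeback_complete_sketch(struct bkey *k,
				      struct block_device *backing_bdev)
{
	/* Force the dirty data out of the drive's volatile cache ... */
	blkdev_issue_flush(backing_bdev);

	/* ... only then is it safe to mark the key clean, so the btree
	 * never claims data is persistent while it is not. */
	clear_bkey_dirty(k);	/* hypothetical helper */
}
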
> > > > > > >
> > > > > > > For what you're describing, we need to make sure the backing device is
> > > > > > > flushed when we're flushing the journal.
> > > > > > >
> > > > > > > It's possible that this isn't handled correctly in bcache;
> > > > > > > bcachefs does handle it, and I wrote that code after bcache -
> > > > > > > but the bcache version would look quite different.
> > > > > > >
> > > > > > > You've read that code more recently than I have - have you checked for
> > > > > > > that?
> > > > > >
> > > > > > In the `write_dirty_finish` function, there is an attempt to update
> > > > > > the `bkey` status, but I did not observe any journal write happening
> > > > > > there. In `journal_write_unlocked`, bcache's core function for
> > > > > > writing the journal, I also couldn't find any code that sends a
> > > > > > FLUSH command to the backing HDD.
> > > > >
> > > > > The right place for it would be in the journal code: before doing a
> > > > > journal write, issue flushes to the backing devices.
> > > > >
> > > > > Can you check for that?
> > > > >
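
For concreteness, a rough sketch of that placement (for_each_backing_dev()
is an invented iterator, not an existing bcache macro; blkdev_issue_flush()
is the real kernel API):

/*
 * Hypothetical hook in the journal write path: drain every attached
 * backing device's volatile cache before a journal entry that may
 * carry clean keys goes out.
 */
static void flush_backing_before_journal(struct cache_set *c)
{
	struct cached_dev *dc;

	/* Invented helper loop; bcache tracks attached backing
	 * devices in c->devices[]. */
	for_each_backing_dev(c, dc)
		blkdev_issue_flush(dc->bdev);

	/* ... then proceed with the actual journal write. */
}
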
> > > >
> > > > I checked and found that there is no code sending a flush request to
> > > > the backing device before the journal is written. Additionally, in the
> > > > callback that runs after dirty data has been written back, the bkey
> > > > update is not inserted into the journal.
> > > >
> > >
> > > It doesn't have to. If the previous dirty version of the key is already
> > > on the cache device and a power-off happens, then even if the clean
> > > version of the key is gone, the dirty version and its data are still
> > > valid. If the LBA range of this key has been allocated to a new key, a
> > > GC must have already happened, and the dirty key is invalid because the
> > > bucket generation was increased. So don't worry, the clean key does not
> > > need to go into the journal in the writeback scenario.
> > >
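The generation rule relied on here works roughly like this (a simplified
restatement of bcache's gen_after()/ptr_stale() logic, not the verbatim
source):

/*
 * Each bucket carries an 8-bit generation, and a key's pointer records
 * the generation at allocation time.  Once GC reuses the bucket and
 * bumps its generation, every older key pointing into it is stale and
 * is ignored after a crash.
 */
static inline bool key_ptr_stale(uint8_t bucket_gen, uint8_t key_gen)
{
	/* Generations wrap at 8 bits, so compare with signed
	 * subtraction rather than a plain '>'. */
	return (int8_t)(bucket_gen - key_gen) > 0;
}
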
> > > IMHO, you may try flushing the backing device in a kworker before
> > > calling set_gc_sectors() in bch_gc_thread(). That way the disk cache
> > > can be flushed in time, before the still-dirty-on-disk keys are
> > > invalidated by the increased bucket generation. And also flush the
> > > backing device after searched_full_index becomes true in the writeback
> > > thread's main loop (as you do now).
> > >
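That suggestion could take roughly this shape (flush_all_backing_devs() is
invented; set_gc_sectors() and bch_gc_thread() are the real functions named
above, and the wait/wakeup logic of the real thread is elided):

/*
 * Rough sketch, not a tested patch: drain the backing devices before
 * GC can invalidate still-dirty-on-disk keys by increasing bucket
 * generations.
 */
static int bch_gc_thread_sketch(void *arg)
{
	struct cache_set *c = arg;

	while (!kthread_should_stop()) {
		/* Invented helper: blkdev_issue_flush() each attached
		 * backing device, possibly from a kworker so the GC
		 * thread itself does not block on the flushes. */
		flush_all_backing_devs(c);

		set_gc_sectors(c);
		bch_btree_gc(c);
	}
	return 0;
}
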
> > 
> > The "flush" command previously issued by GC was supposed to alleviate
> > the problems in some scenarios. However, I thought of a situation where
> > this "flush" command issued before GC might not be sufficient to solve
> > the issue.
> > 
> > Suppose such a scenario: after a power failure, some hardware cache data
> > in the HDD is lost, while the corresponding bkey(with the dirty flag cleared)
> > update in the SSD has been persisted. After the power is restored, if
> > bcache sends a flush before GC, will this cause data loss?
> 
> Yes.

The cleared key is updated in place within the in-memory btree node, so
flushing the backing devices before committing the journal doesn't help.

I'd like to avoid adding the cleared key into the journal; under high
write pressure, such a synchronized link between writeback, gc and the
journal makes me really uncomfortable.

Another choice might be to add a flag in struct btree, and set it when a
cleared key is updated in place in the btree node. When a bset is being
written and the flag is set, flush the corresponding backing device
before writing the btree node. That may hurt performance less than
flushing the backing devices before committing a journal set.
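
Roughly like this (the flag name and hook points are invented; only the
ordering matters):

/* Hypothetical flag bit alongside the existing btree node flags. */
#define BTREE_NODE_needs_backing_flush	12	/* invented bit number */

/* Set when a cleared (no-longer-dirty) key is updated in place: */
static void mark_cleared_key(struct btree *b)
{
	set_bit(BTREE_NODE_needs_backing_flush, &b->flags);
}

/* In the bset write path for this node: */
static void btree_node_write_sketch(struct btree *b, struct cached_dev *dc)
{
	/* Drain the HDD's volatile cache before the bset that makes
	 * the cleared key durable is written out. */
	if (test_and_clear_bit(BTREE_NODE_needs_backing_flush, &b->flags))
		blkdev_issue_flush(dc->bdev);

	/* ... then submit the bset write as usual. */
}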

What do you think, Kent?

Thanks.

Coly Li

