Message-ID: <p4uhjrka2rdj67ph5puvaixxhstcyfitzq63pwrafdwtabtjwn@fbie2x77lqee>
Date: Sun, 3 Aug 2025 01:29:55 +0800
From: Coly Li <colyli@...nel.org>
To: Zhou Jifeng <zhoujifeng@...inos.com.cn>
Cc: Kent Overstreet <kent.overstreet@...ux.dev>, 
	linux-bcache <linux-bcache@...r.kernel.org>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache: enhancing the security of dirty data writeback

On Fri, Aug 01, 2025 at 02:10:12PM +0800, Zhou Jifeng wrote:
> On Fri, 1 Aug 2025 at 11:42, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
> >
> > On Fri, Aug 01, 2025 at 11:30:43AM +0800, Zhou Jifeng wrote:
> > > On Fri, 1 Aug 2025 at 10:37, Kent Overstreet <kent.overstreet@...ux.dev> wrote:
> > > >
> > > > On Fri, Aug 01, 2025 at 10:27:21AM +0800, Zhou Jifeng wrote:
> > > > > In the writeback mode, the current bcache code uses the
> > > > > REQ_OP_WRITE operation to handle dirty data, and clears the bkey
> > > > > dirty flag in the btree during the bio completion callback. I think
> > > > > there might be a potential risk: if in the event of an unexpected
> > > > > power outage, the data in the HDD hardware cache may not have
> > > > > had time to be persisted, then the data in the HDD hardware cache
> > > > > that is pending processing may be lost. Since at this time the bkey
> > > > > dirty flag in the btree has been cleared, the data status recorded
> > > > > by the bkey does not match the actual situation of the SSD and
> > > > > HDD.
> > > > > Am I understanding this correctly?
> > > >
> > > > For what you're describing, we need to make sure the backing device is
> > > > flushed when we're flushing the journal.
> > > >
> > > > It's possible that this isn't handled correctly in bcache; bcachefs
> > > > does, and I wrote that code after bcache - but the bcache version would
> > > > look quite different.
> > > >
> > > > You've read that code more recently than I have - have you checked for
> > > > that?
> > >
> > > In the `write_dirty_finish` function, there is an attempt to update the
> > > `bkey` status, but I did not observe any journal-write step. In
> > > `journal_write_unlocked`, bcache's core function for writing the
> > > journal, I also could not find any logic that sends a FLUSH command
> > > to the backend HDD.
> >
> > The right place for it would be in the journal code: before doing a
> > journal write, issue flushes to the backing devices.
> >
> > Can you check for that?
> >
> 
> I checked and found that there is no code that sends a flush request
> to the backing device before the journal is written. Additionally, in
> the callback function that runs after dirty data is written back, the
> bkey update is not inserted into the journal.
>

It doesn't have to. If the previous dirty version of the key is already on the
cache device when the power-off happens, then even if the clean version of the
key is lost, the dirty version and its data are still valid. If the LBA range
of this key has been allocated to a new key, a GC must already have happened,
and the dirty key is invalid because the bucket generation was increased. So
don't worry: in the writeback scenario the clean key does not need to go into
the journal.
 
IMHO, you may try flushing the backing device in a kworker before calling
set_gc_sectors() in bch_gc_thread(). That way the disk cache can be flushed in
time, before the still-dirty-on-disk keys are invalidated by the increased
bucket generation. And also flush the backing device after searched_full_index
becomes true in the writeback thread's main loop (as you do now).

Flushing the backing device after read_dirty() returns is too heavy, even with
the flush hint keys. And the flush operations are unnecessary before the keys
are reclaimed by garbage collection.
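As a rough illustration of the first suggestion, a helper along these lines
could be run from a kworker before set_gc_sectors() is called. This is only a
sketch, not a tested patch: it assumes the cache_set layout in
drivers/md/bcache/bcache.h (c->devices, c->nr_uuids, c->uuids, d->bdev) and
the modern single-argument blkdev_issue_flush(); the kworker wiring and error
handling are left out.

```c
/*
 * Hypothetical sketch: issue a cache flush to every backing device of a
 * cache set, so that dirty data still sitting in an HDD's volatile write
 * cache is persisted before GC invalidates the corresponding keys by
 * bumping the bucket generation. Flash-only volumes have no backing HDD
 * and are skipped.
 */
static void bch_flush_backing_devs(struct cache_set *c)
{
	struct bcache_device *d;
	unsigned int i;

	for (i = 0; i < c->nr_uuids; i++) {
		d = c->devices[i];
		if (!d || UUID_FLASH_ONLY(&c->uuids[i]))
			continue;
		/* Sends REQ_PREFLUSH and waits for its completion. */
		blkdev_issue_flush(d->bdev);
	}
}
```

Since blkdev_issue_flush() blocks until the device acknowledges the flush,
calling it from a kworker (rather than the GC thread itself) keeps GC from
stalling on slow rotational devices.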

Just my suggestion for your consideration.

Thanks.

Coly Li


> The following callback function, which runs after dirty data is written
> back, only updates the bkey without recording the inserted bkey in the
> journal:
> static CLOSURE_CALLBACK(write_dirty_finish)
> {
>     closure_type(io, struct dirty_io, cl);
>     struct keybuf_key *w = io->bio.bi_private;
>     struct cached_dev *dc = io->dc;
> 
>     bio_free_pages(&io->bio);
> 
>     /* This is kind of a dumb way of signalling errors. */
>     if (KEY_DIRTY(&w->key)) {
>         int ret;
>         unsigned int i;
>         struct keylist keys;
> 
>         bch_keylist_init(&keys);
> 
>         bkey_copy(keys.top, &w->key);
>         SET_KEY_DIRTY(keys.top, false);
>         bch_keylist_push(&keys);
> 
>         for (i = 0; i < KEY_PTRS(&w->key); i++)
>             atomic_inc(&PTR_BUCKET(dc->disk.c, &w->key, i)->pin);
> 
>         ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
> 
>         if (ret)
>             trace_bcache_writeback_collision(&w->key);
> 
>         atomic_long_inc(ret
>                 ? &dc->disk.c->writeback_keys_failed
>                 : &dc->disk.c->writeback_keys_done);
>     }
> 
>     bch_keybuf_del(&dc->writeback_keys, w);
>     up(&dc->in_flight);
> 
>     closure_return_with_destructor(cl, dirty_io_destructor);
> }


