Message-ID: <tencent_27E675CD6ADE39EA7923AD17@qq.com>
Date: Thu, 7 Aug 2025 10:01:44 +0800
From: "Zhou Jifeng" <zhoujifeng@...inos.com.cn>
To: "Coly Li" <colyli@...nel.org>
Cc: "kent.overstreet" <kent.overstreet@...ux.dev>, "linux-bcache" <linux-bcache@...r.kernel.org>, "linux-kernel" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache: enhancing the security of dirty data writeback

On Thu, 7 Aug 2025 at 00:11, Coly Li <colyli@...nel.org> wrote:
>
> On Wed, Aug 06, 2025 at 07:19:49PM +0800, Zhou Jifeng wrote:
> > On Wed, 6 Aug 2025 at 00:29, Coly Li <colyli@...nel.org> wrote:
> > >
> > > On Tue, Aug 05, 2025 at 05:37:44PM +0800, Zhou Jifeng wrote:
> > > > On Tue, 5 Aug 2025 at 13:00, Coly Li <colyli@...nel.org> wrote:
> > > > >
> > > > > On Thu, Jul 31, 2025 at 02:21:40PM +0800, Zhou Jifeng wrote:
> > > > > > There is a potential data consistency risk in bcache's writeback
> > > > > > mode: when the application calls fsync, bcache returns success after
> > > > > > completing the journal write, persisting the cache device data, and
> > > > > > flushing the HDD's internal cache. However, at this point the
> > > > > > application's data may still be dirty and remain only on the cache
> > > > > > device. When this data is later written back to the HDD asynchronously
> > > > > > via REQ_OP_WRITE, there is no forced flush to guarantee physical
> > > > > > placement on the disk, and there may be no power-loss protection,
> > > > > > which poses a risk of data loss. This can lead the application to
> > > > > > wrongly conclude that its data has been persisted, diverging from the
> > > > > > actual storage state and violating the semantic guarantee that fsync
> > > > > > ensures data persistence.
> > > > > >
> > > > >
> > > > > [snipped]
> > > > >
> > > > >
> > > > >
> > > > > If, before the cleared key is inserted into the btree, there is a new
> > > > > write to the LBA range overlapped by the cleared key and a dirty key is
> > > > > inserted, then the cleared key is inserted afterwards and overwrites the
> > > > > dirty key, while the dirty data on the cache has not yet been written
> > > > > back to the backing device. How should such a situation be handled?
> > > > >
> > > >
> > > > There are indeed some issues here. I have an initial solution: use the
> > > > existing dc->writeback_keys mechanism for protection. The general flow
> > > > is as follows:
> > > > 1. In write_dirty_finish(), remove the bkey-update insertion and delete
> > > > the bch_keybuf_del(&dc->writeback_keys, w) call.
> > > > 2. After read_dirty(dc) has run, issue a flush, then insert the updated
> > > > bkeys, and finally remove them from dc->writeback_keys. This is
> > > > equivalent to sending a flush every time KEYBUF_NR bkeys have been
> > > > written back.
> > > > 3. Make KEYBUF_NR configurable to indirectly control the flush
> > > > frequency.
> > > >
> > > > Is this plan appropriate? Or is there a better way to handle it?
> > >
> > > No, I won't suggest this way. It sounds complicated and changes the main
> > > code flow too much in an implicit way; this should be avoided.
> > >
> > > So it seems Kent's suggestion to flush backing device before committing
> > > jset is the proper method I can see now.
> > >
> > > Coly Li
> > >
> >
> > Sorry, my previous response was not rigorous enough. I have carefully
> > considered your question about the bkey being overwritten. In fact, there
> > is no overwrite problem: bcache provides a replace mechanism for exactly
> > this case. In my code, the bkey with the dirty flag cleared is inserted
> > using the replace method, which handles address overlaps during insertion
> > and will not overwrite a bkey generated by a concurrent write. The core of
> > the replace mechanism is in bch_btree_insert_key->bch_extent_insert_fixup,
> > which calls bch_bkey_equal_header(), a crucial check.
>
> I am not able to make a judgement from the above description. Can you post
> a patch so I can see how you insert the key with the replace parameter?
>
> Coly Li
>

In the patch I submitted earlier, clearing a bkey's dirty flag is
implemented by the following code:
+       /*
+        * The dirty data was successfully written back and confirmed to be written
+        * to the disk. The status of the bkey in the btree was updated.
+        */
+       list_for_each_entry_safe(e, tmp, &dc->preflush_keys.list, list) {
+               memset(keys.inline_keys, 0, sizeof(keys.inline_keys));
+               bch_keylist_init(&keys);
+
+               bkey_copy(keys.top, &(e->key.key));
+               SET_KEY_DIRTY(keys.top, false);
+               bch_keylist_push(&keys);
+
+               for (i = 0; i < KEY_PTRS(&(e->key.key)); i++)
+                       atomic_inc(&PTR_BUCKET(dc->disk.c, &(e->key.key), i)->pin);
+
+               ret = bch_btree_insert(dc->disk.c, &keys, NULL, &(e->key.key));
+
+               if (ret)
+                       trace_bcache_writeback_collision(&(e->key.key));
+
+               atomic_long_inc(ret
+                               ? &dc->disk.c->writeback_keys_failed
+                               : &dc->disk.c->writeback_keys_done);
+
+               list_del(&e->list);
+               kfree(e);
+
+               /* bkeys that failed to insert can be ignored here;
+                * they will be picked up again by the next
+                * writeback scan.
+                */
+       }

Code Explanation:
1. dc->preflush_keys.list stores all bkeys whose dirty flags are to
be cleared.
2. In the list_for_each_entry_safe loop there are two important
variables: 'e' holds the original bkey whose dirty flag is to be
cleared (the key waiting to be replaced), and 'keys' holds the new
bkey with the dirty flag already cleared (the key waiting to be
inserted).
3. In the bch_btree_insert(dc->disk.c, &keys, NULL, &(e->key.key))
function call, the second parameter '&keys' represents the new bkey
to be inserted, whose dirty flag has been cleared, and the fourth
parameter '&(e->key.key)' represents the original bkey to be replaced.
4. The prototype definition of the bch_btree_insert function is 
"int bch_btree_insert(struct cache_set *c, struct keylist *keys, 
atomic_t *journal_ref, struct bkey *replace_key)". Its fourth parameter
specifies the bkey to be replaced.
5. The core code call stack for replacing the bkey: bch_btree_insert->
bch_btree_map_leaf_nodes->bch_btree_map_nodes_recurse->
btree_insert_fn->bch_btree_insert_node->bch_btree_insert_keys->
btree_insert_key->bch_btree_insert_key->bch_extent_insert_fixup
6. The prototype of the bch_extent_insert_fixup function is defined as
"static bool bch_extent_insert_fixup(struct btree_keys *b, struct bkey 
*insert, struct btree_iter *iter, struct bkey *replace_key)". The
function's internal implementation checks whether the replace_key
parameter is set. If so, it takes the replacement path. This
replacement process ensures that the bkey is not erroneously
overwritten when a write request and dirty-data writeback run
concurrently.

Zhou Jifeng
