Message-ID: <TYZPR02MB78424F31FF023102693D2DB2A65A2@TYZPR02MB7842.apcprd02.prod.outlook.com>
Date: Wed, 13 Nov 2024 08:58:10 +0000
From: "mingzhe.zou@...ystack.cn" <mingzhe.zou@...ystack.cn>
To: liequan che <liequanche@...il.com>, Coly Li <colyli@...e.de>, Kent
Overstreet <kent.overstreet@...il.com>, linux-bcache
<linux-bcache@...r.kernel.org>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: bcache: fix oops bug in cache_set_flush
Hi, cheliequan and Coly:
I saw the following dmesg output in https://gitee.com/openeuler/kernel/issues/IB3YQZ
```
[ 359.618056] bcache: prio_read() bad csum reading priorities
[ 359.624878] bcache: bch_cache_set_error() error on f774c122-6c02-469b-b798-ca53c10efa76: IO error reading priorities, disabling caching
```
We have hit this bad csum error before, but it did not cause a kernel panic.
From the code, bch_btree_node_get() is only called after prio_read() succeeds, so when prio_read() fails, c->root has not been assigned yet and is still NULL.
```
static int run_cache_set(struct cache_set *c)
{
	......
	err = "IO error reading priorities";
	if (prio_read(ca, j->prio_bucket[ca->sb.nr_this_dev]))
		goto err;
	/*
	 * If prio_read() fails it'll call cache_set_error and we'll
	 * tear everything down right away, but if we perhaps checked
	 * sooner we could avoid journal replay.
	 */
	k = &j->btree_root;
	err = "bad btree root";
	if (__bch_btree_ptr_invalid(c, k))
		goto err;
	err = "error reading btree root";
	c->root = bch_btree_node_get(c, NULL, k,
				     j->btree_level,
				     true, NULL);
	......
}
```
This issue was most likely introduced by 028ddca (bcache: Remove unnecessary NULL point check in node allocations).
That patch only addresses __bch_btree_node_alloc and does not consider the other code paths.
For this kernel panic, the following change may be sufficient. However, I am not sure whether there are other issues;
maybe we need to revert 028ddca (bcache: Remove unnecessary NULL point check in node allocations).
```
-	if (!IS_ERR(c->root))
+	if (!IS_ERR_OR_NULL(c->root))
```
From: linux-bcache+bounces-781-mingzhe.zou=easystack.cn@...r.kernel.org <linux-bcache+bounces-781-mingzhe.zou=easystack.cn@...r.kernel.org> on behalf of liequan che <liequanche@...il.com>
Sent: Wednesday, November 13, 2024 2:25 PM
To: Coly Li <colyli@...e.de>; Kent Overstreet <kent.overstreet@...il.com>; linux-bcache <linux-bcache@...r.kernel.org>; linux-kernel <linux-kernel@...r.kernel.org>
Subject: bcache: fix oops bug in cache_set_flush
Signed-off-by: cheliequan <cheliequan@...pur.com>
If the bcache cache disk contains damaged btree data,
operating directly on the bcache cache disk partition
triggers the systemd-udevd service to call the bcache-register
program to register the bcache device, resulting in a kernel oops.
crash> bt
PID: 7773 TASK: ffff49cc44d69340 CPU: 57 COMMAND: "kworker/57:2"
#0 [ffff800046373800] machine_kexec at ffffbe5039eb54a8
#1 [ffff8000463739b0] __crash_kexec at ffffbe503a052824
#2 [ffff8000463739e0] crash_kexec at ffffbe503a0529cc
#3 [ffff800046373a60] die at ffffbe5039e9445c
#4 [ffff800046373ac0] die_kernel_fault at ffffbe5039ec698c
#5 [ffff800046373af0] __do_kernel_fault at ffffbe5039ec6a38
#6 [ffff800046373b20] do_page_fault at ffffbe503ac76ba4
#7 [ffff800046373b70] do_translation_fault at ffffbe503ac76ebc
#8 [ffff800046373b90] do_mem_abort at ffffbe5039ec68ac
#9 [ffff800046373bc0] el1_abort at ffffbe503ac669bc
#10 [ffff800046373bf0] el1_sync_handler at ffffbe503ac671d4
#11 [ffff800046373d30] el1_sync at ffffbe5039e82230
#12 [ffff800046373d50] cache_set_flush at ffffbe50121fa4c4 [bcache]
#13 [ffff800046373da0] process_one_work at ffffbe5039f5af68
#14 [ffff800046373e00] worker_thread at ffffbe5039f5b3c4
#15 [ffff800046373e50] kthread at ffffbe5039f634b8
crash> dis cache_set_flush+0x94
0xffffbe50121fa4c8 <cache_set_flush+148>: str x23, [x20, #512]
---
drivers/md/bcache/super.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index fd97730479d8..8a41dfcf9fb6 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1741,8 +1741,10 @@ static void cache_set_flush(struct closure *cl)
 	if (!IS_ERR_OR_NULL(c->gc_thread))
 		kthread_stop(c->gc_thread);
-	if (!IS_ERR(c->root))
-		list_add(&c->root->list, &c->btree_cache);
+	if (!IS_ERR_OR_NULL(c->root)) {
+		if (!list_empty(&c->root->list))
+			list_add(&c->root->list, &c->btree_cache);
+	}
 	/*
 	 * Avoid flushing cached nodes if cache set is retiring
@@ -1750,10 +1752,12 @@ static void cache_set_flush(struct closure *cl)
 	 */
 	if (!test_bit(CACHE_SET_IO_DISABLE, &c->flags))
 		list_for_each_entry(b, &c->btree_cache, list) {
-			mutex_lock(&b->write_lock);
-			if (btree_node_dirty(b))
-				__bch_btree_node_write(b, NULL);
-			mutex_unlock(&b->write_lock);
+			if (!IS_ERR_OR_NULL(b)) {
+				mutex_lock(&b->write_lock);
+				if (btree_node_dirty(b))
+					__bch_btree_node_write(b, NULL);
+				mutex_unlock(&b->write_lock);
+			}
 		}
 	if (ca->alloc_thread)
--
2.33.0