Message-Id: <D5D1CBD5-0031-4285-BE12-910D6898B465@suse.de>
Date: Fri, 15 Nov 2024 14:40:05 +0800
From: Coly Li <colyli@...e.de>
To: liequan che <liequanche@...il.com>
Cc: "mingzhe.zou@...ystack.cn" <mingzhe.zou@...ystack.cn>,
Kent Overstreet <kent.overstreet@...il.com>,
linux-bcache <linux-bcache@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache:fix oops in cache_set_flush
> On Nov 14, 2024, at 21:10, liequan che <liequanche@...il.com> wrote:
>
> Hi colyli and mingzhe.zou:
> I am trying to reproduce this problem; it may be a random issue.
> It is triggered only when an I/O error occurs while reading priorities.
> The same operation was performed on three servers, replacing the 12T
> disk with a 16T disk. Only one server triggered the bug. The on-site
What do you mean by “replacing the 12T disk with a 16T disk”?
> operation steps are as follows:
> 1. Create a bcache device.
> make-bcache -C /dev/nvme2n1p1 -B /dev/sda --writeback --force --wipe-bcache
> /dev/sda is a 12T SATA disk.
> /dev/nvme2n1p1 is the first partition of the nvme disk. The partition
> size is 1024G.
> The partition command is parted -s --align optimal /dev/nvme2n1 mkpart
> primary 2048s 1024GiB
> 2. Execute fio test on bcache0
>
> cat /home/script/run-fio-randrw.sh
> bcache_name=$1
> if [ -z "${bcache_name}" ];then
> echo bcache_name is empty
> exit -1
> fi
>
> fio --filename=/dev/${bcache_name} --ioengine=libaio --rw=randrw
> --bs=4k --size=100% --iodepth=128 --numjobs=4 --direct=1 --name=randrw
> --group_reporting --runtime=30 --ramp_time=5 --lockmem=1G | tee -a
> ./randrw-iops_k1.log
> Execute bash run-fio-randrw.sh bcache0 multiple times.
> 3. Shutdown
> poweroff
> No bcache data clearing operation was performed.
What is the “bcache data clearing operation” here?
> 4. Replace the 12T SATA disk with a 16T SATA disk
> After shutting down, unplug the 12T hard disk and replace it with a
> 16T hard disk.
It seems you did something bcache doesn’t support: replacing the backing device...
> 5. Adjust the size of the nvme2n1 partition to 1536G
> parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
> Kernel panic occurs after partitioning is completed.
Yes, this is expected: bcache doesn’t support resizing the cache device. The operation results in a corrupted metadata layout.
> 6. Restart the system; it cannot boot normally and keeps
> restarting.
> 7. Enter rescue mode from a CD and clear the nvme2n1p1 super
> block information. After restarting again, the system boots
> normally.
> wipefs -af /dev/nvme2n1p1
OK, the cache device is cleared.
> 8. Repartition again, triggering the kernel panic again.
> parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
> The same operation was performed on the other two servers, and no
> panic was triggered.
I guess this is another undefined operation. I assume the cache device is still referenced somewhere; a reboot should follow the wipefs.
> The server with the problem was able to enter the system normally
> after the root of the cache_set structure was determined to be empty.
> I updated the description of the problem in the link below.
No, if you clean up the partition, no cache device will exist, and cache registration won’t treat it as a bcache device.
OK, from the above description I see you replaced the backing device (and I don’t know where its previous data went), then extended the cache device size. These are all unsupported operations.
It is very possible that such unsupported operations result in an undefined aftermath.
> bugzilla: https://gitee.com/openeuler/kernel/issues/IB3YQZ
> Your suggestion was correct. I removed the unnecessary
> IS_ERR_OR_NULL check on btree_cache.
Here on the linux-bcache mailing list we don’t handle distribution-specific bugs unless they exist upstream too.
But from the above description, IMHO these are invalid operations, so I don’t see a valid bug here.
> ------------
> If the bcache cache disk contains damaged data,
> when the bcache cache disk partition is operated on directly,
> the systemd-udevd service triggers the bcache-register
> program to register the bcache device, resulting in a kernel oops.
>
> Signed-off-by: cheliequan <cheliequan@...pur.com>
>
> ---
> drivers/md/bcache/super.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index fd97730479d8..c72f5576e4da 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1741,8 +1741,10 @@ static void cache_set_flush(struct closure *cl)
> if (!IS_ERR_OR_NULL(c->gc_thread))
> kthread_stop(c->gc_thread);
>
> - if (!IS_ERR(c->root))
> - list_add(&c->root->list, &c->btree_cache);
> + if (!IS_ERR_OR_NULL(c->root)) {
> + if (!list_empty(&c->root->list))
> + list_add(&c->root->list, &c->btree_cache);
> + }
>
The patch just avoids an explicit kernel panic for the undefined device status. More damage is on the way even if you veil this panic.
Thanks.
Coly Li
> /*
> * Avoid flushing cached nodes if cache set is retiring
> --
> 2.33.0