Message-Id: <D5D1CBD5-0031-4285-BE12-910D6898B465@suse.de>
Date: Fri, 15 Nov 2024 14:40:05 +0800
From: Coly Li <colyli@...e.de>
To: liequan che <liequanche@...il.com>
Cc: "mingzhe.zou@...ystack.cn" <mingzhe.zou@...ystack.cn>,
Kent Overstreet <kent.overstreet@...il.com>,
linux-bcache <linux-bcache@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache:fix oops in cache_set_flush
> On Nov 14, 2024, at 21:10, liequan che <liequanche@...il.com> wrote:
>
> Hi colyli and mingzhe.zou:
> I am trying to reproduce this problem; it may be a random issue.
> It is triggered only when an I/O error occurs while reading priorities.
> The same operation was performed on three servers, replacing the 12T
> disk with a 16T disk. Only one server triggered the bug. The on-site
What do you mean by “replacing the 12T disk with a 16T disk”?
> operation steps are as follows:
> 1. Create a bcache device.
> make-bcache -C /dev/nvme2n1p1 -B /dev/sda --writeback --force --wipe-bcache
> /dev/sda is a 12T SATA disk.
> /dev/nvme2n1p1 is the first partition of the nvme disk. The partition
> size is 1024G.
> The partition command is parted -s --align optimal /dev/nvme2n1 mkpart
> primary 2048s 1024GiB
> 2. Execute fio test on bcache0
>
> cat /home/script/run-fio-randrw.sh
> bcache_name=$1
> if [ -z "${bcache_name}" ];then
> echo bcache_name is empty
> exit -1
> fi
>
> fio --filename=/dev/${bcache_name} --ioengine=libaio --rw=randrw
> --bs=4k --size=100% --iodepth=128 --numjobs=4 --direct=1 --name=randrw
> --group_reporting --runtime=30 --ramp_time=5 --lockmem=1G | tee -a
> ./randrw-iops_k1.log
> Execute bash run-fio-randrw.sh bcache0 multiple times.
> 3. Shutdown
> poweroff
> No bcache data clearing operation was performed.
What is the “bcache data clearing operation” here?
> 4. Replace the 12T SATA disk with a 16T SATA disk
> After shutting down, unplug the 12T hard disk and replace it with a
> 16T hard disk.
It seems you did something bcache doesn’t support: replacing the backing device...
> 5. Adjust the size of the nvme2n1 partition to 1536G
> parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
> Kernel panic occurs after partitioning is completed.
Yes, this is expected: bcache doesn’t support resizing the cache device. The operation results in a corrupted metadata layout.
> 6. Restart the system; it cannot boot normally and keeps
> restarting.
> 7. Enter rescue mode from a CD and clear the nvme2n1p1 super
> block information. After restarting again, the system boots
> normally.
> wipefs -af /dev/nvme2n1p1
OK, the cache device is cleared.
> 8. Repartition again, triggering the kernel panic again.
> parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
> The same operation was performed on the other two servers, and no
> panic was triggered.
I guess this is another undefined operation. I assume the cache device is still referenced somewhere; a reboot should follow the wipefs.
> The server with the problem was able to enter the system normally
> after the root of the cache_set structure was determined to be empty.
> I updated the description of the problem in the link below.
No, if you clean up the partition, no cache device will exist, and cache registration won’t treat it as a bcache device.
OK, from the above description I see you replaced the backing device (and I don’t know where its previous data went), then extended the cache device size. These are all unsupported operations.
It is very possible that such unsupported operations result in an undefined aftermath.
> bugzilla: https://gitee.com/openeuler/kernel/issues/IB3YQZ
> Your suggestion was correct. I removed the unnecessary
> IS_ERR_OR_NULL check on btree_cache.
Here on the linux-bcache mailing list we don’t handle distribution-specific bugs unless they exist upstream too.
But from the above description, IMHO these are invalid operations, so I don’t see a valid bug here.
> ------------
> If the bcache cache disk contains damaged data,
> when the bcache cache disk partition is operated on directly,
> the systemd-udevd service triggers the bcache-register
> program to register the bcache device, resulting in a kernel oops.
>
> Signed-off-by: cheliequan <cheliequan@...pur.com>
>
> ---
> drivers/md/bcache/super.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index fd97730479d8..c72f5576e4da 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1741,8 +1741,10 @@ static void cache_set_flush(struct closure *cl)
> if (!IS_ERR_OR_NULL(c->gc_thread))
> kthread_stop(c->gc_thread);
>
> - if (!IS_ERR(c->root))
> - list_add(&c->root->list, &c->btree_cache);
> + if (!IS_ERR_OR_NULL(c->root)) {
> + if (!list_empty(&c->root->list))
> + list_add(&c->root->list, &c->btree_cache);
> + }
>
The patch just avoids an explicit kernel panic for the undefined device status. More damage is on the way even if you veil this panic.
Thanks.
Coly Li
> /*
> * Avoid flushing cached nodes if cache set is retiring
> --
> 2.33.0