Message-ID: <CAAsfc_oTmE2E8pMctiLSwMngVUbtJa4G=KAozzAfztMMc_RMOQ@mail.gmail.com>
Date: Tue, 19 Nov 2024 12:19:38 +0800
From: liequan che <liequanche@...il.com>
To: Coly Li <colyli@...e.de>
Cc: "mingzhe.zou@...ystack.cn" <mingzhe.zou@...ystack.cn>, Kent Overstreet <kent.overstreet@...il.com>, 
	linux-bcache <linux-bcache@...r.kernel.org>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bcache:fix oops in cache_set_flush

Hi Coly,
>>  The same operation was performed on three servers, replacing the 12T
>> disk with a 16T disk. Only one server triggered the bug. The on-site

>What do you mean “replacing 12T disk with a 16T disk” ?

I used another 16T SATA disk to replace the 12T SATA disk.
The plan is to recreate bcache from the 16T hard disk and the original nvme disk.

>> No bcache data clearing operation was performed

>What is the “bcache data clearing operation” here?
Nothing was done. I plan to erase the superblock after partitioning
the nvme disk, that is, to discard the original nvme disk data by
erasing the superblock and using the --wipe-bcache option.
>> 3. Replace the 12T SATA disk with a 16T SATA disk
>> After shutting down, unplug the 12T hard disk and replace it with a
>> 16T hard disk.

>It seems you did something bcache doesn’t support. Replace the backing device...
You are right. I may have done something that bcache does not support.
But I hope that an unsupported operation will not cause the system to panic.
A consequence I can accept is that bcache device creation fails.
The bcache module would then give me a chance to erase the superblock again,
instead of my having to boot into CD rescue mode to erase it.
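
To illustrate the behavior I am hoping for, here is a minimal standalone
sketch. The struct and function names are hypothetical stand-ins, not the
kernel's real bcache definitions, and this is not the actual patch. It only
shows the idea: if the root was never set up, the flush path reports an
error and returns instead of dereferencing a NULL pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real bcache structures. */
struct btree_node {
    int dirty;                  /* 1 if the node still has unwritten data */
};

struct cache_set_sketch {
    struct btree_node *root;    /* assumed NULL when registration failed early */
};

/*
 * Flush the root node only if it exists.
 * Returns 0 when the root was flushed, -1 when there was nothing to flush.
 */
int flush_root_checked(struct cache_set_sketch *c)
{
    if (c == NULL || c->root == NULL) {
        fprintf(stderr, "flush: no root node, skipping\n");
        return -1;              /* fail gracefully instead of oopsing */
    }

    c->root->dirty = 0;         /* stand-in for writing the node out */
    return 0;
}
```

With a check like this, a half-registered cache set would lead to a failed
setup that can be cleaned up from the running system.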


>> 7. Repartition again, triggering kernel panic again.
>> parted -s --align optimal /dev/nvme2n1 mkpart primary 2048s 1536GiB
>> The same operation was performed on the other two servers, and no
>> panic was triggered.

>I guess this is another undefined operation. I assume the cache device is still referenced somewhere. A reboot should follow the wipefs.
Your guess is correct. In addition, after erasing the superblock
in CD rescue mode, I rebooted into the system running the kernel
that originally panicked.

>> The server with the problem was able to enter the system normally
>> after the root of the cache_set structure was determined to be empty.
>> I updated the description of the problem in the link below.

>No, if you clean up the partition, no cache device will exist. Cache registration won’t treat it as a bcache device.

>OK, from the above description, I see you replace the backing device (and I don’t know where the previous data was), then you extend the cache device size. They are all unsupported operations.
The behavior here is a bit strange. After partitioning, I may have
recreated the bcache device here,
which triggered the bcache register operation. Then the kernel panicked again.
>make-bcache -C /dev/nvme2n1p1 -B /dev/sda --writeback --force --wipe-bcache

Thanks.
