Message-ID: <26688.15707.98922.15948@quad.stoffel.home>
Date: Wed, 4 Jun 2025 08:34:35 -0400
From: "John Stoffel" <john@...ffel.org>
To: David Niklas <simd@...mail.net>
Cc: Linux RAID <linux-raid@...r.kernel.org>,
    linux-kernel@...r.kernel.org
Subject: Re: Need help increasing raid scan efficiency.

>>>>> "David" == David Niklas <simd@...mail.net> writes:

> My PC recently suffered a rather nasty case of hardware failure
> where the motherboard would break the CPU and RAM. I ended up with
> different data on different members of my RAID6 array.

Ouch, this is not good.  But you have RAID6, so it should be ok...

> I wanted to scan through the drives and take some checksums of
> various files in an attempt to ascertain which drives took the most
> data corruption damage, to try and find the date that the damage
> started occurring (as it was unclear when exactly this began), and
> to try and rescue some of the data off the good pairs.

What are you comparing the checksums to?  Just because you assemble
drives 1 and 2 and read the filesystem, then assemble drives 3 and 4
into another array, how do you know which checksum is correct if they
differ?
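
If you do go down this road, the mechanical part is easy enough.
Untested sketch, with mount points and output paths made up:

  # mount -o ro /dev/md127 /mnt/pair01
  # (cd /mnt/pair01 && find . -type f -print0 | \
      xargs -0 sha256sum | sort -k2) > /root/sums-pair01
  ... repeat per pairing, then compare:
  # diff /root/sums-pair01 /root/sums-pair23

But a diff only tells you that two pairings disagree, not which one
(if either) is right.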

> So I set the array to read-only mode and started it with only two
> of the drives, drives 0 and 1. Then I proceeded to try and start a
> second pair, drives 2 and 3, so that I could scan them
> simultaneously, with the intent of then switching over to 0 and 2
> and 1 and 3, then 0 and 3 and 1 and 2.

I'm not sure this is really going to work the way you think...

> This failed with the error message:
> # mdadm --assemble -o --run /dev/md128 /dev/sdc /dev/sdd
> mdadm: Found some drive for array that is already active: /dev/md127

This is not unexpected.  You already have md127 set up using the same
UUID, and mdadm is doing the right thing by refusing to assemble a
different array from members with the same underlying UUID.
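
If you want to walk the pairings one at a time, the simple way is to
stop the first array before assembling the next (untested, but using
your device names):

  # mdadm --stop /dev/md127
  # mdadm --assemble -o --run /dev/md128 /dev/sdc /dev/sdd

Just stop md128 again before moving on to the next pairing.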

But if you have four drives, that's six possible two-drive pairings,
so six sets of checksums to calculate for each file, which is going
to take time.  And I think just doing it one pair of disks at a time
is the safest way.  Your data is important to you, obviously, but how
much is it worth?

Can you afford to get some replacement disks, or even just a single
large disk, and then dump all your files onto that new disk to try
and save what you have, even if it's corrupted?
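
Even something as simple as this (mount points made up) would get you
a copy to pick through at leisure:

  # mount -o ro /dev/md127 /mnt/array
  # rsync -a /mnt/array/ /mnt/newdisk/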

> Any ideas as to how I can get mdadm to run the array as I requested
> above? I did try --force, but mdadm refused to listen.

And for good reason.  You might be able to put a copy-on-write
overlay on each drive (a dm snapshot; overlayfs works on directories,
not block devices), then change the UUID of the second pair to
something different, and then start it as a second array with a new
name and member UUIDs.
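
Untested sketch, loosely based on the overlay trick on the linux-raid
wiki; the overlay size, loop device, and names are all placeholders:

  # truncate -s 10G /tmp/sdc.ovl    # sparse file to absorb writes
  # losetup /dev/loop0 /tmp/sdc.ovl
  # echo "0 $(blockdev --getsz /dev/sdc) snapshot /dev/sdc \
      /dev/loop0 P 8" | dmsetup create sdc-cow
  ... same again for sdd -> sdd-cow, then:
  # mdadm --assemble --run --update=uuid /dev/md128 \
      /dev/mapper/sdc-cow /dev/mapper/sdd-cow

The --update=uuid gives the copy a new random UUID so it can coexist
with md127, and since that write lands in the overlay, the real disks
stay untouched.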

But it's a lot of hacking for probably not much payoff.

Have you actually found a file with corruption?  If so, have you done
a quick test where you assemble each pair of the array in turn and
check just that one single file to see if the checksum differs?
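
For the spot check, each pass is just (file name hypothetical):

  # mount -o ro /dev/md128 /mnt/test
  # sha256sum /mnt/test/path/to/suspect-file
  # umount /mnt/test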

And again, if it does differ, how do you decide what is the correct
data?  

I would strongly suspect that the data is corrupted no matter what.  

In any case, good luck!  Maybe the raid6check tool will help, but I'd
at least try to use your most recent backup as a check.
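
raid6check wants the full array assembled (read-only is fine).  From
memory the arguments are the md device, the starting stripe, and the
number of stripes:

  # raid6check /dev/md127 0 0

but check raid6check(8) before trusting me on that, in particular on
whether a length of 0 means "to the end".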

John
