linux-kernel - RE: Mechanism to safely force repair of single md stripe w/o hurting data integrity of file system


Message-Id: <200805172026.m4HKQwh06910@www.watkins-home.com>
Date:	Sat, 17 May 2008 16:26:53 -0400
From:	"Guy Watkins" <linux-raid@...kins-home.com>
To:	"'David Lethe'" <david@...tools.com>,
	"'LinuxRaid'" <linux-raid@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: RE: Mechanism to safely force repair of single md stripe w/o hurting data integrity of file system

} -----Original Message-----
} From: linux-raid-owner@...r.kernel.org [mailto:linux-raid-
} owner@...r.kernel.org] On Behalf Of David Lethe
} Sent: Saturday, May 17, 2008 3:10 PM
} To: LinuxRaid; linux-kernel@...r.kernel.org
} Subject: Mechanism to safely force repair of single md stripe w/o hurting
} data integrity of file system
} 
} I'm trying to figure out a mechanism to safely repair a stripe of data
} when I know a particular disk has an unrecoverable read error at a
} certain physical block (for 2.6 kernels).
} 
} My original plan was to figure out the range of blocks in md device that
} utilizes the known bad block and force a raw read on physical device
} that covers the entire chunk and let the md driver do all of the work.
} 
} Well, this didn't pan out. Problems include cases where the bad block
} maps to the parity block in a stripe, so md won't necessarily
} read/verify parity, and cases where you are running RAID1, where load
} balancing might result in the kernel reading the block from the good
} disk instead of the bad one.
} 
} So the degree of difficulty is much higher than I expected. I prefer
} not to patch kernels due to maintenance issues as well as a desire for
} the technique to work across numerous kernels and patch revisions, and
} frankly, the odds are I would screw it up. An application-level program
} that can be invoked as necessary would be ideal.
} 
} As such, anybody up to the challenge of writing the code?  I want it
} enough to paypal somebody $500 who can write it, and will gladly open
} source the solution.
} 
} (And to clarify why, I know physical block x on disk y is bad before the
} O/S reads the block, and just want to rebuild the stripe, not the entire
} md device when this happens. I must not compromise any file system data,
} cached or non-cached, that is built on the md device. I have a system with
} >100TB and if I did a rebuild every time I discovered a bad block
} somewhere, then a full parity repair would never complete before another
} physical bad block is discovered.)
} 
} Contact me offline for the financial details, but I would certainly
} appreciate some thread discussion on an appropriate architecture.  At
} least it is my opinion that such capability should eventually be native
} Linux, but as long as there is a program that can be run on demand that
} doesn't require rebuilding or patching kernels then that is all I need.
} 
} David @ santools.com

I thought this would cause md to read every block in the array and rewrite
any block that can't be read:
echo repair > /sys/block/md0/md/sync_action
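
For illustration, here is a rough sketch of how that sysfs interface could be
wrapped in a small script. It is not a tested repair tool. The helper names
(md_dir, md_repair, md_repair_range, md_status) and the MD_SYSFS override are
made up for this example. The range variant assumes the running kernel
exposes the sync_min and sync_max attributes next to sync_action, which older
2.6 kernels may not, and it assumes the values are sector offsets as the md
driver interprets them for the RAID level in use.

```shell
#!/bin/sh
# Sketch only: helpers around the md sysfs interface.
# MD_SYSFS is a hypothetical override (not part of md) so the helpers can
# be pointed at a scratch directory for a dry run; default is /sys/block.

md_dir() {
    printf '%s/%s/md\n' "${MD_SYSFS:-/sys/block}" "$1"
}

# Start a repair pass over the whole array, e.g.: md_repair md0
md_repair() {
    dir=$(md_dir "$1")
    [ -w "$dir/sync_action" ] || {
        echo "cannot write $dir/sync_action" >&2; return 1; }
    echo repair > "$dir/sync_action"
}

# Assumption: the kernel exposes sync_min and sync_max.
# Restrict the next repair pass to sectors START..END, e.g.:
#   md_repair_range md0 1048576 1049600
md_repair_range() {
    dir=$(md_dir "$1")
    [ -w "$dir/sync_min" ] && [ -w "$dir/sync_max" ] || {
        echo "sync_min/sync_max not available for $1" >&2; return 1; }
    [ "$2" -lt "$3" ] || { echo "START must be below END" >&2; return 1; }
    # Reset the window first so neither write below is rejected for
    # crossing a stale limit left behind by an earlier run.
    echo 0 > "$dir/sync_min"
    echo max > "$dir/sync_max"
    echo "$2" > "$dir/sync_min"
    echo "$3" > "$dir/sync_max"
    echo repair > "$dir/sync_action"
}

# Print the current action and, where present, progress and mismatches.
md_status() {
    dir=$(md_dir "$1")
    for f in sync_action sync_completed mismatch_cnt; do
        [ -r "$dir/$f" ] && printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
    done
    return 0
}
```

After sourcing the file as root, "md_repair md0" does the same thing as the
echo above, and "md_status md0" shows whether the pass is still running. With
the range variant, my understanding is that md pauses when it reaches
sync_max rather than finishing, so the window should be reset afterwards
(echo idle to sync_action, 0 to sync_min, max to sync_max). The range may
also need to be aligned to the chunk size on some RAID levels.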

In the old days, md would kick out a disk on a read error.  When you added
it back, md would rewrite everything on that disk, which corrected read
errors.

Guy
