Message-ID: <20081123122041.GC17607@khazad-dum.debian.net>
Date:	Sun, 23 Nov 2008 10:20:41 -0200
From:	Henrique de Moraes Holschuh <hmh@....eng.br>
To:	Brad Campbell <brad@...p.net.au>
Cc:	Robert Hancock <hancockr@...w.ca>, linux-raid@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: Why does the md/raid subsystem does not remap bad sectors in a
	raid   array?

On Sun, 23 Nov 2008, Brad Campbell wrote:
> md has done this for a while now though. If it encounters a read error in 
> the array it will make an attempt to write the reconstructed data back to 
> that disk attempting to force a reallocation. I've seen it work quite 
> well here on disks that have the occasional grown defect.

Indeed, but it does so in "check array" mode (which distros like Debian
now enable once a month or so; I always bump that up to once a
week :p).
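For reference, on Linux the check pass is requested through md's sysfs
files: writing `check` to `/sys/block/<array>/md/sync_action` starts it,
and `/sys/block/<array>/md/mismatch_cnt` holds the mismatch count
afterwards. Below is a minimal Python sketch, assuming root and an
existing array. The helper names and the `sysfs_root` parameter are
hypothetical; the parameter is there only so the code can be tried
against a scratch directory.

```python
from pathlib import Path


def start_md_check(array: str, sysfs_root: str = "/sys/block") -> None:
    """Ask md to start a "check" (scrub) pass on `array`, e.g. "md0".

    Needs root on a real system. Refuses to start if md is not idle.
    """
    action = Path(sysfs_root) / array / "md" / "sync_action"
    current = action.read_text().strip()
    if current != "idle":
        # A resync/recovery/check is already running (or md is frozen).
        raise RuntimeError(f"{array} is busy: sync_action={current!r}")
    action.write_text("check\n")


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch counter left by the last check/repair pass."""
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

A weekly cron job calling `start_md_check("md0")` would give the
once-a-week schedule mentioned above.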

Does md repair bitrotten sectors ALSO outside of check mode?  That's
what is being asked in this thread...

> If the disk is haemorrhaging sectors then you will find out about it 
> sooner or later through other means.

Like a weekly SMART long test. That's what our maintenance windows are
for :)  Everything is kept online, but allowed to run in degraded
performance mode, so we kick off SMART offline and long tests, RAID array
scrubbing, etc. (not at the same time, though!).

That reminds me to file a bug against smartmontools asking it to DISABLE
automatic offline mode on the disks, and instead start the offline tests
one disk at a time, at random intervals with at least one hour between
them.  Otherwise, all the disks enter SMART auto-offline testing at the
same time.
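A rough Python sketch of the staggering being proposed. It assumes
automatic offline testing has already been switched off on each disk
(smartctl's `-o off`), and it only computes a schedule of
`smartctl -t offline <disk>` invocations without running them.
`stagger_offline_tests`, `min_gap` and `max_jitter` are hypothetical
names, not anything smartmontools provides.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ScheduledTest:
    start: float              # seconds after the maintenance window opens
    disk: str                 # e.g. "/dev/sda"
    command: Tuple[str, ...]  # command line to run at `start`


def stagger_offline_tests(disks: Sequence[str], min_gap: float = 3600.0,
                          max_jitter: float = 1800.0,
                          rng: Optional[random.Random] = None):
    """Schedule one SMART offline test per disk, at least `min_gap` s apart.

    Disk order is shuffled and every gap is `min_gap` plus a random jitter
    in [0, max_jitter], so the disks never all start testing together.
    """
    if rng is None:
        rng = random.Random()
    order = list(disks)
    rng.shuffle(order)
    schedule = []
    t = rng.uniform(0.0, max_jitter)  # random offset for the first disk
    for disk in order:
        schedule.append(
            ScheduledTest(t, disk, ("smartctl", "-t", "offline", disk)))
        t += min_gap + rng.uniform(0.0, max_jitter)
    return schedule
```

Passing a seeded `random.Random` makes a given schedule reproducible;
the caller (cron, a systemd timer, etc.) is left to actually run each
command at its start time.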

Hmm, it would be good to teach md to measure disk throughput using a
sliding window (of, say, 5 minutes) and lower the read priority of disks
that are slow...
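The sliding-window idea could look something like this Python sketch.
It is not md code (md's read balancing lives in the kernel and uses its
own heuristics); `SlidingThroughput`, `read_preference` and the
300-second default window are assumptions made for illustration. Each
disk gets a meter fed with completed-read sizes, and readers prefer
disks in descending order of recent throughput.

```python
import time
from collections import deque
from typing import Dict, List, Optional


class SlidingThroughput:
    """Bytes per second completed by one disk over the last `window` s."""

    def __init__(self, window: float = 300.0):
        self.window = window
        self._samples = deque()  # (timestamp, bytes) in arrival order
        self._total = 0          # sum of bytes currently in the window

    def record(self, nbytes: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._samples.append((now, nbytes))
        self._total += nbytes
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Drop samples that have slid out of the window.
        while self._samples and self._samples[0][0] <= now - self.window:
            _, old = self._samples.popleft()
            self._total -= old

    def rate(self, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        self._expire(now)
        return self._total / self.window


def read_preference(meters: Dict[str, SlidingThroughput],
                    now: Optional[float] = None) -> List[str]:
    """Disk names ordered fastest first; slow disks sink to the end."""
    return sorted(meters, key=lambda d: meters[d].rate(now), reverse=True)
```

Dividing by the full window means a freshly started meter under-reports
until five minutes of samples exist; the deque keeps recording and
expiry amortized O(1) per sample.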

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
