linux-kernel - Re: ext4: media error but where?

Message-ID: <20140707185543.GA26056@amd.pavel.ucw.cz>
Date:	Mon, 7 Jul 2014 20:55:43 +0200
From:	Pavel Machek <pavel@....cz>
To:	Theodore Ts'o <tytso@....edu>,
	kernel list <linux-kernel@...r.kernel.org>,
	adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org
Subject: Re: ext4: media error but where?

On Sun 2014-07-06 21:00:02, Theodore Ts'o wrote:
> On Sun, Jul 06, 2014 at 11:37:11PM +0200, Pavel Machek wrote:
> > 
> > Well, when I got report about hw problems, badblocks -c was my first
> > instinct. On the usb hdd, the most errors were due to 3.16-rc1 kernel
> > bug, not real problems.
> 
> The problem is with modern disk drives, this is a *wrong* instinct.
> That's my point.  In general, trying to mess with the bad blocks list
> in the ext2/3/4 file system is just not the right thing to do with
> modern disk drives.  That's because with modern disk drives, the hard
> drives will do bad block remapping.

Actually... I believe it was the right instinct. 

If I wanted to recover the data... remounting read-only would be the
way to go, then backing it up using dd_rescue. ... But that way I'd
turn bad sectors into silent data corruption.

If I wanted to recover data from that partition, fsck -c (or
badblocks, but that's trickier) and then dd_rescue would be the way to go.
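
As a toy illustration of the "silent data corruption" concern, here is
a minimal Python sketch of a rescue-style copy. It is not how dd_rescue
is implemented; read_block, fake_read and the in-memory "disk" are
hypothetical names made up for the example. Unreadable blocks are
zero-filled, so unless the bad block numbers are recorded, the copy
cannot tell a lost block from one that really contained zeros.

```python
import io


def rescue_copy(read_block, n_blocks, block_size, out):
    """Copy n_blocks blocks to out, zero-filling unreadable ones.

    read_block(i) is a hypothetical callable that returns block i as
    bytes or raises OSError on a read error. The returned list of bad
    block numbers is the only record of which zeros are fake.
    """
    bad = []
    for i in range(n_blocks):
        try:
            data = read_block(i)
        except OSError:
            data = b"\0" * block_size  # placeholder; the real content is lost
            bad.append(i)
        out.write(data)
    return bad


# Simulated disk: block 0 really is all zeros, block 2 is unreadable.
def fake_read(i, block_size=4):
    if i == 2:
        raise OSError("simulated media error")
    return bytes([i]) * block_size


image = io.BytesIO()
bad_blocks = rescue_copy(fake_read, 4, 4, image)
```

In the resulting image, blocks 0 and 2 look identical; only bad_blocks
says which one is damaged. Knowing which files sit on the damaged
blocks needs a filesystem-level pass on top of this.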

> Basically, with modern disks, if the HDD has a hard ECC error, it will
> return an error --- but if you write to the sector, it will either
> rewrite onto that location on the platter, or if that part of the
> platter is truly gone, it will remap to the bad block spare pool.  So
> telling the disk to never use that block again isn't going to be the
> right answer.

Actually -- a tool to do relocations would be nice. It is not exactly
easy to do it right by hand.
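
One step of the by-hand procedure is working out which filesystem
block a failing sector belongs to. A minimal Python sketch of that
arithmetic follows. It assumes the error is reported as a whole-disk
LBA in 512-byte sectors and that the partition start and filesystem
block size are already known; the function name and the sample numbers
are invented for illustration.

```python
def lba_to_fs_block(lba, part_start_lba, fs_block_size, sector_size=512):
    """Map a whole-disk sector number to a block number inside a partition.

    lba            -- failing sector, counted from the start of the disk
    part_start_lba -- first sector of the partition holding the filesystem
    fs_block_size  -- filesystem block size in bytes (e.g. 4096)
    sector_size    -- assumed logical sector size in bytes
    """
    if lba < part_start_lba:
        raise ValueError("sector lies before the start of the partition")
    byte_offset = (lba - part_start_lba) * sector_size
    return byte_offset // fs_block_size


# Made-up example: partition starts at sector 2048, 4 KiB blocks.
block = lba_to_fs_block(2128, part_start_lba=2048, fs_block_size=4096)
```

With these sample values the sector falls in filesystem block 10. The
block number can then be looked up with something like the icheck and
ncheck commands in debugfs to find the affected file, which is where
the manual procedure starts to get error-prone.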

I know the theory. I had 5 read-error incidents this year.

#1: Seagate refuses to reallocate sectors. Not sure why, I tried
 pretty much everything.

#2: 3.16-rc1 produces incorrect errors every 4GB, leading to "bad
sectors" that disappear with other kernels.

#3: Some more bad sectors appear on the Seagate.

#4: Kernel on ThinkPad reports errors in daily check. Which is strange
 because there's nothing in SMART.

#5: Some old IDE hdd has bad sectors in unused or unimportant areas. 

In #5 the theory might match the reality (I did not check, I trashed
the disks).

> The badblocks approach to dealing with hardware problems made sense
> back when we had IDE disks.  But that's been over a decade ago.  These
> days, it's horribly obsolete.

Forcing reallocation is hard & tricky. You may want to simply mark the
block bad and lose a tiny bit of disk space... And even if you want to
force reallocation, you want to do fsck -c first, and restore affected
files from backup.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html