linux-kernel - Overagressive failing of disk reads, both LIBATA and IDE

Message-ID: <F79C774361EF45A6A68BC08F8621E406@DIAMOND8600>
Date:	Fri, 20 Mar 2009 11:12:11 +0900
From:	"Norman Diamond" <n0diamond@...oo.co.jp>
To:	<linux-kernel@...r.kernel.org>, <linux-ide@...r.kernel.org>
Subject: Overagressive failing of disk reads, both LIBATA and IDE

For months I was wondering how a disk could do this:
dd if=/dev/hda of=/dev/null bs=512 skip=551540 count=4  # succeeds
dd if=/dev/hda of=/dev/null bs=512 skip=551544 count=4  # succeeds
dd if=/dev/hda of=/dev/null bs=512 skip=551540 count=8  # fails
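
To separate errors the drive itself reports from failures caused by the kernel reading more than was asked for, one can probe one sector at a time. The sketch below is a hypothetical helper (scan_sectors is not from the post) and assumes GNU dd, whose iflag=direct opens the input with O_DIRECT so the page cache and its readahead are skipped.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): probe each 512-byte sector in
# [START, START+COUNT) with its own dd call and print the numbers of the
# sectors whose read fails. Passing "direct" as the 4th argument adds GNU
# dd's iflag=direct (O_DIRECT), bypassing the page cache and its readahead,
# so only the requested sector is read from the drive.
scan_sectors() {
    dev=$1
    start=$2
    count=$3
    mode=${4:-}
    s=$start
    end=$((start + count))
    while [ "$s" -lt "$end" ]; do
        if [ "$mode" = direct ]; then
            dd if="$dev" of=/dev/null bs=512 skip="$s" count=1 iflag=direct 2>/dev/null
        else
            dd if="$dev" of=/dev/null bs=512 skip="$s" count=1 2>/dev/null
        fi || echo "$s"    # print the sector number if this read failed
        s=$((s + 1))
    done
}
```

For example, scan_sectors /dev/hda 551540 32 direct would probe sectors 551540 through 551571; with direct I/O, only the sectors the drive really cannot read should be listed.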

It turns out the disk isn't doing that.  Linux is.  The old IDE drivers did
it, and with LIBATA the same thing happens to /dev/sda.  In the later
examples, too, /dev/sda behaves the same as /dev/hda.

Here's what the disk is really responsible for:
dd if=/dev/hda of=/dev/null bs=512 skip=551562 count=1  # really fails

Here's Linux to blame again:
dd if=/dev/hda of=/dev/null bs=512 skip=551561 count=1  # fails
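
One plausible explanation for 551561 failing, assuming the kernel reads this device in aligned groups of 4 sectors as the P.S. describes for writes, is that 551561 and the bad sector 551562 fall in the same group. The arithmetic, as a small shell sketch (the group size n is an assumption):

```shell
#!/bin/sh
# Assumption: the kernel reads in aligned groups of n sectors (n=4, 2 KiB,
# as the P.S. describes). Compute the group a single requested sector is in.
sector=551561   # the sector the app asked for
n=4             # assumed sectors per kernel read unit
first=$(( sector / n * n ))
last=$(( first + n - 1 ))
echo "sector $sector is read as part of $first-$last"
```

This prints the range 551560-551563, which contains 551562. With n=8 (one 4 KiB page) the group is 551560-551567, which also contains 551562.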

When the drive reports an uncorrectable media error, Linux correctly records
it in the log.  But when the app didn't ask for that block, and every block
the app did ask for was read, Linux incorrectly reports failure to the app.

I don't know how Linux decides how many blocks to read ahead, but no matter
how many it chooses, read ahead is read ahead.  Go ahead and record the error
in the log.  I'd also suggest that if a user is logged in on the screen
(whether X11 or text), we see if we can warn them that their disk is dying.
But don't return a failure to the app.  If the blocks that the app asked for
were read, we should give them to the app, successfully.
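
The proposed rule can be stated as a toy model: a read of COUNT sectors starting at START fails only if an unreadable sector lies inside the requested range, and errors in readahead sectors outside it are only logged. request_ok is a hypothetical helper for illustration, not kernel code.

```shell
#!/bin/sh
# Toy model of the proposed policy (hypothetical, not kernel code).
# Usage: request_ok START COUNT BAD_SECTOR...
# Returns 1 if any bad sector is inside [START, START+COUNT), else 0.
# Bad sectors outside the range (readahead only) are just logged to stderr.
request_ok() {
    start=$1
    count=$2
    shift 2
    for bad in "$@"; do
        if [ "$bad" -ge "$start" ] && [ "$bad" -lt $((start + count)) ]; then
            return 1    # a requested sector is unreadable: report failure
        fi
        echo "log: readahead sector $bad unreadable" >&2    # log only
    done
    return 0
}
```

Under this rule, request_ok 551561 1 551562 succeeds (551562 was only read ahead), while request_ok 551560 4 551562 fails because the app asked for 551562.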

Sheesh.

P.S.
One would expect this to persuade the hard drive to relocate the block:
dd if=/dev/zero of=/dev/hda bs=512 seek=551562 count=1
But it doesn't, because Linux wants to read 4 blocks, modify 1, and write 4
blocks.  The read fails.
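
The read-modify-write presumably comes from the write going through the block device's page cache. A minimal sketch of a single-sector write that may avoid it, assuming GNU dd, whose oflag=direct opens the output with O_DIRECT; zero_sector is a hypothetical helper, and it destroys the sector's contents.

```shell
#!/bin/sh
# Hypothetical helper: overwrite exactly one 512-byte sector with zeros.
# WARNING: this destroys the sector's contents.
# With "direct" as the 3rd argument, GNU dd's oflag=direct (O_DIRECT)
# bypasses the page cache, so the kernel need not read neighbouring sectors
# first. conv=notrunc keeps dd from truncating a regular image file, and
# conv=fsync flushes the data before dd exits.
zero_sector() {
    dev=$1
    sector=$2
    mode=${3:-}
    if [ "$mode" = direct ]; then
        dd if=/dev/zero of="$dev" bs=512 seek="$sector" count=1 \
           oflag=direct conv=notrunc,fsync 2>/dev/null
    else
        dd if=/dev/zero of="$dev" bs=512 seek="$sector" count=1 \
           conv=notrunc,fsync 2>/dev/null
    fi
}
```

For example, zero_sector /dev/hda 551562 direct. Whether the drive then reallocates the sector is up to its firmware.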

One would expect this to persuade the hard drive to relocate the block:
dd if=/dev/zero of=/dev/hda bs=512 seek=551560 count=4
But it doesn't.  The hard drive reports success, yet if an app tries to read
the bad sector again, the read still fails.  So the drive has egregiously bad
firmware.  That doesn't excuse Linux.