linux-kernel - Re: writing file to disk: not as easy as it looks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0812031613320.5406@artax.karlin.mff.cuni.cz>
Date:	Wed, 3 Dec 2008 16:34:18 +0100 (CET)
From:	Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
To:	Theodore Tso <tytso@....edu>
cc:	Pavel Machek <pavel@...e.cz>, Chris Friesen <cfriesen@...tel.com>,
	clock@...ey.karlin.mff.cuni.cz,
	kernel list <linux-kernel@...r.kernel.org>, aviro@...hat.com
Subject: Re: writing file to disk: not as easy as it looks



On Wed, 3 Dec 2008, Theodore Tso wrote:

> > Ok, "memory failed before disk" is ... bad hardware.
> 
> It's PC class hardware. Live with it.  Back when SGI made their own
> hardware, they noticed this problem, and so they wired up their SGI
> machines with powerfail interrupts, and extra big capacitors in their
> power supplies, and when Irix got a powerfail interrupt, it would
> frantically run around aborting DMA transfers to avoid this particular
> problem.  At least, that's what an old-timer SGI engineer (who is
> unfortunately no longer at SGI) told me.

I heard this too --- I just don't understand why did they route it to an 
interrupt and undertook the complicated sequence of aborting the commands 
by the kernel --- instead of simply routing it to PCI reset line --- that 
would reset the controller and stop it from feeding data to disks.

Also, if they had ECC memory, the chipset should detect unrecoverable 
garbage and respond with target-abort or full system reset and not feed 
bad data to the controller.

> PC class hardware don't have power fail interrupts.  Hence, my advice
> to you is that if you use a filesystem that does logical journalling
> --- better have a UPS.

ATX has PWR_OK pin that should be deasserted on power failure before the 
voltage drops.

I don't know if motherboards use it --- but there should be no problem 
routing the pin to the chipset reset and stop it before power goes low.

> > ...but... you seem to be saying that modern filesystems can damage 
> > data even on "sane" hardware.
> 
> The example I gave was one where a disk failure could cause a file
> that had previously been sucessfully written to disk and fsync()'ed to
> be damaged by another filesystem operation ***in the face of hard
> drive failure***.  Surely that is obvious.  The most obvious case of
> that might be if the disk controller gets confused and slams a data
> block into the wrong location on disk (there's a reason why DIF
> includes the sector number in its checksum and why some enterprise
> databases do the same thing in their tablespace blocks --- it happens
> often enough that paranoid data integrity engineers worry about it).

You can read the block number back from ATA disk after you write it and 
before you submit the command.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/