Message-ID: <823a74w1cg.fsf@mid.bfk.de>
Date:	Thu, 03 Sep 2009 14:26:55 +0000
From:	Florian Weimer <fweimer@....de>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Krzysztof Halasa <khc@...waw.pl>,
	Christoph Hellwig <hch@...radead.org>, Mark Lord <lkml@....ca>,
	Michael Tokarev <mjt@....msk.ru>, david@...g.hm,
	Pavel Machek <pavel@....cz>, Theodore Tso <tytso@....edu>,
	NeilBrown <neilb@...e.de>, Rob Landley <rob@...dley.net>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: wishful thinking about atomic, multi-sector or full MD stripe width, writes in storage

* Ric Wheeler:

> Note that even without MD raid, the file system issues IO's in file
> system block size (4096 bytes normally) and most commodity storage
> devices use a 512  byte sector size which means that we have to update
> 8 512b sectors.

Database software often attempts to deal with this phenomenon
(sometimes called "torn page writes").  For example, you can make sure
that the first time you write to a database page, you keep a full copy
in your transaction log.  If the machine crashes, the log is replayed,
first completely overwriting the partially-written page with the
logged copy.  Only once that full copy is in the log can you switch to
logical/incremental logging for the page.
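
A minimal sketch of that scheme, in Python for brevity.  The names
(Log, log_update, replay) are made up for illustration, the "disk" is
just a dict, and I am assuming the full copy is the page contents
after the first update; real database engines differ in the details.

```python
PAGE_SIZE = 4096
SECTOR_SIZE = 512


class Log:
    """In-memory stand-in for a transaction log (hypothetical)."""

    def __init__(self):
        self.records = []
        self.logged_pages = set()  # pages that already have a full image logged


def apply_delta(page, offset, data):
    """Return a copy of page with data written at offset."""
    return page[:offset] + data + page[offset + len(data):]


def log_update(log, page_no, old_page, offset, data):
    """Log one update and return the new page contents.

    The first update to a page logs the full page image; later
    updates log only the changed bytes.
    """
    new_page = apply_delta(old_page, offset, data)
    if page_no not in log.logged_pages:
        log.records.append(("full", page_no, new_page))
        log.logged_pages.add(page_no)
    else:
        log.records.append(("delta", page_no, offset, data))
    return new_page


def replay(log, disk):
    """Replay the log in order after a crash.

    A full image overwrites whatever (possibly torn) page is on disk,
    so the deltas that follow are applied to a known-good base.
    """
    for rec in log.records:
        if rec[0] == "full":
            _, page_no, image = rec
            disk[page_no] = image
        else:
            _, page_no, offset, data = rec
            disk[page_no] = apply_delta(disk[page_no], offset, data)
```

If only the first 512-byte sector of the page reached the disk before
the crash, replay() still ends up with the intended contents, because
the full image replaces the torn page before any delta is applied.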

The log itself has to be protected with a different mechanism, so that
you don't try to replay bad data.  But you haven't committed to this
data yet, so it is fine to skip bad records.
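
One possible mechanism (my assumption here, not the only option) is a
per-record checksum.  The sketch below frames each log record with its
length and a CRC-32, and on recovery stops at the first record that
fails the check, treating everything from there on as never written.
The record layout and function names are hypothetical.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(log_bytes: bytes) -> list:
    """Return the payloads of all records up to the first bad one.

    A truncated or corrupted record marks the end of the usable log;
    nothing after it is replayed.
    """
    records = []
    pos = 0
    while pos + HEADER.size <= len(log_bytes):
        length, crc = HEADER.unpack_from(log_bytes, pos)
        start = pos + HEADER.size
        payload = log_bytes[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupt record: skip it and the rest
        records.append(payload)
        pos = start + length
    return records
```

Stopping at the first bad record is the conservative choice: later
records may depend on the one that was lost, and none of them has been
acknowledged as committed.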

Therefore, sub-page corruption is a fundamentally different issue from
super-page corruption.

BTW, older textbooks will tell you that mirroring requires that you
read from two copies of the data and compare it (and have some sort of
tie breaker if you need availability).  And you also have to re-read
data you've just written to disk, to make sure it's actually there and
hit the expected sectors.  We can't even do this anymore, thanks to
disk caches.  And it doesn't seem to be necessary in most cases.

-- 
Florian Weimer                <fweimer@....de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99