lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <B5285968-90F7-4A0E-AB92-0179598E4C97@boeing.com>
Date:	Tue, 28 Jun 2011 23:22:14 -0500
From:	"Moffett, Kyle D" <Kyle.D.Moffett@...ing.com>
To:	Jan Kara <jack@...e.cz>
CC:	"Ted Ts'o" <tytso@....edu>, Lukas Czerner <lczerner@...hat.com>,
	Sean Ryle <seanbo@...il.com>,
	"615998@...s.debian.org" <615998@...s.debian.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Sachin Sant <sachinp@...ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel
 BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Jun 28, 2011, at 18:57, Jan Kara wrote:
> On Tue 28-06-11 14:30:55, Moffett, Kyle D wrote:
>> On Jun 28, 2011, at 05:36, Jan Kara wrote:
>>> Well, direct IO is atomic in data=journal the same way as in data=ordered.
>>> It can happen only half of direct IO write is done when you hit power
>>> button at the right moment - note this holds for overwrites.  Extending
>>> writes or writes to holes are all-or-nothing for ext4 (again both in
>>> data=journal and data=ordered mode).
>> 
>> My impression of journalled data was that a single-sector write would
>> be written checksummed into the journal and then later into the actual
>> filesystem, so it would either complete (IE: journal entry checksum is
>> OK and it gets replayed after a crash) or it would not (IE: journal
>> entry does not checksum and therefore the later write never happened
>> and the entry is not replayed).
> 
> Umm, right. This is true. That's another guarantee of data=journal mode I
> didn't think of.

Ok, that's what I had hoped was the case.  That doesn't help much for
overwrites of variable-length data (EG: text files), but it does help
protect stuff like MySQL MyISAM (which does not do journalling).  It's
probably unnecessary for MySQL InnoDB, which *does* have its own journal.


>>> Page sized and page aligned writes are atomic (in both data=journal and
>>> data=ordered modes). When a write spans multiple pages, there are chances
>>> the writes will be merged in a single transaction but no guarantees as you
>>> properly write.
>> 
>> I don't know that our definitions of "atomic write" are quite the same...
>> 
>> I'm assuming that filesystem "atomic write" means that even if the disk
>> itself does not guarantee that a single write will either complete or it
>> will be discarded, then the filesystem will provide that guarantee.
> 
> OK. There are different levels of "disk does not guarantee atomic writes"
> though. E.g. flash disks don't guarantee atomic writes but even more they
> actually corrupt unrelated blocks on power failure so any filesystem is
> actually screwed on power failure. For standard rotating drives I'd rely on
> the drive being able to write a full fs block (4k) although I agree noone
> really guarantees this.

Well, I've seen a study somewhere that some spinning media actually *can*
tend to corrupt a nearby sector or two during a power failure, depending
on exactly what the input voltage does.  The better ones certainly have
a voltage monitor that automatically cuts power to the heads when it goes
below a critical level.

And the better Flash-based media actually *do* provide atomic write
guarantees due to the wear-levelling and flash-remapping engine.  In
order to protect their mapping table metadata and avoid very large
write amplification they will use a system similar to a log-structured
filesystem to accumulate a bunch of small random writes into one larger
write.  Since they're always writing into empty space and then doing an
atomic metadata update, their writes are always effectively atomic,
even for data.

My informal testing of the Intel X-18M drives seems to indicate that
they work that way.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ