lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Dec 2013 00:01:27 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	Matthew Wilcox <matthew@....cx>
Cc:	Dave Chinner <david@...morbit.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 0/3] Add XIP support to ext4

On Tue, Dec 17, 2013 at 07:31:43PM -0700, Matthew Wilcox wrote:
> On Wed, Dec 18, 2013 at 09:30:50AM +1100, Dave Chinner wrote:
> > No, you haven't addressed the problem. There is nothing in this
> > patch set that converts an unwritten extent after it is written to.
> > Hence on every subsequent read will return zeros because the block
> > is still marked as unwritten.
> 
> I don't understand.  Here's the path as I understand it:
> 
> xip_file_write -> __xip_file_write -> ext4_get_xip_mem(create=0),
> returns -ENODATA.  So we call ext4_get_xip_mem again, this time with
> create=1 which causes ext4_get_block() to allocate blocks.

When Dave says that the extent is unwritten, what he means is that the
block as been allocated, but it is marked as being uninitialized.
Since the block is uninitialized we must not read from that block;
instead, if the user issues a read request to an uninitialized block,
we must return all zero's for that block (lest we reveal stale data).
And if we try to write to an uninitialized block, *after* we write to
the data block, we have to clear the uninitalized block, which in some
cases might mean splitting the extent --- if we have an extent which
maps logical blocks 0 to 5 to physical blocks 100 to 105, and we write
to block #2, will need to change that single uninitialized extent to
three extents --- one covering blocks logical blocks 0-1, one covering
logical block 2, and one covering logical blocks 3-5, where the first
and third would be marked uninitialized, and the second would be
marked initialized.  Since we potentially need to convert one extent
to three extents, this might involve an extent tree node split.

You keep talking about allocated vs unallocated, and create=0 and
create=1, but even for an allocated block, that block may be marked
initialized or uninitialized --- and if it is marked uninitialized,
xip_file_write must call a file system-specific callback to allow this
conversion to take place.

Now for persistent memory, if writes are infinitely fast and free
(i.e., there are no significant write cycle limitations), there is a
terrible, terrible hack we could do, which would involve fallocate()
never creating uninitialized extents, but instead writing zero's to
all of the allocated blocks.  And whenever we open a file, before we
return a file descriptor, we could scan all of the extent blocks
looking for uninitialized extents, and if so, convert them to
initialized extents by writing zeros to all of the blocks at open
time.

This would be horribly slow for hard drives, but typically the devices
that use XIP are either read-only, or fairly fast.  It's really ugly,
and if it turns out that persistent memory is not infinitely fast and
have infinite write cycles (and my understanding is that persistent
memory is either extraordinarily expensive, or not as perfect in terms
speed and wear resistenace as some of their boosters claim), it's not
clear that trying to evade dealing with uninitialized extents is a
smart way to go.

In other words, suppose somone calls fallocate on a 2GB region on an
XIP mounted file system.  Would you be happy forcing 2GB's worth of
writes at fallocate time(), just because we don't want to deal with
adding a file system callback in xip_file_write()?

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ