Message-ID: <20120615220453.GC7363@thunk.org>
Date:	Fri, 15 Jun 2012 18:04:54 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Arnd Bergmann <arnd.bergmann@...aro.org>
Cc:	Alex Lemberg <Alex.Lemberg@...disk.com>,
	HYOJIN JEONG <syr.jeong@...sung.com>,
	Saugata Das <saugata.das@...aro.org>,
	Artem Bityutskiy <dedekind1@...il.com>,
	Saugata Das <saugata.das@...ricsson.com>,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-mmc@...r.kernel.org, patches@...aro.org, venkat@...aro.org,
	"Luca Porzio (lporzio)" <lporzio@...ron.com>
Subject: Re: [PATCH 2/3] ext4: Context support

On Thu, Jun 14, 2012 at 09:55:31PM +0000, Arnd Bergmann wrote:
> 
> As soon as we get into the territory of the file system being
> smart about keeping separate contexts for some files rather than
> just using the low bits of the inode number or the pid, we get
> more problems:
> 
> * The block device needs to communicate the number of available
>   contexts to the file system
> * We have to arbitrate between contexts used on different partitions
>   of the same device

Can't we virtualize this?  Would this work?

The file system can simply create as many virtual contexts as it
likes; if there are no more contexts available, the block device
simply closes the least recently used context (regardless of which
partition it belongs to).  If the file system tries to use a virtual
context where
the underlying physical context has been closed, the block device will
simply open a new physical context (possibly closing some other old
context).

> There is one more option we have to give the best possible performance,
> although that would be a huge amount of work to implement:
> 
> Any large file gets put into its own context, and we mark that
> context "write-only" "unreliable" and "large-unit". This means the
> file system has to write the file sequentially, filling one erase
> block at a time, writing only "superpage" units (e.g. 16KB) or
> multiples of that at once. We can neither overwrite nor read back
> any of the data in that context until it is closed, and there is
> no guarantee that any of the data has made it to the physical medium
> before the context is closed. We are allowed to do read and write
> accesses to any other context between superpage writes though.
> After closing the context, the data will be just like any other
> block again.

Oh, that's cool.  And I don't think that's hard to do.  We could just
keep a flag in the in-core inode indicating whether it is in "large
unit" mode.  If it is in large unit mode, we can make the fs writeback
function make sure that we adhere to the restrictions of the large
unit mode, and if at any point we need to do something that might
violate the constraints, the file system would simply close the
context.

The only reason I can think of why this might be problematic is if
there is a substantial performance cost involved with opening and
closing contexts on eMMC devices.  Is that an issue we need to be
worried about?

> Right now, there is no support for large-unit context and also not for
> read-only or write-only contexts, which means we don't have to
> enforce strict policies and can basically treat the context ID
> as a hint. Using the advanced features would require that we
> keep track of the context IDs across partitions and have to flush
> write-only contexts before reading the data again. If we want to
> do that, we can probably discard the patch series and start over.

Well, I'm interested in getting something upstream, which is useful
not just for the consumer-grade eMMC devices in handsets, but which
might also be extensible to SSD's, and all the way up to PCIe-attached
flash devices that might be used in large data centers.

I think if we do things right, it should be possible to do something
which would accommodate a large range of devices (which is why I
brought up the concept of exposing virtualized contexts to the file
system layer).

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
