lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1178584229.3933.60.camel@dyn9047017103.beaverton.ibm.com>
Date:	Mon, 07 May 2007 17:30:29 -0700
From:	Mingming Cao <cmm@...ibm.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Theodore Tso <tytso@....edu>,
	Andreas Dilger <adilger@...sterfs.com>,
	"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-ext4@...r.kernel.org, xfs@....sgi.com, suparna@...ibm.com
Subject: Re: [PATCH 4/5] ext4: fallocate support in ext4

On Mon, 2007-05-07 at 16:31 -0700, Andrew Morton wrote:
> On Mon, 7 May 2007 19:14:42 -0400
> Theodore Tso <tytso@....edu> wrote:
> 
> > On Mon, May 07, 2007 at 03:38:56PM -0700, Andrew Morton wrote:
> > > > Actually, this is a non-issue.  The reason that it is handled for extent-only
> > > > is that this is the only way to allocate space in the filesystem without
> > > > doing the explicit zeroing.  For other filesystems (including ext3 and
> > > > ext4 with block-mapped files) the filesystem should return an error (e.g.
> > > > -EOPNOTSUPP) and glibc will do manual zero-filling of the file in userspace.
> > > 
> > > It can be a bit suboptimal from the layout POV.  The reservations code will
> > > largely save us here, but kernel support might make it a bit better.
> > 
> > Actually, the reservations code won't matter, since glibc will fall
> > back to its current behavior, which is it will do the preallocation by
> > explicitly writing zeros to the file.
> 
> No!  Reservations code is *critical* here.  Without reservations, we get
> disastrously-bad layout if two processes were running a large fallocate()
> at the same time.  (This is an SMP-only problem, btw: on UP the timeslice
> lengths save us).
> 
> My point is that even though reservations save us, we could do even-better
> in-kernel.
> 

In this case, since the number of blocks to preallocate (eg. N=10GB) is
clear, we could improve the current reservation code, to allow callers
explicitly ask for a new window that have the minimum N free blocks for
the blocks-to-preallocated(rather than just have at least 1 free
blocks).

Before the ext4_fallocate() is called, the right reservation window size
is set with the flag to indicating "please spend time if needed to find
a window covers at least N free blocks".

So for ex4 block mapped files, later when glibc is doing allocation and
zeroing, the ext4 block-mapped allocator will knows to reserve the right
amount of free blocks before allocating and zeroing 10GB space.

I am not sure whether this worth the effort though.

> But then, a smart application would bypass the glibc() fallocate()
> implementation and would tune the reservation window size and would use
> direct-IO or sync_file_range()+fadvise(FADV_DONTNEED).
> 
> > This wlil result in the same
> > layout as if we had done the persistent preallocation, but of course
> > it will mean the posix_fallocate() could potentially take a long time
> > if you're a PVR and you're reserving a gig or two for a two hour movie
> > at high quality.  That seems suboptimal, granted, and ideally the
> > application should be warned about this before it calls
> > posix_fallocate().  On the other hand, it's what happens today, all
> > the time, so applications won't be too badly surprised.
> 
> A PVR implementor would take all this over and would do it themselves, for
> sure.
> 
> > If we think applications programmers badly need to know in advance if
> > posix_fallocate() will be fast or slow, probably the right thing is to
> > define a new fpathconf() configuration option so they can query to see
> > whether a particular file will support a fast posix_fallocate().  I'm
> > not 100% convinced such complexity is really needed, but I'm willing
> > to be convinced....  what do folks think?
> > 
> 
> An application could do sys_fallocate(one-byte) to work out whether it's
> supported in-kernel, I guess.
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ