[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070614120413.GD86004887@sgi.com>
Date: Thu, 14 Jun 2007 22:04:13 +1000
From: David Chinner <dgc@....com>
To: David Chinner <dgc@....com>,
"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
Suparna Bhattacharya <suparna@...ibm.com>, torvalds@...l.org,
akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
xfs@....sgi.com, cmm@...ibm.com
Subject: Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
On Thu, Jun 14, 2007 at 03:14:58AM -0600, Andreas Dilger wrote:
> On Jun 14, 2007 09:52 +1000, David Chinner wrote:
> > B FA_PREALLOCATE
> > provides the same functionality as
> > B FA_ALLOCATE
> > except it does not ever change the file size. This allows allocation
> > of zero blocks beyond the end of file and is useful for optimising
> > append workloads.
> > TP
> > B FA_DEALLOCATE
> > removes the underlying disk space with the given range. The disk space
> > shall be removed regardless of it's contents so both allocated space
> > from
> > B FA_ALLOCATE
> > and
> > B FA_PREALLOCATE
> > as well as from
> > B write(3)
> > will be removed.
> > B FA_DEALLOCATE
> > shall never remove disk blocks outside the range specified.
>
> So this is essentially the same as "punch".
Depends on your definition of "punch".
> There doesn't seem to be
> a mechanism to only unallocate unused FA_{PRE,}ALLOCATE space at the
> end.
ftruncate()
> > B FA_DEALLOCATE
> > shall never change the file size. If changing the file size
> > is required when deallocating blocks from an offset to end
> > of file (or beyond end of file) is required,
> > B ftuncate64(3)
> > should be used.
>
> This also seems to be a bit of a wart, since it isn't a natural converse
> of either of the above functions. How about having two modes,
> similar to FA_ALLOCATE and FA_PREALLOCATE?
<shrug>
whatever.
> Say, FA_PUNCH (which
> would be as you describe here - deletes all data in the specified
> range changing the file size if it overlaps EOF,
Punch means different things to different people. To me (and probably
most XFS aware ppl) punch implies no change to the file size.
i.e. anyone curently using XFS_IOC_UNRESVSP will expect punching
holes to leave the file size unchanged. This is the behaviour I
described for FA_DEALLOCATE.
> and FA_DEALLOCATE,
> which only deallocates unused FA_{PRE,}ALLOCATE space?
That's an "unwritten-to-hole" extent conversion. Is that really
useful for anything? That's easily implemented with FIEMAP
and FA_DEALLOCATE.
Anyway, because we can't agree on a single pair of flags:
FA_ALLOCATE == posix_fallocate()
FA_DEALLOCATE == unwritten-to-hole ???
FA_RESV_SPACE == XFS_IOC_RESVSP64
FA_UNRESV_SPACE == XFS_IOC_UNRESVSP64
> We might also consider making @mode be a mask instead of an enumeration:
>
> FA_FL_DEALLOC 0x01 (default allocate)
> FA_FL_KEEP_SIZE 0x02 (default extend/shrink size)
> FA_FL_DEL_DATA 0x04 (default keep written data on DEALLOC)
i.e:
#define FA_ALLOCATE 0
#define FA_DEALLOCATE FA_FL_DEALLOC
#define FA_RESV_SPACE FA_FL_KEEP_SIZE
#define FA_UNRESV_SPACE FA_FL_DEALLOC | FA_FL_KEEP_SIZE | FA_FL_DEL_DATA
> I suppose it might be a bit late in the game to add a "goal"
> parameter and e.g. FA_FL_REQUIRE_GOAL, FA_FL_NEAR_GOAL, etc to make
> the API more suitable for XFS?
It would suffice for the simpler operations, I think, but we'll
rapidly run out of flags and we'll still need another interface
for doing complex stuff.....
> The goal could be a single __u64, or
> a struct with e.g. __u64 byte offset (possibly also __u32 lun like
> in FIEMAP). I guess the one potential limitation here is the
> number of function parameters on some architectures.
To be useful it needs to __u64.
> > B ENOSPC
> > There is not enough space left on the device containing the file
> > referred to by
> > IR fd.
>
> Should probably say whether space is removed on failure or not. In
Right. I'd say on error you need to FA_DEALLOCATE to ensure any space
allocated was freed back up. That way the error handling in the allocate
functions is much simpler (i.e. no need to undo there).
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists