Message-ID: <6ec7a4340705052204n35f92d8bifb9e22b42cccaa53@mail.gmail.com>
Date:	Sun, 6 May 2007 13:04:10 +0800
From:	"Xu CanHao" <xucanhao@...il.com>
To:	7eggert@....de
Cc:	"Theodore Tso" <tytso@....edu>, linux-kernel@...r.kernel.org
Subject: Re: Ext3 vs NTFS performance

2007/5/6, Bodo Eggert <7eggert@....de>:
> Theodore Tso <tytso@....edu> wrote:
>
> > But as has already been discussed on this thread, in situations where
> > the fileserver is under high memory pressure, any filesystem (XFS or
> > ext4) would still end up allocating blocks out of order, resulting in
> > fragmentation.  Explicit preallocation, as opposed to delayed
> > allocation, is really the best long-term solution; and in order to do
> > that, Samba needs to detect this scenario --- which, as has been noted,
> > there appears to be no good reason for the Windows CIFS client (or any
> > other application) to be doing, other than perhaps to deliberately
> > trigger a worst-case allocation pattern in ext3 --- and translate it
> > into an explicit preallocation request.
>
> There is an interface to tell the kernel about the way the file will be
> accessed. IMO this interface should be used to do the preallocation, too.
>
> The other question is: How to tell the poor-bill's preallocation from a
> very clever application that communicates with another application and
> which is supposed to zero out that exact byte from the data the other
> application sent. I was tempted to say "just let samba cache these calls",
> but it would be wrong. You'll need magic in the kernel to DTRT.
>
> There are four correct ways of handling these one-zerobyte-writes after EOF:
>
> 1) Extend the file like truncate
> 2) Extend the file like write() (current behaviour)
> 3) Preallocate these blocks (to be implemented)
> 4) Write all zeroes (current behaviour for FAT)
>
> (2) will cause bad allocations; it's obviously worse than (1). (3) would be
> better than (1) and (2), but only xfs(?) and ext4 will support this in the
> near future. (4) should double the write time, but give the best possible
> read speed. According to [1], the expected read speed is about as high as (1)
> gives, "playback performance improves to expected levels". If preallocation
> does not seem to make a big difference, I don't think we should do (4) as
> a replacement until the filesystem does support real preallocations.
>
>
> I suggest:
>
> 1) Make samba use fadvise(MIGHT_PREALLOCATE)
> 2) Make the kernel turn these 1-byte-writes-after-EOF into truncates
>    on MIGHT_PREALLOCATE, and possibly turn off MIGHT_PREALLOCATE on
>    other read/writes
> 3) Make the kernel fadvise(PREALLOCATE, $filesize)
>    on MIGHT_PREALLOCATE + lseek(0), turning off the MIGHT_PREALLOCATE.
>    Possibly it might also turn on FADV_SEQUENTIAL.
> 4) Make the filesystems optionally preallocate the desired area, or
>    ignore fadvise(PREALLOCATE, $filesize) instead.
>
>
> [1] http://softwarecommunity.intel.com/articles/eng/1259.htm
> --
> It is still called paranoia when they really are out to get you.
>
> Friß, Spammer: oA@...2dX.7eggert.dyndns.org
>  CZCkzfiaNb@...gert.dyndns.org nkp@...gert.dyndns.org
>
So it seems possible that "Explicit Preallocation" + "Delayed
Allocation" + (some other technology) together would minimize file-system
fragmentation. Furthermore, the heavy fragmentation of large downloads
could probably be solved by "Explicit Preallocation" too.
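
To make the download case concrete, here is a rough userspace sketch of
what I mean by explicit preallocation, next to the truncate-style
extension from option (1) in the quoted list. It is only an illustration
under assumptions: the helper names are made up, it uses the
posix_fallocate() call as exposed by Python's os module on Unix, and it
assumes st_blocks is counted in 512-byte units as on Linux. Whether the
reserved blocks end up contiguous is still up to the filesystem.

```python
import os


def preallocate(path, size):
    """Reserve `size` bytes for `path` before any data is written.

    Hypothetical helper: posix_fallocate() asks for the blocks to be
    allocated up front, so later writes into the range (for example the
    out-of-order chunks of a download) do not have to allocate one by one.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def extend_sparse(path, size):
    """Extend `path` to `size` bytes like truncate, without reserving blocks."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.ftruncate(fd, size)
    finally:
        os.close(fd)


def allocated_bytes(path):
    """Bytes backed by disk blocks (assumes 512-byte st_blocks units)."""
    return os.stat(path).st_blocks * 512
```

Both helpers leave the file with the same apparent length; comparing
allocated_bytes() for the two files on a given filesystem shows whether
the space was actually reserved or the file was merely extended.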
