Date:	Sun, 06 May 2007 00:25:45 +0200
From:	Bodo Eggert <7eggert@....de>
To:	Theodore Tso <tytso@....edu>, Xu CanHao <xucanhao@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Ext3 vs NTFS performance

Theodore Tso <tytso@....edu> wrote:

> But as has already been discussed on this thread, in situations where
> the fileserver is under high memory pressure, any filesystem (XFS or
> ext4) would still end up allocating blocks out of order, resulting in
> fragmentation.  Explicit preallocation, as opposed to delayed
> allocation, is really the best long-term solution; and in order to do
> that, Samba needs to detect this scenario --- which as has been noted,
> there appears to be no good reason for the Windows CIFS client (or any
> other application)to be doing this, other than perhaps to deliberate
> trigger a worst case allocation pattern in ext3 --- and translate it
> into a explicit preallocation request.

There is an interface to tell the kernel about the way the file will be
accessed. IMO this interface should be used to do the preallocation, too.
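For reference, the existing interface looks roughly like this from user
space. This is only a sketch: it uses Python's os.posix_fadvise() wrapper
around posix_fadvise(2), and the helper name advise_sequential is made up
for illustration. Note that today's advice values only describe access
patterns; none of them requests preallocation.

```python
import os
import tempfile


def advise_sequential(fd: int) -> bool:
    """Hint that fd will be accessed sequentially.

    Returns False when the platform or the file type does not support
    the call. The hint is advisory only; the kernel may ignore it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    try:
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        print("advice accepted:", advise_sequential(f.fileno()))
```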

The other question is how to tell the poor-bill's preallocation apart from
a very clever application that communicates with another application and
is supposed to zero out that exact byte in the data the other application
sent. I was tempted to say "just let samba cache these calls", but that
would be wrong. You'll need magic in the kernel to DTRT.

There are four correct ways of handling these one-zerobyte-writes after EOF:

1) Extend the file like truncate
2) Extend the file like write() (current behaviour)
3) Preallocate these blocks (to be implemented)
4) Write all zeroes (current behaviour for FAT)

(2) causes bad allocations, so it is obviously worse than (1). (3) would be
better than both (1) and (2), but only xfs(?) and ext4 will support it in
the near future. (4) should double the write time, but give the best
possible read speed. According to [1], the expected read speed is about as
high as (1) gives: "playback performance improves to expected levels". If
preallocation does not seem to make a big difference, I don't think we
should do (4) as a replacement until the filesystem supports real
preallocation.
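To make the options concrete, here is a rough user-space sketch of each
one, using Python's os module. The function names and the 1 MiB size are
made up for the demo. All four end up with the same file size; what
differs is how many blocks are actually allocated (st_blocks), and that
depends on the filesystem, so no particular numbers are claimed here.
Also note that posix_fallocate() may be emulated by libc writing zeroes
on filesystems without native support, which makes it behave like (4).

```python
import os
import tempfile

SIZE = 1 << 20  # 1 MiB, an arbitrary size for the demo


def extend_truncate(fd, size):
    # (1) Grow the file without writing data; usually leaves a hole.
    os.ftruncate(fd, size)


def extend_one_byte(fd, size):
    # (2) The pattern under discussion: one zero byte at the last offset.
    os.pwrite(fd, b"\0", size - 1)


def extend_fallocate(fd, size):
    # (3) Explicit preallocation request for the range [0, size).
    os.posix_fallocate(fd, 0, size)


def extend_zero_fill(fd, size):
    # (4) Write every byte of the file as zero, in 64 KiB chunks.
    chunk = b"\0" * 65536
    written = 0
    while written < size:
        written += os.pwrite(fd, chunk[: size - written], written)


def resulting_layout(extend, size=SIZE):
    """Apply one strategy to a fresh temp file; return (st_size, st_blocks)."""
    with tempfile.TemporaryFile() as f:
        extend(f.fileno(), size)
        st = os.fstat(f.fileno())
        return st.st_size, st.st_blocks


if __name__ == "__main__":
    for fn in (extend_truncate, extend_one_byte,
               extend_fallocate, extend_zero_fill):
        size, blocks = resulting_layout(fn)
        print(f"{fn.__name__:18} size={size} blocks={blocks}")
```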


I suggest:

1) Make samba use fadvise(MIGHT_PREALLOCATE).
2) Make the kernel turn these 1-byte-writes-after-EOF into truncates
   on MIGHT_PREALLOCATE, and possibly turn off MIGHT_PREALLOCATE on
   other reads/writes.
3) Make the kernel do fadvise(PREALLOCATE, $filesize)
   on MIGHT_PREALLOCATE + lseek(0), turning off MIGHT_PREALLOCATE.
   Possibly it might also turn on FADV_SEQUENTIAL.
4) Make the filesystems either preallocate the desired area or
   ignore fadvise(PREALLOCATE, $filesize).
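The four steps can be modelled as a small state machine. The following is
only a toy user-space model of the proposal, not kernel code:
MIGHT_PREALLOCATE and PREALLOCATE do not exist as fadvise values today,
and the class and attribute names (ProposedFile, fs_can_preallocate,
events) are invented for illustration.

```python
class ProposedFile:
    """Toy model of one open file under the proposed semantics."""

    def __init__(self, fs_can_preallocate=True):
        self.size = 0
        self.preallocated = 0          # bytes reserved by the filesystem
        self.might_preallocate = False
        self.fs_can_preallocate = fs_can_preallocate
        self.events = []               # what the "kernel" decided to do

    def fadvise_might_preallocate(self):
        # Step 1: samba marks the file.
        self.might_preallocate = True

    def pwrite(self, data, offset):
        end = offset + len(data)
        # Step 2: a single zero byte at or after EOF becomes a truncate.
        if self.might_preallocate and data == b"\0" and offset >= self.size:
            self.size = end
            self.events.append("truncate")
            return
        # Any other write drops the hint and is handled as usual.
        self.might_preallocate = False
        self.size = max(self.size, end)
        self.events.append("write")

    def lseek_to_start(self):
        # Step 3: lseek(0) with the hint set triggers PREALLOCATE.
        if not self.might_preallocate:
            return
        self.might_preallocate = False
        # Step 4: the filesystem either honours the request or ignores it.
        if self.fs_can_preallocate:
            self.preallocated = self.size
            self.events.append("preallocate")
        else:
            self.events.append("ignored")


if __name__ == "__main__":
    f = ProposedFile()
    f.fadvise_might_preallocate()
    f.pwrite(b"\0", 4095)   # the one-zerobyte write past EOF
    f.lseek_to_start()
    print(f.size, f.preallocated, f.events)
```

A write that is not the one-zero-byte-after-EOF pattern clears the hint in
this model, which is one reading of "possibly turn off MIGHT_PREALLOCATE"
in step 2.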


[1] http://softwarecommunity.intel.com/articles/eng/1259.htm
-- 
It is still called paranoia when they really are out to get you.

Eat this, spammer: oA@...2dX.7eggert.dyndns.org
 CZCkzfiaNb@...gert.dyndns.org nkp@...gert.dyndns.org