Date:	Sat, 02 Feb 2013 21:50:37 +1100
From:	Bron Gondwana <brong@...tmail.fm>
To:	"Theodore Ts'o" <tytso@....edu>
Cc:	Robert Mueller <robm@...tmail.fm>,
	Eric Sandeen <sandeen@...hat.com>,
	Linux Ext4 mailing list <linux-ext4@...r.kernel.org>
Subject: Re: fallocate creating fragmented files

On Sat, Feb 2, 2013, at 12:55 AM, Theodore Ts'o wrote:
> On Fri, Feb 01, 2013 at 10:33:21PM +1100, Bron Gondwana wrote:
> > 
> > In particular, the way that Cyrus works seems entirely suboptimal for ext4.
> > The index and database files receive very small appends (108 bytes per
> > message for the index, and probably just a few hundred bytes per write for
> > most of the twoskip databases), and they happen pretty much randomly to one
> > of tens of thousands of these little files, depending on which mailbox
> > received the message.
> 
> Are all of these files in a single directory?  If so, that's part of
> the problem, since ext[34] uses the directory structure to try to
> spread apart unrelated files, so that heuristic can't be easily used
> if all of the files are in a single directory.

No, but most directories hold only 2-3 of these files, all of which get
appended to at the same time, so they probably interleave :(

> > Here's the same experiment on a "fresh" filesystem.  I created this by taking
> > a server down, copying the entire contents of the SSD to a spare piece of rust,
> > reformatting, and copying it all back (cp -a).  So the data on there is the
> > same, just the allocations have changed.
> > 
> > [brong@...p15 conf]$ fallocate -l 20m testfile
> > [brong@...p15 conf]$ filefrag -v testfile
> > Filesystem type is: ef53
> > File size of testfile is 20971520 (20480 blocks, blocksize 1024)
> >  ext logical physical expected length flags
> >    0       0 22913025            8182 unwritten
> >    1    8182 22921217 22921207   8182 unwritten
> >    2   16364 22929409 22929399   4116 unwritten,eof
> > testfile: 3 extents found
> > 
> > As you can see, that's somewhat better.  I'm assuming 8182 is the
> > maximum number of contiguous blocks before you hit an assigned metadata
> > location and have to skip over it.
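
(As a sanity check on those numbers -- the 10-block gap per group below is my
inference from the filefrag output, not something I've confirmed with dumpe2fs:)

```shell
# With 1k blocks, one block bitmap covers 8 * 1024 = 8192 blocks, so that's
# the group size; extents of 8182 blocks then imply ~10 metadata blocks get
# skipped per group (inferred, not verified).
BLOCKSIZE=1024
GROUP_BLOCKS=$((8 * BLOCKSIZE))                  # 8192 blocks per group
EXTENT=8182                                      # longest extent filefrag showed
echo "metadata blocks per group: $((GROUP_BLOCKS - EXTENT))"       # prints 10
FILE_BLOCKS=$((20 * 1024 * 1024 / BLOCKSIZE))    # 20 MiB file = 20480 blocks
echo "extents needed: $(( (FILE_BLOCKS + EXTENT - 1) / EXTENT ))"  # prints 3
```

which matches the 3 extents filefrag reported.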
> 
> Is there a reason why you are using a 1k block size?  The size of a
> block group is 8192 blocks for 1k blocks (or 8 megabytes), while with
> a 4k block size, the size of a block group is 32768 blocks (or 128
> megabytes).  In general the ext4 file system is going to be far more
> efficient with a 4k block size.
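
(Aside: those group sizes fall straight out of the bitmap geometry -- a group
spans as many blocks as one block-bitmap block has bits:)

```shell
# Block-group size as a function of block size: one bitmap block of
# 8 * blocksize bits tracks one group's worth of blocks.
for BS in 1024 4096; do
  BLOCKS=$((8 * BS))
  echo "${BS}-byte blocks: ${BLOCKS} blocks/group = $((BLOCKS * BS / 1048576)) MiB"
done
```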

Mostly because a lot of our files are quite small.

Here's a set of file sizes and counts for that filesystem.

  72055 zero
 501435 <=512
  32004 <=1k
  46447 <=4k
  38411 <=16k
  49435 >16k

As you can see, the vast majority are significantly less than 1k in size,
so a 4k block size would add significant space overhead.  Basically, we
wouldn't be able to fit everything on there.
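
For reference, a bucketed count like that can be produced with something along
these lines (the path is a placeholder and the bucket edges are my choice here,
not our actual audit script):

```shell
# Sketch: bucket file sizes the way the table above does.
# "/path/to/fs" is a placeholder mount point.
find /path/to/fs -xdev -type f -printf '%s\n' | awk '
  $1 == 0     { n["zero"]++;  next }
  $1 <= 512   { n["<=512"]++; next }
  $1 <= 1024  { n["<=1k"]++;  next }
  $1 <= 4096  { n["<=4k"]++;  next }
  $1 <= 16384 { n["<=16k"]++; next }
              { n[">16k"]++ }
  END {
    split("zero <=512 <=1k <=4k <=16k >16k", order)
    for (i = 1; i <= 6; i++) printf "%8d %s\n", n[order[i]] + 0, order[i]
  }'
```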

There are plans afoot to merge most of those smaller files into a single
larger per-user file, which should help eventually.  Meanwhile, this is
what we have.  We were actually considering moving our email spools, which
currently use a 4k block size, to 1k as well: most emails are also smaller
than 4k, so that would reduce the space wastage there too.
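
Back-of-envelope for just the <=512-byte bucket, pretending every file there
holds exactly 512 bytes of data (so the real waste is at least this much):

```shell
# Slack = (block size - data size) per file, summed over the 501435 files
# in the <=512 bucket; treating each as exactly 512 bytes is an assumption.
FILES=501435
for BS in 1024 4096; do
  SLACK=$(( FILES * (BS - 512) ))
  echo "${BS}-byte blocks: at least $((SLACK / 1048576)) MiB slack in this bucket"
done
```

That works out to roughly 244 MiB at 1k versus about 1.7 GiB at 4k, which is
why the 4k overhead bites on this filesystem.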

Bron.
-- 
  Bron Gondwana
  brong@...tmail.fm
