Message-ID: <20090724004324.GB14052@mit.edu>
Date:	Thu, 23 Jul 2009 20:43:24 -0400
From:	Theodore Tso <tytso@....edu>
To:	Mingming Cao <cmm@...ibm.com>
Cc:	Eric Sandeen <sandeen@...hat.com>,
	Andreas Dilger <adilger@....com>, linux-ext4@...r.kernel.org
Subject: Re: How to fix up mballoc

On Thu, Jul 23, 2009 at 10:51:58AM -0700, Mingming Cao wrote:
> I am trying to understand which use cases prefer a normalized
> allocation request size.  If they are uncommon cases, perhaps we
> should disable normalizing the allocation size by default, unless
> the app opens the file with O_APPEND?

The case where we would want to round the allocation size up would be
if we are writing a large file (say, like a large mp3 or mpeg4 file),
which takes a while for the audio/video encoder to write out the
blocks.   In that case, doing file-based preallocation is a good thing.

Normally, if we are doing block allocations for files greater than 16
blocks (i.e, 64k), we use file-based preallocation.  Otherwise we use
block group allocations.  The problem with block group allocations is
the way they work: the first time we try to allocate out of a block
group, we try to find a free extent which is 512 blocks long.  If we
can't find one, we'll try another block group.  Hence, for small
files, once a block group
gets fragmented to the point where there isn't a free chunk which is
512 blocks long, we'll try to find another block group --- even if
that means finding another block group far, FAR away from the block
group where the directory is contained.

Worse yet, if we unmount and remount the filesystem, we forget the
fact that we were using a particular (now-partially filled)
preallocation group, so the next time we try to allocate space for a
small file, we will find *another* free 512 block chunk to allocate
small files.  Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", that block group will have 64 files, each separated by 511
free blocks, and will no longer have any free 512-block chunks for
block group allocations.  (And given that the block preallocation is
per-CPU, it
becomes even worse on an SMP system.)

To put it baldly, it may be that we need to do a fundamental rethink
of how we do per-cpu, per-blockgroup preallocations for small files.
Maybe instead of trying to find a 512-block extent which is completely
free, we should instead be looking for a 512-block extent which has at
least mb_stream_req free blocks (i.e., by default 16 free blocks).

	      	   	  	   	      	   - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
