lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <8A8B28AA-3481-4CFF-AEAA-0CB4CCDFF9F9@cam.ac.uk>
Date:	Sun, 4 Mar 2007 23:22:06 +0000
From:	Anton Altaparmakov <aia21@....ac.uk>
To:	Ulrich Drepper <drepper@...hat.com>
Cc:	Arnd Bergmann <arnd@...db.de>,
	Christoph Hellwig <hch@...radead.org>,
	Dave Kleikamp <shaggy@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-ext4@...r.kernel.org, suparna@...ibm.com, cmm@...ibm.com,
	alex@...sterfs.com, suzuki@...ibm.com
Subject: Re: [RFC] Heads up on sys_fallocate()

Hi,

On 4 Mar 2007, at 22:38, Ulrich Drepper wrote:
> Anton Altaparmakov wrote:
>> And that is it.  No zeroing needs to happen at all because we
>> have not updated the initialized size of the inode!
>
> When you do it like this, who can the kernel/filesystem *guarantee*  
> that
> when the data is written there actually is room on the harddrive?

The blocks are allocated so of course it is guaranteed.  Subsequent  
writes to this file will not generate any allocations thus  
allocations cannot fail.  (-:

> What you described seems like using truncate/ftruncate to increase the
> file's size.  That is not at all what posix_fallocate is for.
> posix_fallocate must make sure that the requested blocks on the  
> disk are
> reserved (allocated) for the file's use and that at no point in the
> future will, say, a msync() fail because a mmap(MAP_SHARED) page has
> been written to.

No that is different.  I described performing the allocations in the  
volume bitmap, i.e. for each allocated block the corresponding "in  
use" bit is set in the bitmap (NTFS uses a linear bitmap where byte  
0, bit 0 == physical block 0 of volume, byte 0, bit 1 == physical  
block 1 of volume, ... byte 1, bit 0 == block 8 of volume, ...).

Also I described updating the extent map of the inode such that it  
describes the physical blocks as belonging to the file, thus you  
would have "logical file block X corresponds to physical block Y on  
volume" entries entered into the extent map of the inode and they  
would describe the just allocated blocks.

Finally I described updating the allocated size in the inode which  
basically says "there are that many bytes worth of blocks allocated  
to this inode".

And optionally I described updating the data size in the inode which  
basically says "this file has size Z bytes".

And I specifically did NOT update the initialized size in the inode  
thus it will remain at its old value thus all new allocated blocks  
will be considered as present but not initialized thus a read will  
always return zero whilst a write will do the right thing and pad  
with zeroes as necessary (if the write is smaller than the block  
size, etc).

Note that you are right that this is like truncate in NTFS for non- 
sparse enabled inodes/volumes.

But for sparse ones, instead of doing any allocations in the bitmap  
and entering them in the extent map, you would simply add a single  
entry to the extent map that says "X blocks allocated starting at  
logical block Y corresponding to no physical blocks, i.e. they are  
sparse".  You would then also update the allocated size and data size  
as above and now you can even (but do not have to) update the  
initialized size to be equal to the data size as the file can be  
considered fully initialized because it is sparse.  As an  
implementation detail this truncate operation would not modify the  
compressed size of the inode (i.e. the really used on-disk space,  
i.e. what you get from running "du" as that does not change when you  
add sparse blocks) whilst the fallocate described above would update  
the compressed size (if the file is sparse or compressed - there is  
no compressed size in the inode if the inode is not sparse/ 
compressed) because the file now occupies more blocks on disk even if  
they are actually not initialized.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ