Date:	Thu, 2 Apr 2009 19:22:30 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	Al Viro <viro@...IV.linux.org.uk>
cc:	Christoph Hellwig <hch@...radead.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fix bmap-vs-truncate race



On Wed, 1 Apr 2009, Al Viro wrote:

> On Tue, Mar 31, 2009 at 06:42:34PM -0400, Mikulas Patocka wrote:
> > 
> > There is a lot of text about directories, but nothing about locking of 
> > block mappings.
> > 
> > I was living under an impression that get_block() cannot be called on a 
> > block that is being truncated. That's what read/write/direct-io vs 
> > truncate seems to guarantee --- truncate will first lower i_size 
> > (preventing any new pages past i_size from being created), then destroy 
> > any existing pages past i_size (that includes waiting for pagelock until 
> > all get_blocks on that page end) and finally truncate the metadata on the 
> > filesystem.
> > 
> > So there should be no situation when you truncate block and call get_block 
> > on it simultaneously. If get_block can race with truncate, document it.
> > 
> > There are filesystems that don't do any locking on get_block() (for 
> > example UFS, HPFS; FAT does it only for bmap and doesn't do it for general 
> > accesses) and other filesystems verify indirect block chains obsessively 
> > if they were truncated under get_block (why? because of bmap? or some 
> > other possibility?) --- so the rules should really be documented.
> 
> Indirect chain stuff used to be [1] about truncate that *wouldn't* affect page
> in question.  Look: we have e.g. 4Kb blocks and data at offset 80Kb.  We do
> allocation at offset 40Kb *and* truncate to 60Kb at the same time.
> 
> Both 40Kb (block 10) and 80Kb (block 20) are covered by the first indirect
> block.  It's there, so get_block() reads it and gets ready to allocate
> a block and put its number in the very beginning of indirect block.  In
> the meanwhile, truncate() sees that the boundary falls within the first
> indirect block (at entry 15).  It sees that we have no blocks prior to
> that, so the indirect block ought to be freed.
> 
> Now ext2_get_block() comes back with allocated data block and has nowhere
> to stick it anymore - indirect one just got freed.
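
The interleaving described above can be replayed in a toy model. This is only a sketch under loud assumptions: it is not ext2 code, every toy_* name is hypothetical, and a single indirect block is assumed to map file blocks starting at entry 0, following the numbering in the example (real ext2 also has direct blocks and deeper indirection levels). The entry count assumes 4-byte block numbers.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096UL   /* 4Kb blocks, as in the example */
#define TOY_ENTRIES    1024U    /* assumption: 4-byte block numbers */

/* Hypothetical inode with one indirect block; entry i maps file block i. */
struct toy_inode {
	int have_indirect;                  /* 0 once truncate has freed it */
	unsigned int indirect[TOY_ENTRIES];
};

static unsigned int toy_block_index(unsigned long offset)
{
	return (unsigned int)(offset / TOY_BLOCK_SIZE);
}

/*
 * Truncate as in the example: clear every entry at or past the boundary,
 * then free the indirect block if no entry before the boundary is in use.
 * Returns 1 if the indirect block was freed, 0 otherwise.
 */
static int toy_truncate(struct toy_inode *ino, unsigned long new_size)
{
	unsigned int boundary = toy_block_index(new_size + TOY_BLOCK_SIZE - 1);
	unsigned int i;

	if (!ino->have_indirect)
		return 0;
	assert(boundary <= TOY_ENTRIES);
	for (i = boundary; i < TOY_ENTRIES; i++)
		ino->indirect[i] = 0;
	for (i = 0; i < boundary; i++)
		if (ino->indirect[i])
			return 0;
	ino->have_indirect = 0;
	return 1;
}

/*
 * Second half of the toy get_block: store the newly allocated block number.
 * Returns 0 on success, -1 if the indirect block vanished in the meantime.
 */
static int toy_store_block(struct toy_inode *ino, unsigned int idx,
			   unsigned int blocknr)
{
	assert(idx < TOY_ENTRIES);
	if (!ino->have_indirect)
		return -1;
	ino->indirect[idx] = blocknr;
	return 0;
}

/* Replays the interleaving and returns the result of the late store. */
static int toy_replay_race(void)
{
	struct toy_inode ino;
	unsigned int idx;

	memset(&ino, 0, sizeof(ino));
	ino.have_indirect = 1;
	ino.indirect[toy_block_index(80 * 1024UL)] = 1234; /* data at 80Kb */

	idx = toy_block_index(40 * 1024UL);  /* get_block has read the chain */
	toy_truncate(&ino, 60 * 1024UL);     /* truncate runs in between */
	return toy_store_block(&ino, idx, 5678);
}
```

In this model the late store fails because the indirect block was freed between the lookup and the store, which is the window the chain re-check is meant to detect.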

I see. So if we change ext2_truncate to not delete indirect blocks that 
map only partially truncated space, we could drop that verify_chain().

Upside: get rid of up to 3 spinlocks & associated cache bounce from every 
get_block call.

Downside: truncate with sparse files would occasionally produce an empty 
indirect block. Is it legal to have an indirect block full of zero pointers 
on ext2? Or would fsck complain about it?
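
The trade-off can be illustrated with a toy model of the proposed truncate policy. This is a hedged sketch, not ext2 code: every keep_* name is hypothetical, one indirect block is assumed to map file blocks starting at entry 0, and 4-byte block numbers are assumed. It says nothing about whether fsck accepts the resulting layout.

```c
#include <assert.h>
#include <string.h>

#define KEEP_BLOCK_SIZE 4096UL  /* 4Kb blocks, as in the quoted example */
#define KEEP_ENTRIES    1024U   /* assumption: 4-byte block numbers */

/* Hypothetical inode with one indirect block; entry i maps file block i. */
struct keep_inode {
	int have_indirect;
	unsigned int indirect[KEEP_ENTRIES];
};

/*
 * Proposed policy: clear entries at or past the boundary, but free the
 * indirect block only when it is wholly truncated (boundary at entry 0).
 */
static void keep_truncate(struct keep_inode *ino, unsigned long new_size)
{
	unsigned long boundary =
		(new_size + KEEP_BLOCK_SIZE - 1) / KEEP_BLOCK_SIZE;
	unsigned long i;

	if (boundary > KEEP_ENTRIES)
		boundary = KEEP_ENTRIES;
	for (i = boundary; i < KEEP_ENTRIES; i++)
		ino->indirect[i] = 0;
	if (boundary == 0)
		ino->have_indirect = 0;
}

/* Returns 1 if the indirect block exists but holds only zero pointers. */
static int keep_is_empty(const struct keep_inode *ino)
{
	unsigned int i;

	if (!ino->have_indirect)
		return 0;
	for (i = 0; i < KEEP_ENTRIES; i++)
		if (ino->indirect[i])
			return 0;
	return 1;
}

/* Late store from the toy get_block; -1 if the indirect block is gone. */
static int keep_store_block(struct keep_inode *ino, unsigned int idx,
			    unsigned int blocknr)
{
	assert(idx < KEEP_ENTRIES);
	if (!ino->have_indirect)
		return -1;
	ino->indirect[idx] = blocknr;
	return 0;
}

/* Sparse file from the quoted example: only the block at 80Kb is mapped. */
static void keep_setup(struct keep_inode *ino)
{
	memset(ino, 0, sizeof(*ino));
	ino->have_indirect = 1;
	ino->indirect[20] = 1234;
}
```

After keep_setup() and keep_truncate(&ino, 60 * 1024UL), the indirect block is still present but empty (the downside), and a later keep_store_block(&ino, 10, ...) succeeds without any re-check (the upside).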

> _That_ is where verify_chain() came from.  As far as anything outside of
> ext2 can know, this truncate() won't come anywhere near the page we are
> working with.  And it won't - for data, that is.

True. Except for the bmap case. Bmap should be either documented or fixed 
with my proposed patch.

> Disclaimer: this code has been changed several times since the last time
> I worked with it, so this might not match the current situation anymore.
> 
> [1] see disclaimer above.

Mikulas