lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Thu, 30 Apr 2009 09:17:53 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Theodore Ts'o" <tytso@....edu>,
	Linux Kernel Developers List <linux-kernel@...r.kernel.org>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH RFC] ext3 data=guarded v7

On Thu, 2009-04-30 at 13:52 +0200, Jan Kara wrote:
> On Wed 29-04-09 16:53:44, Chris Mason wrote:
> ...
> > +static int ext3_guarded_write_end(struct file *file,
> > +				struct address_space *mapping,
> > +				loff_t pos, unsigned len, unsigned copied,
> > +				struct page *page, void *fsdata)
> > +{
> > +	handle_t *handle = ext3_journal_current_handle();
> > +	struct inode *inode = file->f_mapping->host;
> > +	unsigned from, to;
> > +	int ret = 0, ret2;
> > +
> > +	copied = block_write_end(file, mapping, pos, len, copied,
> > +				 page, fsdata);
> > +
> > +	from = pos & (PAGE_CACHE_SIZE - 1);
> > +	to = from + copied;
> > +	ret = walk_page_buffers(handle, page_buffers(page),
> > +		from, to, NULL, journal_dirty_data_guarded_fn);
> > +
> > +	/*
> > +	 * we only update the in-memory i_size.  The disk i_size is done
> > +	 * by the end io handlers
> > +	 */
> > +	if (ret == 0 && pos + copied > inode->i_size) {
> > +		int must_log;
> > +
> > +		/* updated i_size, but we may have raced with a
> > +		 * data=guarded end_io handler.
> > +		 *
> > +		 * All the guarded IO could have ended while i_size was still
> > +		 * small, and if we're just adding bytes into an existing block
> > +		 * in the file, we may not be adding a new guarded IO with this
> > +		 * write.  So, do a check on the disk i_size and make sure it
> > +		 * is updated to the highest safe value.
> > +		 *
> > +		 * This may also be required if the
> > +		 * journal_dirty_data_guarded_fn chose to do an fully
> > +		 * ordered write of this buffer instead of a guarded
> > +		 * write.
> > +		 *
> > +		 * ext3_ordered_update_i_size tests inode->i_size, so we
> > +		 * make sure to update it with the ordered lock held.
> > +		 */
> > +		ext3_ordered_lock(inode);
> > +		i_size_write(inode, pos + copied);
> > +		must_log = ext3_ordered_update_i_size(inode);
> > +		ext3_ordered_unlock(inode);
> > +
> > +		orphan_del_trans(inode, must_log > 0);
> > +	}
>   Didn't we agree that only "i_size_write" should remain from the above
> "if" after you changed journal_dirty_data_guarded_fn() function?

It sounded like a really good idea at the time ;) But it doesn't cover
all the cases.  If journal_dirty_data_guarded_fn decided to do a full
ordering of a buffer because the start of the buffer was inside of
i_size, we might not have updated disk_i_size.

Basically something like this:

dd if=/dev/zero of=foo bs=2k count=1 # makes an ordered buffer
sync # disk_i_size is now 2k
dd if=/dev/zero of=foo bs=3k count=1 conv=notrunc

This will become an ordered write.  There isn't a really good way to do
it as a guarded IO because that buffer head may already have a guarded
IO attached.  I could add some complexity and cover all the cases where
an ordered IO is in flight etc etc, but it doesn't seem worth it for the
small append case.

Since it is an ordered write, there is no guarded IO to update the disk
i_size later.  The update needs to happen during write_end().  I did try
to update the comments to reflect that, but it might not be as clear as
it should be.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ