lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 30 Nov 2015 14:55:21 +0100
From:	Jan Kara <jack@...e.cz>
To:	Daeho Jeong <daeho.jeong@...sung.com>
Cc:	tytso@....edu, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 1/3] ext4: handle unwritten or delalloc buffers before
 enabling per-file data journaling

On Mon 30-11-15 14:45:37, Jan Kara wrote:
> On Wed 18-11-15 10:34:32, Daeho Jeong wrote:
> > We already allocate delalloc blocks before changing the inode mode into
> > "per-file data journal" mode to prevent delalloc blocks from remaining
> > not allocated, but another issue concerned with "BH_Unwritten" status
> > still exists. For example, by fallocate(), several buffers' status
> > change into "BH_Unwritten", but these buffers cannot be processed by
> > ext4_alloc_da_blocks(). So, they still remain in unwritten status after
> > per-file data journaling is enabled and they cannot be changed into
> > written status any more and, if they are journaled and eventually
> > checkpointed, these unwritten buffer will cause a kernel panic by the
> > below BUG_ON() function of submit_bh_wbc() when they are submitted
> > during checkpointing.
> > 
> > static int submit_bh_wbc(int rw, struct buffer_head *bh,...
> > {
> >         ...
> >         BUG_ON(buffer_unwritten(bh));
> > 
> > Moreover, when "dioread_nolock" option is enabled, the status of a
> > buffer is changed into "BH_Unwritten" after write_begin() completes and
> > the "BH_Unwritten" status will be cleared after I/O is done. Therefore,
> > if a buffer's status is changed into unwrutten but the buffer's I/O is
> > not submitted and completed, it can cause the same problem after
> > enabling per-file data journaling. You can easily generate this bug by
> > executing the following command.
> > 
> > ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269
> > 
> > To resolve these problems and define a boundary between the previous
> > mode and per-file data journaling mode, we need to flush and wait all
> > the I/O of buffers of a file before enabling per-file data journaling
> > of the file.
> 
> 
> > 
> > Signed-off-by: Daeho Jeong <daeho.jeong@...sung.com>
> > ---
> >  fs/ext4/inode.c |    9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 612fbcf..1f9458e 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5168,9 +5168,14 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> >  	 * be allocated any more. even more truncate on delalloc blocks
> >  	 * could trigger BUG by flushing delalloc blocks in journal.
> >  	 * There is no delalloc block in non-journal data mode.
> > +	 * We also have to handle unwritten buffers generated by
> > +	 * fallocate() and dioread_nolock option. Once per-file data
> > +	 * journaling is enabled, unwritten buffers will remain in
> > +	 * unwritten status forever and they will be the seeds of
> > +	 * kernel panic when they are checkpointed.
> 
> Can we maybe rephrase the whole comment like:
> 
> 	/* 
> 	 * Before flushing the journal and switching inode's aops, we have
> 	 * to flush all dirty data the inode has. There can be outstanding
> 	 * delayed allocations, there can be unwritten extents created by
> 	 * fallocate or buffered writes in dioread_nolock mode covered by
> 	 * dirty data which can be converted only after flushing the dirty
> 	 * data (and journalled aops don't know how to handle these cases).
> 	 */
> 
> Otherwise the patch looks good to me. So after updating the comment you can
> add:
> 
> Reviewed-by: Jan Kara <jack@...e.cz>

BTW, the code is still racy wrt mmap write faults. Once Ted merges my
patches for hole punching races, we need to grab i_mmap_sem for writing
here as well to avoid write page fault from creating dirty page while we
are switching aops... I'm mostly writing this here so that there's lower
chance this gets lost.

								Honza
> > -	if (val && test_opt(inode->i_sb, DELALLOC)) {
> > -		err = ext4_alloc_da_blocks(inode);
> > +	if (val) {
> > +		err = filemap_write_and_wait(inode->i_mapping);
> >  		if (err < 0)
> >  			return err;
> >  	}
> > -- 
> > 1.7.9.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists