Date:	Tue, 02 Oct 2012 14:57:22 +0400
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Jan Kara <jack@...e.cz>
Cc:	Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, tytso@....edu,
	lczerner@...hat.com
Subject: Re: [PATCH 04/11] ext4: completed_io locking cleanup V4

On Tue, 2 Oct 2012 12:31:41 +0200, Jan Kara <jack@...e.cz> wrote:
> On Tue 02-10-12 11:16:38, Dmitry Monakhov wrote:
> ...
> > > > +static int ext4_do_flush_completed_IO(struct inode *inode,
> > > > +	while (!list_empty(&unwritten)) {
> > > > +		io = list_entry(unwritten.next, ext4_io_end_t, list);
> > > > +		BUG_ON(!(io->flag & EXT4_IO_END_UNWRITTEN));
> > > > +		list_del_init(&io->list);
> > > > +
> > > > +		err = ext4_end_io(io);
> > > > +		if (unlikely(!ret && err))
> > > > +			ret = err;
> > > > +
> > > > +		list_add_tail(&io->list, &complete);
> > > > +	}
> > > > +	/* It is important to update all flags for all end_io in one shot w/o
> > > > +	 * dropping the lock.*/
> > >   Why? It would seem that once io structures are moved from
> > > i_completed_io_list, they are unreachable by anyone else?
>   You seem to have missed this comment?
Yep, I missed that comment. And yes, it turns out it is not important to
update the whole list atomically; the flag may be dropped for each end_io
individually.
> 
> 
> > > > +	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> > > > +	while (!list_empty(&complete)) {
> > > > +		io = list_entry(complete.next, ext4_io_end_t, list);
> > > > +		io->flag &= ~EXT4_IO_END_UNWRITTEN;
> > > > +		/* end_io context can not be destroyed now because it still
> > > > +		 * used by queued worker. Worker thread will destroy it later */
> > > > +		if (io->flag & EXT4_IO_END_QUEUED)
> > > > +			list_del_init(&io->list);
> > > > +		else
> > > > +			list_move(&io->list, &to_free);
> > > > +	}
> > > > +	/* If we are called from worker context, it is time to clear queued
> > > > +	 * flag, and destroy it's end_io if it was converted already */
> > > > +	if (work_io) {
> > > > +		work_io->flag &= ~EXT4_IO_END_QUEUED;
> > > > +		if (!(work_io->flag & EXT4_IO_END_UNWRITTEN))
> > > > +			list_add_tail(&work_io->list, &to_free);
> > > >  	}
> > > > -	list_del_init(&io->list);
> > > >  	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> > > > -	(void) ext4_end_io_nolock(io);
> > > > -	mutex_unlock(&inode->i_mutex);
> > > > -free:
> > > > -	ext4_free_io_end(io);
> > > > +
> > > > +	while (!list_empty(&to_free)) {
> > > > +		io = list_entry(to_free.next, ext4_io_end_t, list);
> > > > +		list_del_init(&io->list);
> > > > +		ext4_free_io_end(io);
> > > > +	}
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +/*
> > > > + * work on completed aio dio IO, to convert unwritten extents to extents
> > > > + */
> > > > +static void ext4_end_io_work(struct work_struct *work)
> > > > +{
> > > > +	ext4_io_end_t *io = container_of(work, ext4_io_end_t, work);
> > > > +	ext4_do_flush_completed_IO(io->inode, io);
> > > > +}
> > > > +
> > > > +int ext4_flush_completed_IO(struct inode *inode)
> > > > +{
> > > > +	return ext4_do_flush_completed_IO(inode, NULL);
> > > >  }
> > >   Also it seems that when ext4_flush_completed_IO() is called, the
> > > workqueue can have several IO structures queued in its local lists, so
> > > we miss them here and don't properly wait for all conversions?
> > No, it is not. The list is drained atomically, and
> > add_complete_io will queue work only if the list is empty.
> > 
> > A race between conversion and the dequeue process is not possible
> > because we hold the lock during the entire walk of complete_list, so
> > from an external point of view marking the list as converted (clearing
> > the unwritten flag) happens atomically. I've drawn out all possible
> > situations and no race occurs. If you know of any, please let me know.
>   I guess I'm missing something obvious. So let's go step by step:
> 1) ext4_flush_completed_IO() must make sure there is no outstanding
>    conversion for the inode.
> 2) Now assume we have non-empty i_completed_io_list - thus work is queued.
> 3) The following situation seems to be possible:
> 
> CPU1					CPU2
> (worker thread)				(truncate)
> ext4_end_io_work()
>   ext4_do_flush_completed_IO()
>     spin_lock_irqsave(&ei->i_completed_io_lock, flags);
>     dump_completed_IO(inode);
>     list_replace_init(&ei->i_completed_io_list, &unwritten);
>     spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> 
> 					ext4_flush_completed_IO()
> 					  ext4_do_flush_completed_IO()
> 					    - sees empty i_completed_io_list
> 					     => exits
> 
>   But we still have some conversions pending in 'unwritten' list. What am
> I missing?
Indeed, I simply missed that case, and it would silently break
integrity sync ;(
Thank you for spotting this. I'll be back with an updated version.
> 
> 								Honza
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
