[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4EAD01C1.1060002@tao.ma>
Date: Sun, 30 Oct 2011 15:50:25 +0800
From: Tao Ma <tm@....ma>
To: Ted Ts'o <tytso@....edu>
CC: linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: Check io list state and avoid an unnecessary mutex_lock
in ext4_end_io_work.
Hi Ted,
On 10/30/2011 04:57 AM, Ted Ts'o wrote:
> On Wed, Sep 14, 2011 at 04:33:02PM +0800, Tao Ma wrote:
>> From: Tao Ma <boyu.mt@...bao.com>
>>
>> When we finish the end io work in ext4_flush_completed_IO, we take
>> the io work away from the list, but don't free it. Then in the workqueue,
>> we can check the list state and then avoid the extra work if it is also
>> done. It is good, but we check list state in ext4_end_io_nolock with i_mutex held
>> instead of the spin_lock in other places. This is wrong.
>
> Hi Tao,
>
> Thanks for finding and pointing out this bug in ext4_end_io_nolock().
> Unfortunately your proposed fix doesn't take into account that there
> are other callers of ext4_end_io_nolock() besides ext4_end_io_work(),
> and so it's not sufficient to move the test from former function to
> the latter.
sorry, but I thought I had considered this case.
There are 2 callers. One is ext4_end_io_work(which has the bug I pointed
out), the other is ext4_flush_complete_IO which has already done the
check before calling ext4_end_io_nolock. And that's the reason why I
move the check from ext4_end_io_nolock to ext4_end_io_work. So for the
ext4_flush_complete_IO case, your new patch will spin_lock twice for the
checking. Do I miss something here?
Thanks
Tao
>
> Attached please find the patch which I am planning to check into the
> ext4 tree to address this bug which you have pointed out.
>
> Regards,
>
> - Ted
>
> commit c0e36d8410bfad4db4edefeb4175f85a5d216c8d
> Author: Theodore Ts'o <tytso@....edu>
> Date: Sat Oct 29 16:57:19 2011 -0400
>
> ext4: use spinlock before checking io->list in ext4_io_end_nolock()
>
> In ext4_end_io_nolock(), io->list is checked to see if it is empty
> without taking the ei->i_completed_io_lock spinlock. This violates
> the locking protocol, and can cause very hard to debug failures.
>
> Also optimize ext4_end_io_work() so that if ext4_end_io_nolock() is
> not going to do any work, don't try getting the i_mutex and possibly
> requeuing the end_io request if the trylock doesn't succeed in grabbing
> the mutex.
>
> Thanks to Tao Ma <boyu.mt@...bao.com> for spotting the error and
> providing an initial fix to address the problem.
>
> Signed-off-by: "Theodore Ts'o" <tytso@....edu>
> Signed-off-by: Tao Ma <boyu.mt@...bao.com>
>
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 92f38ee..0af5607 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -90,21 +90,27 @@ void ext4_free_io_end(ext4_io_end_t *io)
> */
> int ext4_end_io_nolock(ext4_io_end_t *io)
> {
> - struct inode *inode = io->inode;
> - loff_t offset = io->offset;
> - ssize_t size = io->size;
> - wait_queue_head_t *wq;
> - int ret = 0;
> + unsigned long flags;
> + struct inode *inode = io->inode;
> + loff_t offset = io->offset;
> + ssize_t size = io->size;
> + struct ext4_inode_info *ei = EXT4_I(inode);
> + wait_queue_head_t *wq;
> + int ret;
>
> ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
> "list->prev 0x%p\n",
> io, inode->i_ino, io->list.next, io->list.prev);
>
> - if (list_empty(&io->list))
> - return ret;
> + spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> + if (list_empty(&io->list)) {
> + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> + return 0;
> + }
> + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
>
> if (!(io->flag & EXT4_IO_END_UNWRITTEN))
> - return ret;
> + return 0;
>
> ret = ext4_convert_unwritten_extents(inode, offset, size);
> if (ret < 0) {
> @@ -142,6 +148,16 @@ static void ext4_end_io_work(struct work_struct *work)
> unsigned long flags;
> int ret;
>
> + if (!(io->flag & EXT4_IO_END_UNWRITTEN))
> + goto free_io_end;
> +
> + spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> + if (list_empty(&io->list)) {
> + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> + goto free_io_end;
> + }
> + spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> +
> if (!mutex_trylock(&inode->i_mutex)) {
> /*
> * Requeue the work instead of waiting so that the work
> @@ -170,6 +186,7 @@ static void ext4_end_io_work(struct work_struct *work)
> list_del_init(&io->list);
> spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> mutex_unlock(&inode->i_mutex);
> +free_io_end:
> ext4_free_io_end(io);
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists