Message-ID: <20110817090415.GA9959@quack.suse.cz>
Date: Wed, 17 Aug 2011 11:04:15 +0200
From: Jan Kara <jack@...e.cz>
To: Jiaying Zhang <jiayingz@...gle.com>
Cc: Dave Chinner <david@...morbit.com>, Tao Ma <tm@....ma>,
Jan Kara <jack@...e.cz>, Michael Tokarev <mjt@....msk.ru>,
linux-ext4@...r.kernel.org, sandeen@...hat.com
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)
On Tue 16-08-11 17:08:42, Jiaying Zhang wrote:
> On Tue, Aug 16, 2011 at 4:59 PM, Dave Chinner <david@...morbit.com> wrote:
> > On Tue, Aug 16, 2011 at 02:32:12PM -0700, Jiaying Zhang wrote:
> >> On Tue, Aug 16, 2011 at 8:03 AM, Tao Ma <tm@....ma> wrote:
> >> > On 08/16/2011 09:53 PM, Jan Kara wrote:
> >> I wonder whether the following patch will solve the problem:
> >>
> >> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> >> index 6c27111..ca90d73 100644
> >> --- a/fs/ext4/indirect.c
> >> +++ b/fs/ext4/indirect.c
> >> @@ -800,12 +800,17 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
> >>  	}
> >>  
> >>  retry:
> >> -	if (rw == READ && ext4_should_dioread_nolock(inode))
> >> +	if (rw == READ && ext4_should_dioread_nolock(inode)) {
> >> +		if (unlikely(!list_empty(&ei->i_completed_io_list))) {
> >> +			mutex_lock(&inode->i_mutex);
> >> +			ext4_flush_completed_IO(inode);
> >> +			mutex_unlock(&inode->i_mutex);
> >> +		}
> >>  		ret = __blockdev_direct_IO(rw, iocb, inode,
> >>  					   inode->i_sb->s_bdev, iov,
> >>  					   offset, nr_segs,
> >>  					   ext4_get_block, NULL, NULL, 0);
> >> -	else {
> >> +	} else {
> >>  		ret = blockdev_direct_IO(rw, iocb, inode,
> >>  					   inode->i_sb->s_bdev, iov,
> >>  					   offset, nr_segs,
> >>
> >> I tested the patch a little bit and it seems to resolve the race
> >> on dioread_nolock in my case. Michael, I would very much appreciate
> >> it if you could try this patch with your test case and see whether it works.
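  (For readers following the thread: ext4_flush_completed_IO() drains the
inode's list of IOs that have completed but whose unwritten extents have not
been converted yet. Below is a minimal sketch of the idea, assuming the
3.0-era io_end fields and glossing over the list locking and error handling
the real function does.)

/*
 * Minimal sketch of the idea behind ext4_flush_completed_IO():
 * drain the inode's list of IOs that have finished but whose
 * unwritten extents are not yet converted, so a following
 * locking-free DIO read sees valid extents.  The real function
 * additionally takes the completed-IO list lock and propagates
 * conversion errors per io_end.
 */
static int flush_completed_io_sketch(struct inode *inode)
{
	struct ext4_inode_info *ei = EXT4_I(inode);
	ext4_io_end_t *io;
	int ret = 0;

	while (!list_empty(&ei->i_completed_io_list)) {
		io = list_first_entry(&ei->i_completed_io_list,
				      ext4_io_end_t, list);
		list_del_init(&io->list);
		/* convert unwritten -> written for the range just written */
		ret = ext4_convert_unwritten_extents(inode, io->offset,
						     io->size);
		if (ret)
			break;
	}
	return ret;
}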
> >
> > Just my 2c worth here: this is a data corruption bug, so the root
> > cause needs to be fixed. The above patch does not address the root
> > cause.
> >
> >> > You are absolutely right. The real problem is that ext4_direct_IO
> >> > begins to work *after* we clear the page writeback flag and *before* we
> >> > convert the unwritten extent to a valid state. Some of my traces do show
> >> > that. I am working on it now.
> >
> > And that's the root cause - think about what that means for a
> > minute. It means that extent conversion can race with anything that
> > requires IO to complete first. e.g. truncate or fsync. It can then
> > race with other subsequent operations, which can have even nastier
> > effects. IOWs, there is a data-corruption landmine just sitting
> > there waiting for the next person to trip over it.
> You are right that extent conversion can race with truncate and fsync
> as well. That is why we already need to call ext4_flush_completed_IO()
> in those places. I agree this is a little nasty, and there can be
> other corner cases that we haven't covered.
Exactly. I agree with Dave here that it is asking for serious trouble to
clear the PageWriteback bit before the IO is really complete.
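To make the window concrete, the problematic interleaving looks roughly
like this (a schematic, not a trace from Michael's report):

  dioread_nolock write completion       concurrent fsync/truncate/DIO read
  -------------------------------       ----------------------------------
  end_page_writeback(page)
                                        filemap_write_and_wait() returns
                                        (no pages under writeback anymore)
                                        operates on still-unwritten extent
  worker converts unwritten extent
                                        ...too late, stale/zero data seen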
> The problem is that we cannot do the extent conversion in end_io context. I
> haven't thought of a better approach to deal with these races. I am
> curious how XFS deals with this problem.
Well, XFS cannot do extent conversion in end_io for AIO+DIO either, so it
clears the PageWriteback bit only after the extent conversion has happened
in the worker thread. ext4 has problems with this (deadlocks) because of
its unfortunate locking of the extent tree via i_mutex. So I believe we
have to find better locking for the extent tree, so that ext4 can clear
the PageWriteback bit from the worker thread...
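Schematically, the completion worker would then have to do something like
the following (a sketch only; the io->page bookkeeping shown is an
assumption about how the pieces could fit together, and the extent-tree
locking that currently deadlocks is omitted):

/*
 * Hypothetical sketch of the suggested ordering: do the extent
 * conversion in the worker first, and only then clear PageWriteback,
 * so that fsync/truncate/DIO readers waiting on writeback cannot
 * observe an unconverted extent.
 */
static void end_io_work_sketch(struct work_struct *work)
{
	ext4_io_end_t *io = container_of(work, ext4_io_end_t, work);
	struct inode *inode = io->inode;

	/* 1) convert unwritten -> written while writeback is still set */
	ext4_convert_unwritten_extents(inode, io->offset, io->size);

	/* 2) only now wake anyone waiting for writeback to finish */
	if (io->page)
		end_page_writeback(io->page);	/* assumes io tracks its page */
}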
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR