Message-ID: <20130716204335.GH11674@dastard>
Date: Wed, 17 Jul 2013 06:43:35 +1000
From: Dave Chinner <david@...morbit.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Ben Myers <bpm@....com>, Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Dave Jones <davej@...hat.com>, xfs@....sgi.com
Subject: Re: splice vs execve lockdep trace.
On Tue, Jul 16, 2013 at 01:18:06PM -0700, Linus Torvalds wrote:
> On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@....com> wrote:
> >> >
> >> > And looking more at that, I'm actually starting to think this is an
> >> > XFS locking problem. XFS really should not call back to splice while
> >> > holding the inode lock.
>
> .. that was misleading, normally "inode lock" would be i_lock, but
> here I meant the XFS-specific i_iolock.
>
> But you clearly picked up on it:
>
> > CPU0                             CPU1                   CPU2
> > ----                             ----                   ----
> > lock(&sig->cred_guard_mutex);
> >                                  lock(&pipe->mutex/1);
> >                                                         lock(&(&ip->io_lock)->mr_lock);
> > lock(&(&ip->io_lock)->mr_lock);
> >                                  lock(&sig->cred_guard_mutex);
> >                                                         lock(&pipe->mutex/1);
>
> Yup.
>
> > I agree that fixing this in XFS seems to be the most reasonable plan,
> > and Dave's approach looks ok to me. Seems like patch 1 should go
> > through Al's tree, but we'll also need to commit it to the xfs tree
> > prerequisite to patch 2.
>
> Btw, is there some reason why XFS couldn't just use
> generic_file_splice_read() directly?
Yes - IO is serialised by the ip->i_iolock, not i_mutex. We don't
use i_mutex for many IO-related things, so internal locking is
needed to serialise against operations like truncate, hole punching,
etc. that are run through non-vfs interfaces.
> And splice has mmap semantics - the whole point of splice is about
> moving pages around, after all - so I *think* your current
> "xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.
No, that's just taking the ip->i_iolock in shared mode - that's less
serialisation than holding i_mutex, as it allows parallel read
operations but still locks out concurrent buffered writes to the
file (i.e. POSIX atomic write vs read requirements).
> The pages will be shared by the pipe buffers anyway, so splicing by
> definition doesn't imply full data serialization (because the reading
> of the data itself will happen much later).
>
> So the per-page serialization done by readpage() should already be
> sufficient, no?
>
> I dunno. Maybe there's something I'm missing. XFS does seem to wrap
> all the other generic functions in that lock too, but since mmap() etc
> clearly have to be able to get things one page at a time (without any
> wrapping at higher layers), I'm just wondering whether splice_read
> might not be able to avoid it.
Read isn't the problem - it's write that's the deadlock issue...
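To make the write-side problem concrete, here is a toy dependency-graph check in the spirit of lockdep: it records "B taken while A held" edges per lock class and flags a cycle. It is a hypothetical model, not lockdep's implementation. The three classes and edges are taken from the CPU0/CPU1/CPU2 scenario quoted above, where the XFS_IOLOCK -> PIPE_MUTEX edge is the write path holding the IO lock while splice takes the pipe mutex.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of lockdep-style ordering checks: lock
 * classes are graph nodes, and an edge A -> B records "B was acquired
 * while A was held". Any cycle means some interleaving can deadlock. */
enum lock_class { CRED_GUARD, PIPE_MUTEX, XFS_IOLOCK, NR_CLASSES };

static bool dep[NR_CLASSES][NR_CLASSES];

static void record_dependency(enum lock_class held, enum lock_class acquired)
{
    dep[held][acquired] = true;
}

static void forget_dependency(enum lock_class held, enum lock_class acquired)
{
    dep[held][acquired] = false;
}

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool reaches_back(int node, int state[])
{
    state[node] = 1;
    for (int next = 0; next < NR_CLASSES; next++) {
        if (!dep[node][next])
            continue;
        if (state[next] == 1)
            return true;                 /* back edge: lock-order cycle */
        if (state[next] == 0 && reaches_back(next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

static bool has_lock_cycle(void)
{
    int state[NR_CLASSES] = {0};

    for (int n = 0; n < NR_CLASSES; n++)
        if (state[n] == 0 && reaches_back(n, state))
            return true;
    return false;
}
```

Recording CRED_GUARD -> XFS_IOLOCK (CPU0) and PIPE_MUTEX -> CRED_GUARD (CPU1) alone gives no cycle; adding XFS_IOLOCK -> PIPE_MUTEX (CPU2) closes the loop. Forgetting that last edge, i.e. not calling into splice with the IO lock held, breaks the cycle again.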
Cheers,
Dave.
>
> Linus
>
--
Dave Chinner
david@...morbit.com