Message-ID: <20150810224154.GK3902@dastard>
Date: Tue, 11 Aug 2015 08:41:54 +1000
From: Dave Chinner <david@...morbit.com>
To: Jan Kara <jack@...e.cz>
Cc: Oleg Nesterov <oleg@...hat.com>, Al Viro <viro@...iv.linux.org.uk>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> On Fri 07-08-15 21:55:52, Oleg Nesterov wrote:
> > I'll try to re-check/re-test, but so far I think that this and the
> > previous series are correct. However 3/4 from the previous series
> > (attached at the end just in case) uncovers (I think) some problems
> > in xfs locking.
....
>
> Hum, I had a look at the lockdep report below and it's one of the
> peculiarities of the freeze protection. For the record, let me repeat the full
> argument for why I don't think there's a possibility for a real deadlock.
> Feel free to skip to the end if you get bored.
>
> One would like to construct the lock chain as:
>
>   1) CPU1 (readdir dir, process X thread 0): gets ILOCK on 'dir'
>   2) CPU0 (chown foo dir, process Y): gets freeze protection, starts a
>      transaction in xfs_setattr_nonsize, then waits for ILOCK on 'dir'
>   3) CPU2 (page fault, process X thread 1): gets mmap_sem of process X
>   4) CPU1: waits for mmap_sem of process X in filldir()
>   5) CPU2: waits for freeze protection in xfs_page_mkwrite
>
> and CPU3 then being in freeze_super(), blocking CPU2 and waiting for CPU0 to
> finish its freeze-protected section. But this cannot happen. The reason is
> that we block writers level-by-level, and thus while there are writers at
> level X, we do not block writers at level X+1. So in this particular case
> freeze_super() will block waiting for CPU0 to finish its freeze-protected
> section while CPU2 is free to continue.
>
> In general we have a chain like
>
> freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
>     ^                                                      |
>     \------------------------------------------------------/
>
> But since ILOCK is always acquired with freeze protection at L0 and we can
> block at L1 only after there are no writers at L0, this loop can never
> happen.
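
For reference, the level-by-level draining described above looks roughly like
the following. This is only a simplified sketch of the structure of
freeze_super() (freeze_super_sketch is a made-up name, and this is not the
actual fs/super.c code), with comments mapping the levels onto the CPUs in
the scenario above:

/*
 * Sketch only: freeze levels are drained one at a time, lowest first.
 * SB_FREEZE_WRITE (L0) covers e.g. the chown/setattr path on CPU0,
 * SB_FREEZE_PAGEFAULT (L1) covers xfs_page_mkwrite on CPU2, and
 * SB_FREEZE_FS (L2) covers internal filesystem transactions.
 */
static int freeze_super_sketch(struct super_block *sb)
{
	/* Stop new L0 writers and wait for the existing ones to drain. */
	sb->s_writers.frozen = SB_FREEZE_WRITE;
	sb_wait_write(sb, SB_FREEZE_WRITE);

	/*
	 * Only after L0 is empty do we start blocking L1 (page fault)
	 * writers.  So while CPU0 still holds its L0 write protection,
	 * CPU2's sb_start_pagefault() cannot block here and the cycle
	 * above never closes.
	 */
	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
	sb_wait_write(sb, SB_FREEZE_PAGEFAULT);

	/* Finally drain the internal writers (running transactions). */
	sb->s_writers.frozen = SB_FREEZE_FS;
	sb_wait_write(sb, SB_FREEZE_FS);

	return 0;
}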
>
> Note that if we use the property of freezing that lock at level X+1 cannot
> block when we hold lock at level X, we can as well simplify the dependency
> graph and track in it only the lowest level of freeze lock that is
> currently acquired (since the levels above it cannot block and do not in
> any way influence blocking of other processes either and thus are
> irrelevant for the purpose of deadlock detection). Then the dependency
> graph we'd get would be:
>
> freeze L0 -> ILOCK -> mmap_sem -> freeze L1
>
> and we have a nice acyclic graph we like to see... So probably we have to
> hack the lockdep instrumentation some more and just not tell lockdep
> about freeze locks at higher levels if we already hold a freeze lock at a
> lower level. Thoughts?
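
One way to implement that suggestion (a hypothetical sketch against the
per-level lockdep_map array in struct sb_writers; sb_freeze_acquire_read()
is a made-up helper, not an existing interface) would be to check whether
the task already holds a lower freeze level before annotating:

#ifdef CONFIG_DEBUG_LOCK_ALLOC
/*
 * Hypothetical: only tell lockdep about freeze level 'level' if no
 * lower level is already held by this task.  A held lower level
 * guarantees the higher level cannot block, so it is irrelevant for
 * deadlock detection.
 */
static void sb_freeze_acquire_read(struct super_block *sb, int level,
				   unsigned long ip)
{
	int i;

	for (i = 0; i < level - 1; i++)
		if (lock_is_held(&sb->s_writers.lock_map[i]))
			return;

	rwsem_acquire_read(&sb->s_writers.lock_map[level - 1], 0, 0, ip);
}
#endif

__sb_start_write() would then call this instead of annotating every level
unconditionally.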
The XFS directory ilock->filldir->might_fault locking path has been
generating false positives in quite a lot of places because of
things we do on one side of the mmap_sem in filesystem paths vs
things we do on the other side of the mmap_sem in the page fault
path.
I'm getting sick of these stupid lockdep false positives. I think I
need to rework the XFS readdir locking patch I wrote a while back:
http://oss.sgi.com/archives/xfs/2014-03/msg00146.html
At the time it wasn't clear what the best way forward was. Right now
I think the method I originally used (IOLOCK for directory data and
hence readdir, ILOCK for everything else) will be sufficient; if we
need to do anything smarter, that can be addressed further down the
road.
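
For anyone who doesn't want to dig out the old patch, the core of that
approach is roughly the following sketch (not the actual patch; the bufsize
value is a placeholder and the readdir internals would need the matching
ILOCK changes):

/* Sketch: hold the IOLOCK, not the ILOCK, over the dirent copy-out. */
STATIC int
xfs_file_readdir(
	struct file		*file,
	struct dir_context	*ctx)
{
	struct xfs_inode	*dp = XFS_I(file_inode(file));
	size_t			bufsize = 32 * 1024;	/* placeholder */
	int			error;

	/*
	 * Take the IOLOCK shared for the whole readdir instead of having
	 * the getdents code take the ILOCK, so no inode metadata lock is
	 * held while filldir() copies dirents to userspace and may fault
	 * (and hence take mmap_sem).
	 */
	xfs_ilock(dp, XFS_IOLOCK_SHARED);
	error = xfs_readdir(dp, ctx, bufsize);
	xfs_iunlock(dp, XFS_IOLOCK_SHARED);

	return error;
}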
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
--