Message-ID: <20080626124911.GA19285@infradead.org>
Date: Thu, 26 Jun 2008 08:49:11 -0400
From: Christoph Hellwig <hch@...radead.org>
To: Matthew Wilcox <matthew@....cx>
Cc: xfs@....sgi.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/6] Extend completions to provide XFS object flush requirements

On Thu, Jun 26, 2008 at 06:40:09AM -0600, Matthew Wilcox wrote:
> On Thu, Jun 26, 2008 at 10:21:12PM +1000, Dave Chinner wrote:
> > On Thu, Jun 26, 2008 at 05:42:42AM -0600, Matthew Wilcox wrote:
> > > Then let's leave it as a semaphore. You can get rid of the sema_t if
> > > you like, but I don't think that turning completions into semaphores is
> > > a good idea (because it's confusing).
> >
> > So remind me what the point of the semaphore removal tree is again?
>
> To remove the semaphores which don't need to be semaphores any more.
>
> > As Christoph suggested, I can put this under another API that
> > is implemented using completions. If I have to do that in XFS,
> > so be it....
>
> You could, yes. But you could just use completions directly ...
>
> > The main reason for this that we've just uncovered the fact that the
> > way XFS uses semaphores is completely unsafe [*] on x86/x86_64 for
> > kernels prior to the new generic semaphores.
> >
> > [*] 2.6.20 panics in up() because of this race when I/O completion
> > (the up call) races with a simultaneous down() (iowaiter):
> >
> > > T1              T2
> > > up()            down()
> > >                 kmem_free()
> >
> > When the down() call completes, the up() call can still be
> > referencing the semaphore, and hence if we free the structure after
> > the down call then the up() will reference freed memory. This is
> > probably the cause of many unexplained log replay or unmount panics
> > > that we've been hitting for years with buffers that have been freed while
> > apparently still in use....
>
> This is exactly the kind of thing completions were supposed to be used
> for. T1 should be calling complete() and T2 should be calling
> wait_for_completion().
Please read Dave's introductory mail. What XFS wants is completions
with a little bit extra, so he implemented the little bit extra. This
little bit extra is pretty well described in the mail starting this
thread.
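
For illustration only, here is a minimal userspace sketch of the plain
complete()/wait_for_completion() pattern discussed above, built on
pthreads. It is not kernel code and not Dave's patch: every name in it
(struct completion_sketch, struct io_request, run_demo and so on) is
hypothetical, and it does not attempt the "little bit extra" that XFS
needs. The point it tries to show is that the signalling side only
touches the object while holding its lock, so the waiter cannot return
while the signaller is still updating it.

```c
/* Userspace analogue of a completion. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;
};

/* Stand-in for an object whose I/O another thread finishes. */
struct io_request {
    struct completion_sketch iodone;
    int result;
};

static void init_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* T1 side: mark the work finished. All state changes happen under the lock. */
static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* T2 side: block until complete_sketch() has set done. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Plays the role of I/O completion (the old up() caller). */
static void *io_completion_thread(void *arg)
{
    struct io_request *req = arg;

    req->result = 42;
    complete_sketch(&req->iodone);
    /* req must not be touched after this point. */
    return NULL;
}

/* Plays the role of the I/O waiter, which frees the object afterwards. */
static int run_demo(void)
{
    struct io_request *req = malloc(sizeof(*req));
    pthread_t t1;
    int result;

    assert(req != NULL);
    init_completion_sketch(&req->iodone);
    if (pthread_create(&t1, NULL, io_completion_thread, req) != 0) {
        free(req);
        return -1;
    }

    wait_for_completion_sketch(&req->iodone);
    result = req->result;

    /* The sketch joins before freeing so it does not depend on how a
     * particular pthread library behaves inside pthread_mutex_unlock(). */
    pthread_join(t1, NULL);
    pthread_cond_destroy(&req->iodone.cond);
    pthread_mutex_destroy(&req->iodone.lock);
    free(req);
    return result;
}
```

Calling run_demo() starts the completion thread, waits for it, and
returns the value the thread stored in the request. Build with
-pthread.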