Message-ID: <18014.16756.192137.738569@notabene.brown>
Date: Thu, 31 May 2007 13:31:00 +1000
From: Neil Brown <neilb@...e.de>
To: Nikita Danilov <nikita@...sterfs.com>
Cc: device-mapper development <dm-devel@...hat.com>,
linux-fsdevel@...r.kernel.org, linux-raid@...r.kernel.org,
David Chinner <dgc@....com>, linux-kernel@...r.kernel.org,
Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices,
filesystems, and dm/md.
On Monday May 28, nikita@...sterfs.com wrote:
> Neil Brown writes:
> >
>
> [...]
>
> > Thus the general sequence might be:
> >
> > a/ issue all "preceding writes".
> > b/ issue the commit write with BIO_RW_BARRIER
> > c/ wait for the commit to complete.
> > If it was successful - done.
> > If it failed other than with EOPNOTSUPP, abort
> > else continue
> > d/ wait for all 'preceding writes' to complete
> > e/ call blkdev_issue_flush
> > f/ issue commit write without BIO_RW_BARRIER
> > g/ wait for commit write to complete
> > if it failed, abort
> > h/ call blkdev_issue
> > DONE
> >
> > steps b and c can be left out if it is known that the device does not
> > support barriers. The only way to discover this is to try and see if it
> > fails.
> >
> > I don't think any filesystem follows all these steps.
>
> It seems that steps b/ -- h/ are quite generic, and can be implemented
> once in a generic code (with some synchronization mechanism like
> wait-queue at d/).
Yes and no.
It depends on what you mean by "preceding write".
If you implement this in the filesystem, the filesystem can wait only
for those writes where it has an ordering dependency. If you
implement it in common code, then you have to wait for all writes
that were previously issued.
e.g.
If you have two different filesystems on two different partitions on
the one device, why should writes in one filesystem wait for a
barrier issued in the other filesystem?

If you have a single filesystem with one thread doing lots of
over-writes (no metadata changes) and another doing lots of
metadata changes (requiring journalling and barriers), why should the
data writes be held up by the metadata updates?
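
To make the difference concrete, here is a rough user-space toy model
(not kernel code; struct toy_write, waits_common and waits_scoped are
names made up for this sketch). It only counts how many in-flight writes
a barrier would have to wait for under each policy:

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model, not kernel code: every name here is hypothetical. */
struct toy_write {
    int owner;      /* which filesystem (or journal) issued the write */
    int in_flight;  /* non-zero while the write has not yet completed */
};

/* Common-code policy: the barrier waits for every in-flight write. */
static size_t waits_common(const struct toy_write *w, size_t n)
{
    size_t count = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (w[i].in_flight)
            count++;
    return count;
}

/* Filesystem policy: wait only for in-flight writes from one owner. */
static size_t waits_scoped(const struct toy_write *w, size_t n, int owner)
{
    size_t count = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (w[i].in_flight && w[i].owner == owner)
            count++;
    return count;
}
```

With two owners sharing one device, waits_scoped() never returns more
than waits_common(), and the gap is exactly the unrelated writes that
the common-code approach would hold the barrier behind.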
So I'm not actually convinced that doing this in common code is the
best approach. But it is the easiest. The common code should provide
the barrier and flushing primitives, and the filesystem gets to use
them however it likes.
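
As a sketch of how a filesystem might drive those primitives through
steps b/ to h/ quoted above, here is another user-space toy model. None
of it is a real kernel interface: struct toy_dev, toy_commit_write,
toy_flush and toy_commit are invented for illustration, toy_flush only
stands in for blkdev_issue_flush, and treating step h/ as a second
flush is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Toy user-space model of a block device; all names are hypothetical. */
struct toy_dev {
    int supports_barrier;  /* does the device accept barrier writes? */
    int fail_io;           /* non-zero: every commit write fails with -EIO */
    int flushes;           /* number of explicit cache flushes issued */
    int commits;           /* number of commit writes that succeeded */
};

/* Stand-in for submitting the commit write and waiting for it. */
static int toy_commit_write(struct toy_dev *d, int barrier)
{
    if (d->fail_io)
        return -EIO;
    if (barrier && !d->supports_barrier)
        return -EOPNOTSUPP;
    d->commits++;
    return 0;
}

/* Stand-in for an explicit cache flush: it just counts the call. */
static void toy_flush(struct toy_dev *d)
{
    d->flushes++;
}

/* Steps b/ to h/: try a barrier write, else fall back to flush + write. */
static int toy_commit(struct toy_dev *d)
{
    int err;

    assert(d);
    err = toy_commit_write(d, 1);   /* b/ and c/ */
    if (err != -EOPNOTSUPP)
        return err;                 /* success, or a real failure: stop */

    /* d/ the filesystem would wait for its own preceding writes here */
    toy_flush(d);                   /* e/ */
    err = toy_commit_write(d, 0);   /* f/ and g/ */
    if (err)
        return err;
    toy_flush(d);                   /* h/ (assumed to be a flush) */
    return 0;
}
```

The point of keeping step d/ as a comment is that what to wait for is
the filesystem's decision; the model only fixes the order of the
barrier attempt, the flushes and the plain commit write.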
NeilBrown