Message-ID: <20130821043821.6e1224b4@gandalf.local.home>
Date: Wed, 21 Aug 2013 04:38:21 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Kent Overstreet <kmo@...erainc.com>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-bcache@...r.kernel.org,
dm-devel@...hat.com, Christoph Hellwig <hch@...radead.org>,
David Howells <dhowells@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
akpm@...ux-foundation.org
Subject: Re: [PATCH] bcache: Remove use of down/up_read_non_owner()
On Tue, 20 Aug 2013 20:36:58 -0700
Kent Overstreet <kmo@...erainc.com> wrote:
> I think we're mostly quibbling over semantics here, and it's not that
> big a deal - that said, I don't really get your distinction between a
> semaphore and a mutex; I'd distinguish them by saying a semaphore can
> count an arbitrary number of resources (the number it's initialized to),
> and a mutex is always initialized to one - the fact that in the kernel
> mutexes must be released by the same process that took them (while
> important for PI and debugging) is not fundamental.
Understood. As I'm usually focused on PI and debugging, it may be
more fundamental to me than to others.
>
> "rw semaphore" never made any sense to me; they're not counting
> anything, like regular semaphores. They're just sleepable rw locks.
Me neither.
>
> So it _sounds_ like you're saying bcache is using it as a semaphore,
> because it's using it as a lock we don't release in the original
> context? in analogy to linux mutexes and semaphores? Not quite sure what
> you mean.
Actually, I'm thinking you are using it more as a completion than a
semaphore.
> > I have a hack that actually implements a work around for non_owner, but
> > it adds overhead to all locks to do so.
>
> Ok - I'd just be curious why the PI bit can't be factored out into a
> wrapper like how the debug stuff is handled with the _non_owner bits,
> but I can certainly believe that.
The problem is that all sleeping locks go through rt_mutex: each lock
is replaced with a variant that incorporates PI.
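To make the PI point concrete, here's a minimal userspace model (all names invented for illustration, not rt_mutex's actual code): priority inheritance works by walking from a blocked waiter to the lock's recorded owner and boosting that owner. If the lock was taken "non_owner", there is no recorded owner to boost.

```c
/* Illustrative model only: a PI lock must record its owner so that a
 * higher-priority waiter can boost it. With a non-owner acquire, the
 * owner field would be unknown and there would be no task to boost. */
struct task {
	int prio;
};

struct pi_lock {
	struct task *owner;	/* NULL if taken "non_owner" */
};

/* Boost the recorded owner up to the waiter's priority, if higher. */
void pi_boost(struct pi_lock *l, struct task *waiter)
{
	if (l->owner && waiter->prio > l->owner->prio)
		l->owner->prio = waiter->prio;
}
```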
> To simplify the algorithm just a bit (only leaving out some
> optimizations), bcache is using it to protect an rb tree, which contains
> "things that are undergoing background writeback".
I think this is what makes bcache unique in the kernel, and I don't
think there are open-coded locks elsewhere.
>
> For foreground writes, we've got to check if the write overlaps with
> anything in this rb tree. If so, we force _this_ write to writeback;
> background writeback will write stale data to the backing device, but
> that's fine because the data in the cache will still be marked as dirty.
>
> To add stuff to this rb tree - i.e. for background writeback to start
> flushing some dirty data - it's got to make sure it's not going to
> overwrite some in progress foreground writethrough write.
>
> Tracking every outstanding foreground write would be expensive, so
> foreground writes take a read lock on the rb tree (both to check it for
> anything they overlap with, and for their duration), and background
> writeback takes a write lock when it goes to scan for dirty data to add
> to the rb tree.
>
> Even if you complain it's not _just_ protecting the rb tree, we're still
> using it for mutual exclusion, plain and simple, and that's all a lock
> is.
Well, rwlocks and rwsems are not mutually exclusive ;-) The read side
allows multiple accessors. Mutual exclusion would also imply a single
task needing exclusive access to a resource, but as the work is done in
the background, that is not the case. But we are arguing semantics here
and not helping with a solution.
>
>
> > > Also, nack this patch because increasing the number of atomic ops to
> > > shared cachelines in our fast path. If it does end up being open coded,
> > > I'll make a more efficient version.
> >
> > Perhaps I can whip up a "struct rwsem_non_owner" and have a very
> > limited API for that, and then you can use that.
>
> That would be perfect :)
OK, I'll see what I can do.
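For the sake of discussion, one could model the proposed "rwsem_non_owner" in userspace roughly like this (a hypothetical sketch; the names and API are invented here, and this is not what was actually merged). Unlike pthread_rwlock_t, whose behavior is undefined when unlocked by a thread that doesn't hold it, this variant records no ownership, so the read side can be released from a different context than took it:

```c
#include <pthread.h>

struct rwsem_non_owner {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int readers;		/* > 0: readers in; -1: writer in */
};

void rwsem_no_init(struct rwsem_non_owner *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->readers = 0;
}

void down_read_non_owner_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers < 0)		/* wait for the writer to leave */
		pthread_cond_wait(&s->cond, &s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

/* May be called from any thread: no ownership is recorded. */
void up_read_non_owner_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

void down_write_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers != 0)		/* wait for readers and writers */
		pthread_cond_wait(&s->cond, &s->lock);
	s->readers = -1;
	pthread_mutex_unlock(&s->lock);
}

void up_write_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	s->readers = 0;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}
```

The cost Kent objects to shows up here too: every acquire and release touches a shared cacheline, which is why a real version would want a more efficient fast path.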
-- Steve