Message-ID: <20130821043821.6e1224b4@gandalf.local.home>
Date: Wed, 21 Aug 2013 04:38:21 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Kent Overstreet <kmo@...erainc.com>
Cc: LKML <linux-kernel@...r.kernel.org>, linux-bcache@...r.kernel.org,
dm-devel@...hat.com, Christoph Hellwig <hch@...radead.org>,
David Howells <dhowells@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
akpm@...ux-foundation.org
Subject: Re: [PATCH] bcache: Remove use of down/up_read_non_owner()
On Tue, 20 Aug 2013 20:36:58 -0700
Kent Overstreet <kmo@...erainc.com> wrote:
> I think we're mostly quibbling over semantics here, and it's not that
> big a deal - that said, I don't really get your distinction between a
> semaphore and a mutex; I'd distinguish them by saying a semaphore can
> count an arbitrary number of resources (the number it's initialized to),
> and a mutex is always initialized to one - the fact that in the kernel
> mutexes must be released by the same process that took them (while
> important for PI and debugging) is not fundamental.
Understood. As I'm usually focused on PI and debugging, it may be
more fundamental to me than to others.
>
> "rw semaphore" never made any sense to me; they're not counting
> anything, like regular semaphores. They're just sleepable rw locks.
Me neither.
>
> So it _sounds_ like you're saying bcache is using it as a semaphore,
> because it's using it as a lock we don't release in the original
> context? in analogy to linux mutexes and semaphores? Not quite sure what
> you mean.
Actually, I'm thinking you are using it more as a completion than a
semaphore.
> > I have a hack that actually implements a work around for non_owner, but
> > it adds overhead to all locks to do so.
>
> Ok - I'd just be curious why the PI bit can't be factored out into a
> wrapper like how the debug stuff is handled with the _non_owner bits,
> but I can certainly believe that.
The problem is that all sleeping locks go through rt_mutex: each lock
is replaced with a variant that incorporates PI.
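To make the PI point concrete, here's a minimal userspace model (all names invented for illustration, not rt_mutex's actual code): priority inheritance works by walking from a blocked waiter to the lock's recorded owner and boosting that owner. If the lock was taken "non_owner", there is no recorded owner to boost.

```c
/* Illustrative model only: a PI lock must record its owner so that a
 * higher-priority waiter can boost it. With a non-owner acquire, the
 * owner field would be unknown and there would be no task to boost. */
struct task {
	int prio;
};

struct pi_lock {
	struct task *owner;	/* NULL if taken "non_owner" */
};

/* Boost the recorded owner up to the waiter's priority, if higher. */
void pi_boost(struct pi_lock *l, struct task *waiter)
{
	if (l->owner && waiter->prio > l->owner->prio)
		l->owner->prio = waiter->prio;
}
```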
> To simplify the algorithm just a bit (only leaving out some
> optimizations), bcache is using it to protect an rb tree, which contains
> "things that are undergoing background writeback".
I think this is what makes bcache unique in the kernel, and I don't
think there are open-coded locks elsewhere.
>
> For foreground writes, we've got to check if the write overlaps with
> anything in this rb tree. If so, we force _this_ write to writeback;
> background writeback will write stale data to the backing device, but
> that's fine because the data in the cache will still be marked as dirty.
>
> To add stuff to this rb tree - i.e. for background writeback to start
> flushing some dirty data - it's got to make sure it's not going to
> overwrite some in progress foreground writethrough write.
>
> Tracking every outstanding foreground write would be expensive, so
> foreground writes take a read lock on the rb tree (both to check it for
> anything they overlap with, and for their duration), and background
> writeback takes a write lock when it goes to scan for dirty data to add
> to the rb tree.
>
> Even if you complain it's not _just_ protecting the rb tree, we're still
> using it for mutual exclusion, plain and simple, and that's all a lock
> is.
Well, rwlocks and rwsems are not mutually exclusive ;-) The read side
allows multiple accessors. Mutual exclusion would also imply a single
task needing exclusive access to a resource, but as the work is done in
the background, that is not the case. But we are arguing semantics here
and not helping with a solution.
>
>
> > > Also, nack this patch because increasing the number of atomic ops to
> > > shared cachelines in our fast path. If it does end up being open coded,
> > > I'll make a more efficient version.
> >
> > Perhaps I can whip up a "struct rwsem_non_owner" and have a very
> > limited API for that, and then you can use that.
>
> That would be perfect :)
OK, I'll see what I can do.
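For the sake of discussion, one could model the proposed "rwsem_non_owner" in userspace roughly like this (a hypothetical sketch; the names and API are invented here, and this is not what was actually merged). Unlike pthread_rwlock_t, whose behavior is undefined when unlocked by a thread that doesn't hold it, this variant records no ownership, so the read side can be released from a different context than took it:

```c
#include <pthread.h>

struct rwsem_non_owner {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int readers;		/* > 0: readers in; -1: writer in */
};

void rwsem_no_init(struct rwsem_non_owner *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->readers = 0;
}

void down_read_non_owner_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers < 0)		/* wait for the writer to leave */
		pthread_cond_wait(&s->cond, &s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

/* May be called from any thread: no ownership is recorded. */
void up_read_non_owner_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

void down_write_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers != 0)		/* wait for readers and writers */
		pthread_cond_wait(&s->cond, &s->lock);
	s->readers = -1;
	pthread_mutex_unlock(&s->lock);
}

void up_write_model(struct rwsem_non_owner *s)
{
	pthread_mutex_lock(&s->lock);
	s->readers = 0;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}
```

The cost Kent objects to shows up here too: every acquire and release touches a shared cacheline, which is why a real version would want a more efficient fast path.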
-- Steve