linux-kernel - Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

Message-ID: <CANN689F8hskgfJ=n+RxBbDgym4Q1PWdq7MfGHgxTRXtNJjYZFQ@mail.gmail.com>
Date:	Tue, 19 Mar 2013 16:48:30 -0700
From:	Michel Lespinasse <walken@...gle.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Peter Hurley <peter@...leysoftware.com>,
	Alex Shi <alex.shi@...el.com>, Ingo Molnar <mingo@...nel.org>,
	David Howells <dhowells@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>,
	Yuanhan Liu <yuanhan.liu@...ux.intel.com>,
	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 11/12] rwsem: wake all readers when first waiter is a reader

On Mon, Mar 18, 2013 at 6:17 PM, Dave Chinner <david@...morbit.com> wrote:
> On Wed, Mar 13, 2013 at 10:00:51PM -0400, Peter Hurley wrote:
>> On Wed, 2013-03-13 at 14:23 +1100, Dave Chinner wrote:
>> > We don't care about the ordering between multiple concurrent
>> > metadata modifications - what matters is whether the ongoing data IO
>> > around them is ordered correctly.
>>
>> Dave,
>>
>> The point that Michel is making is that there never was any ordering
>> guarantee by rwsem. It's an illusion.
>
> Weasel words.

Whoaaa, calm down.

You initially made one false statement (that the change meant a stream
of readers would starve a writer forever) and one imprecise statement
(that rwsem used to guarantee that readers don't skip ahead of writers
- this may be true in practice for your use case because the latencies
involved are very large compared to scheduling latencies, but that's a
very important qualification that needs to be added here). That
confused me enough that I initially couldn't tell what your actual
concern was, so I pointed out the source of my confusion and asked you
to clarify. It seems unfair to characterize that as "weasel words" -
I'm not trying to be a smartass here, but only to actually understand
your concern.

>> The reason is simple: to even get to the lock the cpu has to be
>> sleep-able. So every submission that you believe is ordered is, by
>> its very nature, __not ordered__, even when used by kernel code.
>>
>> Why? Because any thread on its way to claim the lock (reader or writer)
>> could be pre-empted for some other task, thus delaying the submission of
>> whatever i/o you believed to be ordered.
>
> You think I don't know this?  You're arguing fine grained, low level
> behaviour between tasks is unpredictable. I get that. I understand
> that. But I'm not arguing about fine-grained, low level, microsecond
> semantics of the locking order....
>
> What you (and Michel) appear to be failing to see is what happens
> on a macro level when you have read locks being held for periods
> measured in *seconds* (e.g. direct IO gets queued behind a few
> thousand other IOs in the elevator waiting for a request slot),
> and the subsequent effect of inserting an operation that requires a
> write lock into that IO stream.
>
> IOWs, it simply doesn't matter if there's a micro-level race between
> the write lock and a couple of the readers. That's the level you
> guys are arguing at but it simply does not matter in the cases I'm
> describing. I'm talking about high level serialisation behaviours
> that might take *seconds* to play out and the ordering behaviours
> observed at that scale.
>
> That is, I don't care if a couple of threads out of a few thousand
> race with the write lock over a few tens to hundreds of microseconds,
> but I most definitely care if a few thousand IOs issued seconds
> after the write lock is queued jump over the write lock. That is a
> gross behavioural change at the macro-level.....

Understood. I accepted your concern and made sure my v2 proposal
doesn't do such macro level reordering.
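
To make the macro level concern concrete, here is a toy model of a
FIFO wait list. It is only a sketch for illustration, not the kernel
code and not either version of the patch: the function names, the
"R"/"W" string labels and the list-based queue are all assumptions
made up for this example. It contrasts waking only the readers queued
ahead of the first writer with waking every queued reader whenever the
first waiter is a reader.

```python
def wake_head_readers(waiters):
    """Wake the first waiter. If it is a reader, also wake the readers
    queued directly behind it, stopping at the first writer.
    Returns (woken, still_waiting)."""
    if not waiters:
        return [], []
    if waiters[0].startswith("W"):
        # A writer at the head gets the lock exclusively.
        return [waiters[0]], list(waiters[1:])
    i = 0
    while i < len(waiters) and waiters[i].startswith("R"):
        i += 1
    return list(waiters[:i]), list(waiters[i:])


def wake_all_readers(waiters):
    """If the first waiter is a reader, wake every queued reader,
    including readers that queued up behind a writer.
    Returns (woken, still_waiting)."""
    if not waiters:
        return [], []
    if waiters[0].startswith("W"):
        return [waiters[0]], list(waiters[1:])
    woken = [w for w in waiters if w.startswith("R")]
    still_waiting = [w for w in waiters if not w.startswith("R")]
    return woken, still_waiting


# Hypothetical queue: one reader, then a writer, then two later readers.
queue = ["R1", "W1", "R2", "R3"]
print(wake_head_readers(queue))  # R2 and R3 stay queued behind W1
print(wake_all_readers(queue))   # R2 and R3 are granted ahead of W1
```

In this model the second policy lets R2 and R3 take the lock before
W1 even though they queued after it. If those readers then hold the
lock for seconds, that is the kind of macro level reordering described
above.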

>> So just to reiterate: there is no 'queue' and no 'barrier'. The
>> guarantees that rwsem makes are:
>> 1. Multiple readers can own the lock.
>> 2. Only a single writer can own the lock.
>> 3. Readers will not starve writers.
>
> You've conveniently ignored the fact that the current implementation
> also provides the following guarantee:
>
> 4. new readers will block behind existing writers

In your use case, with large enough queue latencies, yes.

Please don't make it sound like this applies in every use case - it
has never applied for short (<ms) queue latencies, and you might
confuse people by making such unqualified statements.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.