Message-ID: <1091181941.5761.1479400394025.JavaMail.zimbra@efficios.com>
Date: Thu, 17 Nov 2016 16:33:14 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Lai Jiangshan <jiangshanlai@...il.com>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>, dipankar <dipankar@...ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Triplett <josh@...htriplett.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
rostedt <rostedt@...dmis.org>,
David Howells <dhowells@...hat.com>,
Eric Dumazet <edumazet@...gle.com>,
dvhart <dvhart@...ux.intel.com>, fweisbec <fweisbec@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
bobby prani <bobby.prani@...il.com>, ldr709 <ldr709@...il.com>
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite

----- On Nov 17, 2016, at 10:53 AM, Paul E. McKenney paulmck@...ux.vnet.ibm.com wrote:
> On Thu, Nov 17, 2016 at 03:38:08PM +0000, Mathieu Desnoyers wrote:
>> ----- On Nov 17, 2016, at 10:31 AM, Mathieu Desnoyers
>> mathieu.desnoyers@...icios.com wrote:
>>
>> > ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@...il.com wrote:
>> >
>> >> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng <boqun.feng@...il.com> wrote:
>> >>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
>> >>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney
>> >>>> <paulmck@...ux.vnet.ibm.com> wrote:
>> >>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
>> >>>>
>> >>>> >>
>> >>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
>> >>>> >> the reason was that we had two percpu variables to increment. So with
>> >>>> >> only one percpu variable to increment now, could we remove the
>> >>>> >> preempt_{dis,en}able() in srcu_read_lock() and use this_cpu_inc() here?
>> >>>> >
>> >>>> > Quite possibly...
>> >>>> >
>> >>>>
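
For concreteness, here is a minimal sketch of what Boqun's suggestion could
look like. This is only a sketch, not the actual patch: the field names
(srcu_idx, per_cpu_ref, srcu_lock_count) are placeholders. this_cpu_inc()
is safe against both preemption and interrupts, so no preempt_disable() is
needed around the increment itself; whether the idx fetch may be separated
from the increment by preemption is exactly the question discussed below:

        int __srcu_read_lock(struct srcu_struct *sp)
        {
                int idx;

                /* Fetch the current index (may be one flip behind). */
                idx = READ_ONCE(sp->srcu_idx) & 0x1;
                /* Preempt- and irq-safe per-cpu increment. */
                this_cpu_inc(sp->per_cpu_ref->srcu_lock_count[idx]);
                smp_mb(); /* Order the increment before the critical section. */
                return idx;
        }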
>> >>>
>> >>> Hello, Lai ;-)
>> >>>
>> >>>> it will be nicer if it is removed.
>> >>>>
>> >>>> The reason for disabling preemption was also that we have to
>> >>>> disallow any preemption between the fetching of the idx
>> >>>> and the increment, so that we have at most NR_CPUS worth
>> >>>> of readers using the old index that haven't incremented the counters.
>> >>>>
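
For reference, the current arrangement is roughly the following (again a
sketch from memory, lockdep details elided): preemption is disabled by the
caller around both the index fetch and the counter increment, which is the
window Lai describes:

        static inline int srcu_read_lock(struct srcu_struct *sp)
        {
                int retval;

                preempt_disable();
                /*
                 * No migration between fetching the idx and incrementing
                 * the per-cpu counter, so at most NR_CPUS readers can be
                 * holding a just-flipped (old) index without having
                 * incremented their counter yet.
                 */
                retval = __srcu_read_lock(sp);
                preempt_enable();
                return retval;
        }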
>> >>>
>> >>> After reading the comment for a while, I actually got a question, maybe
>> >>> I miss something ;-)
>> >>>
>> >>> Why does "at most NR_CPUS worth of readers using the old index that
>> >>> haven't incremented the counters" save us from overflowing the counter?
>> >>>
>> >>> Please consider the following case in current implementation:
>> >>>
>> >>>
>> >>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
>> >>>
>> >>> one thread A is currently in __srcu_read_lock(), using idx = 1 and
>> >>> about to increment the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
>> >>> have been called and returned with idx = 1. Please note I think this is
>> >>> possible because I assume we may have some code like this:
>> >>>
>> >>> unsigned long i = 0;
>> >>> for (; i < ULONG_MAX; i++)
>> >>>         srcu_read_lock(sp); // returns the same idx, 1
>> >>
>> >> This is the wrong usage of the API.
>> >>
>> >>
>> >> you might rewrite it as:
>> >>
>> >> unsigned long index[2] = { 0, 0 };
>> >>
>> >> while (index[1] < ULONG_MAX)
>> >>         index[srcu_read_lock(sp)]++;
>> >>
>> >>
>> >> I think we should add documentation to disallow this kind of usage:
>> >> a reader should consume at least 4 bytes of memory.
>> >>
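
In other words, a correct user keeps the index returned by srcu_read_lock()
around for the matching srcu_read_unlock(), e.g. (sp, gp, p and
do_something_with() are placeholders here):

        int idx;

        idx = srcu_read_lock(sp);
        p = srcu_dereference(gp, sp);   /* read-side critical section */
        do_something_with(p);
        srcu_read_unlock(sp, idx);

so each on-going reader necessarily occupies at least the storage needed to
hold that index.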
>> >
>> > (the analysis below refers to the rewritten SRCU scheme)
>> >
>> > Let's presume we use the API correctly, as you describe (saving
>> > the returned index of srcu_read_lock() somewhere).
>> >
>> > So for the sake of argument, we can either call srcu_read_lock
>> > in a loop (during which we can be migrated), or call it
>> > concurrently from various threads. The effect in terms of
>> > overflow is pretty much the same.
>> >
>> > What is done here is incrementing per-cpu split-counters. In
>> > the worst-case scenario, let's assume we're incrementing those
>> > counters for a single index (0 or 1).
>> >
>> > If we think about this system-wide, we don't really care about
>> > the overflow of a single CPU counter, because what matters is
>> > the difference between the overall nr_lock - nr_unlock counts
>> > for a given index, once summed up by synchronize_srcu().
>> >
>> > So the only situation that could lead to an overflow that matters
>> > is if synchronize_srcu() sees ULONG_MAX more increments of nr_lock
>> > than the observed number of nr_unlock increments.
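
To make the "summed up by synchronize_srcu()" point concrete, here is a
sketch of the grace-period-side check. The helper name, structure and field
names are placeholders, not the actual patch:

        static bool srcu_readers_drained(struct srcu_struct *sp, int idx)
        {
                unsigned long sum_unlock = 0, sum_lock = 0;
                int cpu;

                for_each_possible_cpu(cpu) {
                        struct srcu_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);

                        sum_unlock += READ_ONCE(cpuc->nr_unlock[idx]);
                        sum_lock += READ_ONCE(cpuc->nr_lock[idx]);
                }
                /*
                 * Each individual per-cpu counter may freely wrap around;
                 * only the difference matters, and unsigned subtraction
                 * gives the correct difference as long as fewer than
                 * ULONG_MAX lock increments are unaccounted for here.
                 */
                return sum_lock - sum_unlock == 0;
        }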
>> >
>> > So the bound is not only about the number of concurrent SRCU
>> > readers, but also about the number of SRCU readers that may
>> > appear between the moment synchronize_srcu() reads the nr_unlock
>> > per-cpu counters and the moment it reads the nr_lock counters.
>> >
>> > This maximum bound of ULONG_MAX - 1 therefore applies to the
>> > sum of:
>> > - number of concurrent SRCU read-side critical sections active
>> > at the same time,
>> > - number of SRCU read-side critical sections beginning after
>> > synchronize_srcu() has read the nr_unlock counters, before
>> > it reads the nr_lock counters.
>>
>> Now that I think of it, since we flip the period before summing
>> the nr_unlock counter, we cannot have any newly arriving readers appearing
>> within the target period while we execute synchronize_srcu().
>> So it ends up being a limit on the number of concurrent SRCU
>> read-side c.s. active at the same time. (You can scratch the
>> second bullet above.)
>
> We can have NR_CPUS worth of them -- those that have fetched the
> index, but not yet incremented their counter.

Ah, yes, due to preemption being disabled across those operations.
More precisely, it would be NR_CPUS - 1, because synchronize_srcu()
needs to run somewhere ;-)
>
> But if the updater fails to see their counter increment, then
> their next srcu_read_lock() is guaranteed to see the new index.

Got it, thanks!

Mathieu
>
> Thanx, Paul
>
>> Thanks,
>>
>> Mathieu
>>
>>
>>
>> > You guys seem to see cases that would require a smaller bound on the
>> > maximum number of readers, but I'm afraid I don't quite understand them.
>> >
>> > Thanks,
>> >
>> > Mathieu
>> >
>> >
>> >>>
>> >>> And none of the corresponding srcu_read_unlock()s has been called.
>> >>>
>> >>> In this case, at the time thread A increments the percpu c[idx], that
>> >>> will result in an overflow, right? So even one reader using the old idx
>> >>> will result in overflow.
>> >>>
>> >>>
>> >>> I think the reason we won't be hit by overflow is not that we have few
>> >>> readers using the old idx; it's that ULONG_MAX + 1 __srcu_read_lock()
>> >>> calls for the same idx are unlikely, right? And the reason for this is
>> >>> more complex: because we won't have an excessive number of threads in
>> >>> the system, because no thread will nest srcu many levels deep, and
>> >>> because there won't be a lot of readers using the old idx.
>> >>>
>> >>> And this will still be true if we use the new mechanism and shrink the
>> >>> preemption-disabled section, right?
>> >>>
>> >>> Regards,
>> >>> Boqun
>> >>>
>> >>>> If we remove the preempt_{dis,en}able(), we must change the
>> >>>> "NR_CPUS" in the comment into ULONG_MAX/4 (I assume one on-going
>> >>>> reader needs at least 4 bytes on the stack, to hold the returned
>> >>>> index), and it is still safe.
>> >>>>
>> >>>> But we still need to think more if we want to remove the
>> >>>> preempt_{dis,en}able().
>> >>>>
>> >>>> Thanks
>> >>>> Lai
>> >
>> > --
>> > Mathieu Desnoyers
>> > EfficiOS Inc.
>> > http://www.efficios.com
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com