linux-kernel - Re: [PATCH RFC tip/core/rcu] SRCU rewrite

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1189271890.5446.1479396692585.JavaMail.zimbra@efficios.com>
Date:   Thu, 17 Nov 2016 15:31:32 +0000 (UTC)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>, dipankar <dipankar@...ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Josh Triplett <josh@...htriplett.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        rostedt <rostedt@...dmis.org>,
        David Howells <dhowells@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        dvhart <dvhart@...ux.intel.com>, fweisbec <fweisbec@...il.com>,
        Oleg Nesterov <oleg@...hat.com>,
        bobby prani <bobby.prani@...il.com>, ldr709 <ldr709@...il.com>
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite

----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@...il.com wrote:

> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng <boqun.feng@...il.com> wrote:
>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney
>>> <paulmck@...ux.vnet.ibm.com> wrote:
>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
>>>
>>> >>
>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
>>> >> the reason was because we have two percpu variables to increase. So with
>>> >> only one percpu right, could we remove the preempt_{dis,en}able() in
>>> >> srcu_read_lock() and use this_cpu_inc() here?
>>> >
>>> > Quite possibly...
>>> >
>>>
>>
>> Hello, Lai ;-)
>>
>>> it will be nicer if it is removed.
>>>
>>> The reason for the preemption-disabled was also because we
>>> have to disallow any preemption between the fetching of the idx
>>> and the increasement. so that we have at most NR_CPUS worth
>>> of readers using the old index that haven't incremented the counters.
>>>
>>
>> After reading the comment for a while, I actually got a question, maybe
>> I miss something ;-)
>>
>> Why "at most NR_CPUS worth of readers using the old index haven't
>> incremented the counters" could save us from overflow the counter?
>>
>> Please consider the following case in current implementation:
>>
>>
>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
>>
>> one thread A is currently in __srcu_read_lock() and using idx = 1 and
>> about to increase the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
>> have been called and returned with idx = 1, please note I think this is
>> possible because I assume we may have some code like this:
>>
>>         unsigned long i = 0;
>>         for (; i < ULONG_MAX; i++)
>>                 srcu_read_lock(); // return the same idx 1;
> 
> this is the wrong usage of the api.
> 
> 
> you might rewrite it as:
> 
>        unsigned long index[2] = {0, 0};
>        unsigned long i = 0;
>        for (;  index[1] < ULONG_MAX; i++)
>                 index[srcu_read_lock()]++;
> 
> 
> I think we should add document to disallow this kind of usage.
> a reader should eat 4bytes on the memory at least.
> 

(the analysis below refers to the rewritten SRCU scheme)

Let's presume we use the API correctly, as you describe (saving
the returned index of srcu_read_lock() somewhere).

So for the sake of argument, we can either call srcu_read_lock
in a loop (during which we can be migrated), or call it
concurrently from various threads. The effect in terms of
overflow is pretty much the same.

What is done here is incrementing per-cpu split-counters. In
the worse-case scenario, let's assume we're incrementing those
counters for a single index (0 or 1).

If we think about this system-wide, we don't really care about
the overflow of a single CPU counter, because what matters is
the difference between the overall nr_lock - nr_unlock counts
for a given index, once summed up by synchronize_srcu().

So the only situation that could lead to an overflow that matters
is if synchronize_srcu() see ULONG_MAX more increments of nr_lock
than the observed number of nr_unlock increments.

So the bound is not only about the number of concurrent SRCU
readers, but also about the number of SRCU readers that may
appear between the moment synchronize_srcu() reads the nr_unlock
per-cpu counters and the moment it reads the nr_lock counters.

This maximum bound of ULONG_MAX - 1 therefore applies to the
sum of:
- numner of concurrent SRCU read-side critical sections active
  at the same time,
- number of SRCU read-side critical sections beginning after
  synchronize_srcu() has read the nr_unlock counters, before
  it reads the nr_lock counters.

You guys seem to see cases that would require a lower max nr
reader bound, but I'm afraid I don't quite understand them.

Thanks,

Mathieu


>>
>> And none of the corresponding srcu_read_unlock() has been called;
>>
>> In this case, at the time thread A increases the percpu c[idx], that
>> will result in an overflow, right? So even one reader using old idx will
>> result in overflow.
>>
>>
>> I think we won't be hit by overflow is not because we have few readers
>> using old idx, it's because there are unlikely ULONG_MAX + 1
>> __srcu_read_lock() called for the same idx, right? And the reason of
>> this is much complex: because we won't have a fair mount of threads in
>> the system, because no thread will nest srcu many levels, because there
>> won't be a lot readers using old idx.
>>
>> And this will still be true if we use new mechanism and shrink the
>> preemption disabled section, right?
>>
>> Regards,
>> Boqun
>>
>>> if we remove the preempt_{dis,en}able(). we must change the
>>> "NR_CPUS" in the comment into ULONG_MAX/4. (I assume
>>> one on-going reader needs at least need 4bytes at the stack). it is still safe.
>>>
>>> but we still need to think more if we want to remove the preempt_{dis,en}able().
>>>
>>> Thanks
> >> Lai

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com