linux-kernel - Re: [PATCH 2/2 RFC] srcu: implement Peter's checking algorithm

Message-ID: <CACvQF53uoJ4r=hbgv3wFQWB-9=btDiOJZV3YyNNHHSPjpeGp0A@mail.gmail.com>
Date:	Sat, 10 Mar 2012 11:41:00 +0800
From:	Lai Jiangshan <eag0628@...il.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Lai Jiangshan <laijs@...fujitsu.com>, linux-kernel@...r.kernel.org,
	mingo@...e.hu, dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...ymtl.ca, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	rostedt@...dmis.org, Valdis.Kletnieks@...edu, dhowells@...hat.com,
	eric.dumazet@...il.com, darren@...art.com, fweisbec@...il.com,
	patches@...aro.org
Subject: Re: [PATCH 2/2 RFC] srcu: implement Peter's checking algorithm

On Thu, Mar 1, 2012 at 9:20 PM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Thu, Mar 01, 2012 at 10:31:22AM +0800, Lai Jiangshan wrote:
>> On 02/29/2012 09:55 PM, Paul E. McKenney wrote:
>> > On Wed, Feb 29, 2012 at 06:07:32PM +0800, Lai Jiangshan wrote:
>> >> On 02/28/2012 09:47 PM, Paul E. McKenney wrote:
>> >>> On Tue, Feb 28, 2012 at 09:51:22AM +0800, Lai Jiangshan wrote:
>> >>>> On 02/28/2012 02:30 AM, Paul E. McKenney wrote:
>> >>>>> On Mon, Feb 27, 2012 at 04:01:04PM +0800, Lai Jiangshan wrote:
>> >>>>>> From 40724998e2d121c2b5a5bd75114625cfd9d4f9a9 Mon Sep 17 00:00:00 2001
>> >>>>>> From: Lai Jiangshan <laijs@...fujitsu.com>
>> >>>>>> Date: Mon, 27 Feb 2012 14:22:47 +0800
>> >>>>>> Subject: [PATCH 2/2] srcu: implement Peter's checking algorithm
>> >>>>>>
>> >>>>>> This patch implements Peter's algorithm:
>> >>>>>> https://lkml.org/lkml/2012/2/1/119
>> >>>>>>
>> >>>>>> o      Make the checking lock-free so that checks can run in parallel.
>> >>>>>>        Parallel checking rarely makes sense, but we need it 1) when
>> >>>>>>        the original checking task is preempted for a long time, 2) for
>> >>>>>>        synchronize_srcu_expedited(), and 3) to avoid a lock (see next).
>> >>>>>>
>> >>>>>> o      Since it is lock-free, we save a mutex in the state machine for
>> >>>>>>        call_srcu().
>> >>>>>>
>> >>>>>> o      Remove the SRCU_REF_MASK and remove the coupling with the flipping
>> >>>>>>        (so we can later remove the preempt_disable() and use
>> >>>>>>         __this_cpu_inc() instead).
>> >>>>>>
>> >>>>>> o      Remove one smp_mb(), simplify the comments, and make the smp_mb()
>> >>>>>>        pairs more intuitive.
>> >>>>>
>> >>>>> Hello, Lai,
>> >>>>>
>> >>>>> Interesting approach!
>> >>>>>
>> >>>>> What happens given the following sequence of events?
>> >>>>>
>> >>>>> o       CPU 0 in srcu_readers_active_idx_check() invokes
>> >>>>>         srcu_readers_seq_idx(), getting some number back.
>> >>>>>
>> >>>>> o       CPU 0 invokes srcu_readers_active_idx(), summing the
>> >>>>>         ->c[] array up through CPU 3.
>> >>>>>
>> >>>>> o       CPU 1 invokes __srcu_read_lock(), and increments its counter
>> >>>>>         but not yet its ->seq[] element.
>> >>>>
>> >>>>
>> >>>> Any __srcu_read_lock() whose increment of the active counter is not seen
>> >>>> by srcu_readers_active_idx() is considered a
>> >>>> "reader-started-after-this-srcu_readers_active_idx_check()",
>> >>>> so we don't need to wait for it.
>> >>>>
>> >>>> As you said, this SRCU critical section's increment of seq is not seen
>> >>>> by the above srcu_readers_seq_idx().
>> >>>>
>> >>>>>
>> >>>>> o       CPU 0 completes its summing of the ->c[] array, incorrectly
>> >>>>>         obtaining zero.
>> >>>>>
>> >>>>> o       CPU 0 invokes srcu_readers_seq_idx(), getting the same
>> >>>>>         number back that it got last time.
>> >>>>
>> >>>> If it incorrectly gets zero, it means a __srcu_read_unlock() was seen
>> >>>> in srcu_readers_active_idx(), which means the increment of
>> >>>> seq is seen by this srcu_readers_seq_idx(), so the result differs
>> >>>> from the seq that it got last time.
>> >>>>
>> >>>> The increment of seq is not seen by the first srcu_readers_seq_idx()
>> >>>> but is seen by the later one, so the two returned seq values differ.
>> >>>> This is the core of Peter's algorithm, and it was written
>> >>>> in the comments (sorry for my bad English). Or maybe I missed
>> >>>> your meaning in this mail.
>> >>>
>> >>> OK, good, this analysis agrees with what I was thinking.
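The race discussed above can be replayed deterministically in user space. What follows is only a toy sketch, not the patch's code: the names cpu_state, cpus, read_lock(), read_unlock(), sum_seq(), check_zero() and racing_reader() are hypothetical stand-ins, memory barriers are omitted because everything runs in one thread, and the "concurrent" reader is injected through a callback halfway through the ->c[] summation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU SRCU state of one index. */
struct cpu_state {
    unsigned long c;    /* active count: ++ on lock, -- on unlock */
    unsigned long seq;  /* lock sequence: ++ on lock only */
};

static struct cpu_state cpus[NR_CPUS];

/* Reader entry: bump the active count first, then the sequence. */
static void read_lock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].c++;
    cpus[cpu].seq++;
}

/* Reader exit: may run on a different CPU than the matching lock. */
static void read_unlock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].c--;      /* unsigned wraparound is intended */
}

static unsigned long sum_seq(void)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += cpus[cpu].seq;
    return sum;
}

/*
 * Succeed only if the active sum is zero AND the seq sum did not
 * change across the summation.  `race`, if non-NULL, simulates a
 * reader running halfway through the scan.
 */
static bool check_zero(void (*race)(void))
{
    unsigned long seq = sum_seq();
    unsigned long active = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == NR_CPUS / 2 && race != NULL)
            race();
        active += cpus[cpu].c;
    }
    return active == 0 && sum_seq() == seq;
}

/* Locks on an already-summed CPU, unlocks on a not-yet-summed one. */
static void racing_reader(void)
{
    read_lock(0);
    read_unlock(2);
}
```

With one long-running reader registered via read_lock(3), calling check_zero(racing_reader) sums ->c[] to zero even though that reader is still active, yet the check fails because the second seq sum differs from the first. Once read_unlock(3) has run, check_zero(NULL) succeeds.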
>> >>>
>> >>> So my next question is about the lock freedom.  This lock freedom has to
>> >>> be limited in nature and carefully implemented.  The reasons for this are:
>> >>>
>> >>> 1.        Readers can block in any case, which can of course block both
>> >>>   synchronize_srcu_expedited() and synchronize_srcu().
>> >>>
>> >>> 2.        Because only one CPU at a time can be incrementing ->completed,
>> >>>   some sort of lock with preemption disabling will of course be
>> >>>   needed.  Alternatively, an rt_mutex could be used for its
>> >>>   priority-inheritance properties.
>> >>>
>> >>> 3.        Once some CPU has incremented ->completed, all CPUs that might
>> >>>   still be summing up the old indexes must stop.  If they don't,
>> >>>   they might incorrectly call a too-short grace period in case of
>> >>>   ->seq[]-sum overflow on 32-bit systems.
>> >>>
>> >>> Or did you have something else in mind?
>> >>
>> >> If a flip happens during a check_zero, that check_zero will not be
>> >> committed even if it succeeds.
>> >
>> > But if the CPU in check_zero isn't blocking the grace period, then
>> > ->completed could overflow while that CPU was preempted.  Then how
>> > would this CPU know that the flip had happened?
>>
>> As you said, check ->completed,
>> but prevent ->completed from overflowing.
>>
>> There is a spinlock for the srcu_struct (it also covers flipping).
>>
>> 1) assume we need to wait on widx
>> 2) use srcu_read_lock() to hold a reference on the 1-widx active counter
>> 3) release the spinlock
>> 4) do_check_zero
>> 5) reacquire the spinlock
>> 6) srcu_read_unlock()
>> 7) if ->completed has not changed, and no later check_zero was committed
>>    before ours, we commit our check_zero if it succeeded.
>>
>> Too complicated.
>
> Plus I don't see how it disables overflow for ->completed.

srcu_read_lock() takes a reader reference, so the other state machine
can flip at most once while that reference is held, and therefore
->completed can't wrap.
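To make the argument concrete, here is a minimal user-space sketch. It is a toy model, not kernel code: struct toy_srcu and the toy_*() and flips_while_pinned() helpers are hypothetical names, and it assumes that a flip may only move new readers onto an index whose readers have all drained (which is what waiting out the previous grace period guarantees).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of an srcu_struct, single-threaded. */
struct toy_srcu {
    unsigned long completed;  /* low bit selects the index for new readers */
    unsigned long active[2];  /* readers currently holding each index */
};

/* Take a reference on the index new readers currently use. */
static int toy_read_lock(struct toy_srcu *sp)
{
    int idx = sp->completed & 1;

    sp->active[idx]++;
    return idx;
}

static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
    assert(sp->active[idx] > 0);
    sp->active[idx]--;
}

/*
 * Assumption of this model: a flip is allowed only when the index
 * that new readers would switch to has no readers left on it.
 */
static bool toy_try_flip(struct toy_srcu *sp)
{
    int next = (sp->completed + 1) & 1;

    if (sp->active[next] != 0)
        return false;
    sp->completed++;
    return true;
}

/* How far does ->completed advance while we hold a reference? */
static unsigned long flips_while_pinned(struct toy_srcu *sp, int attempts)
{
    unsigned long snap = sp->completed;
    int idx = toy_read_lock(sp);
    unsigned long delta;

    for (int i = 0; i < attempts; i++)
        toy_try_flip(sp);
    delta = sp->completed - snap;
    toy_read_unlock(sp, idx);
    return delta;
}
```

In this model, no matter how many flips are attempted while the reference is held, ->completed moves by at most one, so comparing it against the snapshot reliably detects a flip.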

Thanks,
Lai

>
> As you said earlier, abandoning the goal of lock freedom sounds like the
> best approach.  Then you can indeed just hold the srcu_struct's mutex
> across the whole thing.
>
>                                                        Thanx, Paul
>
>> Thanks,
>> Lai
>>
>> >
>> >> I played too much with lock-freedom for call_srcu() and the code became
>> >> complicated, so I am giving up on lock-freedom for call_srcu(); the main
>> >> aim of call_srcu() is simple.
>> >
>> > Makes sense to me!
>> >
>> >> (But I still like Peter's approach; it has other benefits
>> >> besides lock-free checking. If you don't like it, I will send
>> >> another patch to fix srcu_readers_active().)
>> >
>> > Try them both and check their performance &c.  If within epsilon of
>> > each other, pick whichever one you prefer.
>> >
>> >                                                     Thanx, Paul
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/