linux-kernel - Re: [GIT PULL rcu/next] rcu commits for 2.6.40

Message-ID: <4DCB157F.20202@kernel.org>
Date:	Wed, 11 May 2011 16:02:23 -0700
From:	Yinghai Lu <yinghai@...nel.org>
To:	paulmck@...ux.vnet.ibm.com
CC:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Len Brown <lenb@...nel.org>
Subject: Re: [GIT PULL rcu/next] rcu commits for 2.6.40

On 05/11/2011 02:30 PM, Yinghai Lu wrote:
> On 05/11/2011 01:59 PM, Yinghai Lu wrote:
>> On 05/11/2011 01:18 PM, Paul E. McKenney wrote:
>>> On Wed, May 11, 2011 at 09:56:35AM -0700, Yinghai Lu wrote:
>>>> On Tue, May 10, 2011 at 9:54 PM, Paul E. McKenney
>>>> <paulmck@...ux.vnet.ibm.com> wrote:
>>>>> On Tue, May 10, 2011 at 01:52:52PM -0700, Yinghai Lu wrote:
>>>>>> On 05/10/2011 12:32 PM, Paul E. McKenney wrote:
>>>>>>> On Tue, May 10, 2011 at 11:04:57AM -0700, Yinghai Lu wrote:
>>>>>>>> On 05/10/2011 01:56 AM, Paul E. McKenney wrote:
>>>>>>>>> On Mon, May 09, 2011 at 02:09:21PM -0700, Yinghai Lu wrote:
>>>>>>>>>> On Mon, May 9, 2011 at 12:36 AM, Ingo Molnar <mingo@...e.hu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> * Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello, Ingo,
>>>>>>>>>>>>
>>>>>>>>>>>> This pull request covers RCU changes for 2.6.40.  The major new features
>>>>>>>>>>>> are RCU priority boosting and the addition of kfree_rcu(), the latter
>>>>>>>>>>>> courtesy of Lai Jiangshan.  These two features cover well over half
>>>>>>>>>>>> of the commits.  There are a number of smaller features and bug fixes.
>>>>>>>>>>>> All have been sent to LKML in the following batches:
>>>>>>>>>>>>
>>>>>>>>>>>> 0.    https://lkml.org/lkml/2011/2/22/660: RCU priority boosting preview
>>>>>>>>>>>> 1.    https://lkml.org/lkml/2011/5/1/19: RCU priority boosting, kfree_rcu()
>>>>>>>>>>>> 2.    https://lkml.org/lkml/2011/5/2/40: More uses of kfree_rcu()
>>>>>>>>>>>> 3.    https://lkml.org/lkml/2011/5/8/60: miscellaneous
>>>>>>>>>>>>
>>>>>>>>>>>> The kfree_rcu() uses in the pull request have Acked-by:s from the
>>>>>>>>>>>> maintainers.  I have some additional kfree_rcu() requests that lack
>>>>>>>>>>>> Acked-by:s, and I will deal with these later.
>>>>>>>>>>>>
>>>>>>>>>>>> These changes are available in the -rcu git repository at:
>>>>>>>>>>>>
>>>>>>>>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/next
>>>>>>>>>>>
>>>>>>>>>>> Pulled, thanks a lot Paul!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems that with this one in tip, my 8-socket test setup reports a CPU stall.
>>>>>>>>>>
>>>>>>>>>> After hard-coding rcu_cpu_stall_suppress to enabled with this change:
>>>>>>>>>>
>>>>>>>>>> Index: linux-2.6/kernel/rcutree.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- linux-2.6.orig/kernel/rcutree.c
>>>>>>>>>> +++ linux-2.6/kernel/rcutree.c
>>>>>>>>>> @@ -174,7 +174,7 @@ module_param(blimit, int, 0);
>>>>>>>>>>  module_param(qhimark, int, 0);
>>>>>>>>>>  module_param(qlowmark, int, 0);
>>>>>>>>>>
>>>>>>>>>> -int rcu_cpu_stall_suppress __read_mostly;
>>>>>>>>>> +int rcu_cpu_stall_suppress __read_mostly = 1;
>>>>>>>>>>  module_param(rcu_cpu_stall_suppress, int, 0644);
>>>>>>>>>>
>>>>>>>>>>  static void force_quiescent_state(struct rcu_state *rsp, int relaxed);
>>>>>>>>>>
>>>>>>>>>> the system hangs after PnP ACPI init.
>>>>>>>>>
>>>>>>>>> Could you please send the stack traces from the RCU CPU stall?  Also,
>>>>>>>>> you do have ce31332d3c77532d6ea97ddcb475a2b02dd358b4 applied, correct?
>>>>>>>>>
>>>>>>>>>                                                   Thanx, Paul
>>>>>>>>
>>>>>>>> Do not have time to bisect it at this point.
>>>>>>>
>>>>>>> Could you please send the stack traces from the RCU CPU stall?
>>>>>
>>>>> Thank you!  OK, so CPU 0 has not been responding, despite resched IPIs.
>>>>> Everyone is idle, except for CPU 124, which detected the stall, and
>>>>> possibly CPU 0, which has csum_partial_copy_generic() on the stack, though
>>>>> that looks like a backtrace error to me.  The fact that it hangs if you
>>>>> disable RCU CPU stall detection leads me to believe that something real
>>>>> is being detected.
>>>>
>>>> The problem is that I can no longer disable RCU CPU stall detection.
>>>
>>> There is a rcu_cpu_stall_suppress module parameter, and you should be
>>> able to pass in rcu_cpu_stall_suppress=1 as a boot parameter.  However,
>>> I did produce a patch that reverts the change, please see below.
>>> I would be surprised if this did anything different than your change
>>> that initializes rcu_cpu_stall_suppress to 1.  If this patch somehow
>>> does make a difference, please let me know.
>>
>> the same.
>>
>> It looks like another commit in the pull causes the delay in the ACPI-related stuff.
>>
>> [   22.255535] calling  pnpacpi_init+0x0/0x8c @ 1
>> [   22.260001] pnp: PnP ACPI init
>> [   22.263125] ACPI: bus type pnp registered
>> [  603.121084] pnp 00:00: [bus 00-7f]
>> ...
>> [  603.130564] pnp 00:0c: Plug and Play ACPI device, IDs PNP0103 (active)
>> [ 1202.948788] pnp 00:0d: [bus 80-ff]
>> [ 1202.952187] pnp 00:0d: [io  0x0000 window]
>> [ 1202.956316] pnp 00:0d: [io  0xc000-0xffff window]
>> [ 1202.961056] pnp 00:0d: [mem 0x00000000 window]
>> [ 1202.965533] pnp 00:0d: [mem 0xe0000000-0xfbffffff window]
>> [ 1202.970965] pnp 00:0d: [mem 0x00000000 window]
>> [ 1202.975934] pnp 00:0d: Plug and Play ACPI device, IDs PNP0a08 PNP0a03 (active)
>> [ 1202.983823] system 00:0e: Plug and Play ACPI device, IDs PNP0c01 (active)
>> [ 1202.991093] pnp: PnP ACPI: found 15 devices
>> [ 1202.995310] ACPI: ACPI bus type pnp unregistered
>> [ 1202.999963] initcall pnpacpi_init+0x0/0x8c returned 0 after 1153066591 usecs
>> ...
>> [ 1206.094838] calling  shpcd_init+0x0/0xe8 @ 1
>> [ 1206.095194] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
>> [ 1206.095199] initcall shpcd_init+0x0/0xe8 returned 0 after 347 usecs
>> [ 1206.095202] calling  acpiphp_init+0x0/0x5f @ 1
>> [ 1206.095206] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>> [ 1302.573367] initcall acpiphp_init+0x0/0x5f returned -19 after 94216997 usecs
>> [ 1302.580449] calling  ibm_acpiphp_init+0x0/0x175 @ 1
>> [ 1398.573429] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
>> [ 1398.579896] initcall ibm_acpiphp_init+0x0/0x175 returned -19 after 93744718 usecs
>> ...
>> [ 1398.739321] calling  acpi_pci_slot_init+0x0/0x20 @ 1
>> [ 1494.572598] initcall acpi_pci_slot_init+0x0/0x20 returned 0 after 93582352 usecs
>> [ 1494.580019] calling  acpi_processor_init+0x0/0xcd @ 1
>> [ 1494.585119] ACPI: acpi_idle registered with cpuidle
>> [ 1494.622886] initcall acpi_processor_init+0x0/0xcd returned 0 after 36896 usecs
>> [ 1494.630135] calling  acpi_container_init+0x0/0x4a @ 1
>> [ 1734.573253] initcall acpi_container_init+0x0/0x4a returned 0 after 234314615 usecs
>> [ 1734.580854] calling  acpi_thermal_init+0x0/0x42 @ 1
>> [ 1734.586032] initcall acpi_thermal_init+0x0/0x42 returned 0 after 275 usecs
>> [ 1734.592926] calling  acpi_memory_device_init+0x0/0x87 @ 1
>> [ 1974.573125] initcall acpi_memory_device_init+0x0/0x87 returned 0 after 234350484 usecs
>> [ 1974.581069] calling  acpi_battery_init+0x0/0x16 @ 1
>> [ 1974.586009] initcall acpi_battery_init+0x0/0x16 returned 0 after 34 usecs
>> [ 1974.586112] calling  1_acpi_battery_init_async+0x0/0x20 @ 5
>> [ 1974.586636] initcall 1_acpi_battery_init_async+0x0/0x20 returned 0 after 505 usecs
>> [ 1974.605962] calling  acpi_power_meter_init+0x0/0x32 @ 1
>> [ 1974.611505] initcall acpi_power_meter_init+0x0/0x32 returned 0 after 286 usecs
>> [ 1974.618748] calling  acpi_hed_init+0x0/0x26 @ 1
>> [ 1974.623565] initcall acpi_hed_init+0x0/0x26 returned 0 after 255 usecs
>> [ 1974.630115] calling  acpi_pad_init+0x0/0x26 @ 1
>> [ 1974.634932] initcall acpi_pad_init+0x0/0x26 returned 0 after 254 usecs
>> [ 1974.641475] calling  erst_init+0x0/0x2bb @ 1
>> [ 1974.645956] ERST: Can not request iomem region <0x              3f-0x              3f> for ERST.
>> [ 2070.572123] initcall erst_init+0x0/0x2bb returned -5 after 93678128 usecs
>> ...
>> [ 2070.792625] calling  serial8250_init+0x0/0x18e @ 1
>> [ 2070.797433] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
>> [ 2070.824642] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>> [ 2534.572086] initcall serial8250_init+0x0/0x18e returned 0 after 452905158 usecs
>> [ 2534.579451] calling  serial8250_pnp_init+0x0/0x12 @ 1
>> [ 2662.572104] initcall serial8250_pnp_init+0x0/0x12 returned 0 after 124987929 usecs
>> [ 2662.579749] calling  serial8250_pci_init+0x0/0x1b @ 1
>> [ 2662.585088] initcall serial8250_pci_init+0x0/0x1b returned 0 after 261 usecs
>>
> 
> without this pull:
> 
> [   18.733956] calling  pnpacpi_init+0x0/0x8c @ 1
> [   18.738400] pnp: PnP ACPI init
> [   18.741484] ACPI: bus type pnp registered
> [   18.798240] pnp 00:00: [bus 00-7f]
> [   18.801638] pnp 00:00: [io  0x0cf8-0x0cff]
> ...
> [   19.302232] pnp 00:0c: Plug and Play ACPI device, IDs PNP0103 (active)
> [   19.381017] pnp 00:0d: [bus 80-ff]
> [   19.384410] pnp 00:0d: [io  0x0000 window]
> [   19.388502] pnp 00:0d: [io  0xc000-0xffff window]
> [   19.393205] pnp 00:0d: [mem 0x00000000 window]
> [   19.397639] pnp 00:0d: [mem 0xe0000000-0xfbffffff window]
> [   19.403031] pnp 00:0d: [mem 0x00000000 window]
> [   19.407874] pnp 00:0d: Plug and Play ACPI device, IDs PNP0a08 PNP0a03 (active)
> [   19.415591] system 00:0e: Plug and Play ACPI device, IDs PNP0c01 (active)
> [   19.422747] pnp: PnP ACPI: found 15 devices
> [   19.426924] ACPI: ACPI bus type pnp unregistered
> [   19.431538] initcall pnpacpi_init+0x0/0x8c returned 0 after 678214 usecs
> 
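
To compare the two boots without reading every timestamp, the "initcall ... returned ... after N usecs" lines can be pulled out with a small script. The following is only a rough sketch: the names slow_initcalls, INITCALL_RE and SAMPLE are made up for illustration, the one-second threshold is an arbitrary assumption, and SAMPLE holds a few lines copied from the logs above (quote markers included, since the pattern searches within each line).

```python
import re

# Matches lines such as:
# [ 1202.999963] initcall pnpacpi_init+0x0/0x8c returned 0 after 1153066591 usecs
INITCALL_RE = re.compile(
    r"initcall\s+(?P<name>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+"
    r"returned\s+(?P<ret>-?\d+)\s+after\s+(?P<usecs>\d+)\s+usecs"
)


def slow_initcalls(log_text, threshold_secs=1.0):
    """Return (name, seconds, return_code) tuples for initcalls at or above
    the threshold, slowest first. The threshold is an illustrative default."""
    slow = []
    for line in log_text.splitlines():
        match = INITCALL_RE.search(line)
        if match is None:
            continue  # not an initcall completion line
        secs = int(match.group("usecs")) / 1_000_000
        if secs >= threshold_secs:
            slow.append((match.group("name"), secs, int(match.group("ret"))))
    return sorted(slow, key=lambda item: item[1], reverse=True)


# A few lines copied from the two boot logs quoted in this message.
SAMPLE = """\
>> [ 1202.999963] initcall pnpacpi_init+0x0/0x8c returned 0 after 1153066591 usecs
>> [ 1206.095199] initcall shpcd_init+0x0/0xe8 returned 0 after 347 usecs
>> [ 1302.573367] initcall acpiphp_init+0x0/0x5f returned -19 after 94216997 usecs
> [   19.431538] initcall pnpacpi_init+0x0/0x8c returned 0 after 678214 usecs
"""

if __name__ == "__main__":
    for name, secs, ret in slow_initcalls(SAMPLE):
        print(f"{name:20s} {secs:10.1f} s  (returned {ret})")
```

Run against the full log with the pull applied, this should list pnpacpi_init, serial8250_init, acpi_container_init and acpi_memory_device_init near the top, while the log without the pull has pnpacpi_init finishing in under a second.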


e59fb3120becfb36b22ddb8bd27d065d3cdca499 is the first bad commit
commit e59fb3120becfb36b22ddb8bd27d065d3cdca499
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Tue Sep 7 10:38:22 2010 -0700

    rcu: Decrease memory-barrier usage based on semi-formal proof
    
    Commit d09b62d fixed grace-period synchronization, but left some smp_mb()
    invocations in rcu_process_callbacks() that are no longer needed, but
    sheer paranoia prevented them from being removed.  This commit removes
    them and provides a proof of correctness in their absence.  It also adds
    a memory barrier to rcu_report_qs_rsp() immediately before the update to
    rsp->completed in order to handle the theoretical possibility that the
    compiler or CPU might move massive quantities of code into a lock-based
    critical section.  This also proves that the sheer paranoia was not
    entirely unjustified, at least from a theoretical point of view.
    
    In addition, the old dyntick-idle synchronization depended on the fact
    that grace periods were many milliseconds in duration, so that it could
    be assumed that no dyntick-idle CPU could reorder a memory reference
    across an entire grace period.  Unfortunately for this design, the
    addition of expedited grace periods breaks this assumption, which has
    the unfortunate side-effect of requiring atomic operations in the
    functions that track dyntick-idle state for RCU.  (There is some hope
    that the algorithms used in user-level RCU might be applied here, but
    some work is required to handle the NMIs that user-space applications
    can happily ignore.  For the short term, better safe than sorry.)
    
    This proof assumes that neither compiler nor CPU will allow a lock
    acquisition and release to be reordered, as doing so can result in
    deadlock.  The proof is as follows:
    
    1.	A given CPU declares a quiescent state under the protection of
    	its leaf rcu_node's lock.
    
    2.	If there is more than one level of rcu_node hierarchy, the
    	last CPU to declare a quiescent state will also acquire the
    	->lock of the next rcu_node up in the hierarchy,  but only
    	after releasing the lower level's lock.  The acquisition of this
    	lock clearly cannot occur prior to the acquisition of the leaf
    	node's lock.
    
    3.	Step 2 repeats until we reach the root rcu_node structure.
    	Please note again that only one lock is held at a time through
    	this process.  The acquisition of the root rcu_node's ->lock
    	must occur after the release of that of the leaf rcu_node.
    
    4.	At this point, we set the ->completed field in the rcu_state
    	structure in rcu_report_qs_rsp().  However, if the rcu_node
    	hierarchy contains only one rcu_node, then in theory the code
    	preceding the quiescent state could leak into the critical
    	section.  We therefore precede the update of ->completed with a
    	memory barrier.  All CPUs will therefore agree that any updates
    	preceding any report of a quiescent state will have happened
    	before the update of ->completed.
    
    5.	Regardless of whether a new grace period is needed, rcu_start_gp()
    	will propagate the new value of ->completed to all of the leaf
    	rcu_node structures, under the protection of each rcu_node's ->lock.
    	If a new grace period is needed immediately, this propagation
    	will occur in the same critical section that ->completed was
    	set in, but courtesy of the memory barrier in #4 above, is still
    	seen to follow any pre-quiescent-state activity.
    
    6.	When a given CPU invokes __rcu_process_gp_end(), it becomes
    	aware of the end of the old grace period and therefore makes
    	any RCU callbacks that were waiting on that grace period eligible
    	for invocation.
    
    	If this CPU is the same one that detected the end of the grace
    	period, and if there is but a single rcu_node in the hierarchy,
    	we will still be in the single critical section.  In this case,
    	the memory barrier in step #4 guarantees that all callbacks will
    	be seen to execute after each CPU's quiescent state.
    
    	On the other hand, if this is a different CPU, it will acquire
    	the leaf rcu_node's ->lock, and will again be serialized after
    	each CPU's quiescent state for the old grace period.
    
    On the strength of this proof, this commit therefore removes the memory
    barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
    The effect is to reduce the number of memory barriers by one and to
    reduce the frequency of execution from about once per scheduling tick
    per CPU to once per grace period.
    
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@...htriplett.org>

:040000 040000 cb1295bf2b408fc7a060a54985936376b646028d 1e70d95defc274757b58b80dfdcf681a1595a3ec M	Documentation
:040000 040000 c326c2a8b90d257a47359c20f25cd9eac27b421d 79c31032c17e87226a23dc69ae0f8f6bc84fda5c M	kernel
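
For readers following the proof in the commit message above, steps 1 through 4 (report at the leaf, release, then take the parent's lock, and set ->completed under the root's lock) can be modeled with a toy user-space sketch. This is not kernel code and makes no claim about the actual implementation: Node, State, build_tree and report_quiescent_state are hypothetical names, the two-leaf, four-CPU layout is an assumption, and Python cannot express the memory barrier from step 4, so it appears only as a comment.

```python
import threading


class Node:
    """Toy stand-in for an rcu_node: a lock, a parent, and the set of
    children (CPU numbers or lower-level nodes) that still owe a report."""

    def __init__(self, name, parent=None):
        self.name = name
        self.lock = threading.Lock()
        self.parent = parent
        self.pending = set()
        if parent is not None:
            parent.pending.add(self)


class State:
    """Toy stand-in for rcu_state: current and last-completed grace period."""

    def __init__(self):
        self.gpnum = 1
        self.completed = 0


def report_quiescent_state(cpu, leaf, state, trace):
    """Report a quiescent state for `cpu`, holding one lock at a time.
    Returns True if this report ended the grace period."""
    child, node = cpu, leaf
    while True:
        with node.lock:
            trace.append(node.name)
            node.pending.discard(child)
            last = not node.pending
            if last and node.parent is None:
                # Step 4: the kernel places a memory barrier here before
                # updating ->completed; this toy model has no equivalent.
                state.completed = state.gpnum
                return True
        # The lower-level lock is released before moving up (steps 2 and 3).
        if not last:
            return False
        child, node = node, node.parent


def build_tree():
    """Assumed layout: one root, two leaves, CPUs 0-1 and 2-3."""
    root = Node("root")
    leaves = [Node("leaf0", root), Node("leaf1", root)]
    for cpu in range(4):
        leaves[cpu // 2].pending.add(cpu)
    return root, leaves


if __name__ == "__main__":
    root, leaves = build_tree()
    state = State()
    for cpu in (0, 2, 3, 1):
        trace = []
        ended = report_quiescent_state(cpu, leaves[cpu // 2], state, trace)
        print(f"cpu {cpu}: locks taken {trace}, grace period ended: {ended}")
```

Only the last CPU to report on a leaf moves up to the root, and only the last report overall sets completed, which mirrors the ordering argument that the proof relies on.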

