[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <560DB067.3060908@linaro.org>
Date: Thu, 01 Oct 2015 15:15:03 -0700
From: "Shi, Yang" <yang.shi@...aro.org>
To: paulmck@...ux.vnet.ibm.com
CC: Steven Rostedt <rostedt@...dmis.org>, catalin.marinas@....com,
will.deacon@....com, dave.long@...aro.org, panand@...hat.com,
linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linaro-kernel@...ts.linaro.org
Subject: Re: [v2 PATCH] arm64: replace read_lock to rcu lock in call_break_hook
On 10/1/2015 2:27 PM, Paul E. McKenney wrote:
> On Thu, Oct 01, 2015 at 01:53:51PM -0700, Shi, Yang wrote:
>> On 10/1/2015 10:08 AM, Steven Rostedt wrote:
>>> On Thu, 1 Oct 2015 09:37:37 -0700
>>> Yang Shi <yang.shi@...aro.org> wrote:
>>>
>>>> BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
>>>> in_atomic(): 0, irqs_disabled(): 128, pid: 342, name: perf
>>>> 1 lock held by perf/342:
>>>> #0: (break_hook_lock){+.+...}, at: [<ffffffc0000851ac>] call_break_hook+0x34/0xd0
>>>> irq event stamp: 62224
>>>> hardirqs last enabled at (62223): [<ffffffc00010b7bc>] __call_rcu.constprop.59+0x104/0x270
>>>> hardirqs last disabled at (62224): [<ffffffc0000fbe20>] vprintk_emit+0x68/0x640
>>>> softirqs last enabled at (0): [<ffffffc000097928>] copy_process.part.8+0x428/0x17f8
>>>> softirqs last disabled at (0): [< (null)>] (null)
>>>> CPU: 0 PID: 342 Comm: perf Not tainted 4.1.6-rt5 #4
>>>> Hardware name: linux,dummy-virt (DT)
>>>> Call trace:
>>>> [<ffffffc000089968>] dump_backtrace+0x0/0x128
>>>> [<ffffffc000089ab0>] show_stack+0x20/0x30
>>>> [<ffffffc0007030d0>] dump_stack+0x7c/0xa0
>>>> [<ffffffc0000c878c>] ___might_sleep+0x174/0x260
>>>> [<ffffffc000708ac8>] __rt_spin_lock+0x28/0x40
>>>> [<ffffffc000708db0>] rt_read_lock+0x60/0x80
>>>> [<ffffffc0000851a8>] call_break_hook+0x30/0xd0
>>>> [<ffffffc000085a70>] brk_handler+0x30/0x98
>>>> [<ffffffc000082248>] do_debug_exception+0x50/0xb8
>>>> Exception stack(0xffffffc00514fe30 to 0xffffffc00514ff50)
>>>> fe20: 00000000 00000000 c1594680 0000007f
>>>> fe40: ffffffff ffffffff 92063940 0000007f 0550dcd8 ffffffc0 00000000 00000000
>>>> fe60: 0514fe70 ffffffc0 000be1f8 ffffffc0 0514feb0 ffffffc0 0008948c ffffffc0
>>>> fe80: 00000004 00000000 0514fed0 ffffffc0 ffffffff ffffffff 9282a948 0000007f
>>>> fea0: 00000000 00000000 9282b708 0000007f c1592820 0000007f 00083914 ffffffc0
>>>> fec0: 00000000 00000000 00000010 00000000 00000064 00000000 00000001 00000000
>>>> fee0: 005101e0 00000000 c1594680 0000007f c1594740 0000007f ffffffd8 ffffff80
>>>> ff00: 00000000 00000000 00000000 00000000 c1594770 0000007f c1594770 0000007f
>>>> ff20: 00665e10 00000000 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000
>>>> ff40: 928e4cc0 0000007f 91ff11e8 0000007f
>>>>
>>>> call_break_hook is called in atomic context (hard irq disabled), so replace
>>>> the sleepable lock to rcu lock and replace relevant list operations to rcu
>>>> version.
>>>>
>>>> Signed-off-by: Yang Shi <yang.shi@...aro.org>
>>>> ---
>>>> v1-> v2
>>>> Replace list operations to rcu version.
>>>>
>>>> arch/arm64/kernel/debug-monitors.c | 10 +++++-----
>>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
>>>> index cebf786..cf0e4fc 100644
>>>> --- a/arch/arm64/kernel/debug-monitors.c
>>>> +++ b/arch/arm64/kernel/debug-monitors.c
>>>> @@ -276,14 +276,14 @@ static DEFINE_RWLOCK(break_hook_lock);
>>>> void register_break_hook(struct break_hook *hook)
>>>> {
>>>> write_lock(&break_hook_lock);
>>>> - list_add(&hook->node, &break_hook);
>>>> + list_add_rcu(&hook->node, &break_hook);
>>>> write_unlock(&break_hook_lock);
>>>> }
>>>>
>>>> void unregister_break_hook(struct break_hook *hook)
>>>> {
>>>> write_lock(&break_hook_lock);
>>>> - list_del(&hook->node);
>>>> + list_del_rcu(&hook->node);
>>>> write_unlock(&break_hook_lock);
>>>> }
>>>
>>> Shouldn't there be a synchronize_rcu() somewhere?
>>
>> So far kgdb is the only user of unregister_break_hook in mainline kernel.
>>
>> Just read Documentation/RCU/checklist.txt, it says:
>>
>> Note that synchronize_rcu() -only- guarantees to wait until
>> all currently executing rcu_read_lock()-protected RCU read-side
>> critical sections complete.
>>
>> For kgdb, the unregister is just called in kgdb_arch_exit by
>> kgdb_unregister_io_module, which is called when rmmod kgdb module.
>>
>> The break point handler is done synchronously. So, it sounds should
>> be not a problem without calling synchronize_rcu().
>
> OK, I will bite... What does "synchronously" mean here? Unless you
> have somehow guaranteed that all current readers in call_break_hook()
> are done between the time you call unregister_break_hook() to remove a
> given break_hook structure and the time you call register_break_hook()
> to add that same structure back in, you have a problem.
For kgdb usecase, this might be guaranteed.
Generally, kgdb module is loaded then register_break_hook() is called.
Then connect kgdb from host or via kdb, set breakpoint, wait for the
break point is hit, run some commands to debug. Then finish debug, rmmod
kgdb which will call unregister_break_hook().
It sounds the current readers in call_break_hook() could be done during
the time otherwise I won't be able to continue my debug when break point
is hit.
>
> What you have now only protects against invoking register_break_hook()
> on newly allocated and initialized break_hook structure. But the only
> calls to register_break_hook() that I see in v4.2 use compile-time
> initialized structures. So the only failure from using non-RCU list
> primitives would be due to the list_head's ->next pointer initialization.
> This could momentarily make the list appear to have only the new element,
> but not the old element.
>
> Unless you do a series of register_break_hook() and unregister_break_hook()
> calls, in which case a previously deleted structure could momentarily
> appear to already (or still) be in the list.
They are called in series:
In kgdb_arch_init
register_break_hook(&kgdb_brkpt_hook);
register_break_hook(&kgdb_compiled_brkpt_hook);
In kgdb_arch_exit
unregister_break_hook(&kgdb_brkpt_hook);
unregister_break_hook(&kgdb_compiled_brkpt_hook);
Yang
>
> Are those the sorts of failures you are seeing?
>
> Thanx, Paul
>
>> Yang
>>
>>> -- Steve
>>>
>>>>
>>>> @@ -292,11 +292,11 @@ static int call_break_hook(struct pt_regs *regs, unsigned int esr)
>>>> struct break_hook *hook;
>>>> int (*fn)(struct pt_regs *regs, unsigned int esr) = NULL;
>>>>
>>>> - read_lock(&break_hook_lock);
>>>> - list_for_each_entry(hook, &break_hook, node)
>>>> + rcu_read_lock();
>>>> + list_for_each_entry_rcu(hook, &break_hook, node)
>>>> if ((esr & hook->esr_mask) == hook->esr_val)
>>>> fn = hook->fn;
>>>> - read_unlock(&break_hook_lock);
>>>> + rcu_read_unlock();
>>>>
>>>> return fn ? fn(regs, esr) : DBG_HOOK_ERROR;
>>>> }
>>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists