Message-ID: <5399eaad-22e6-401f-8850-1d4cef2e122b@acm.org>
Date: Mon, 19 Jun 2017 09:49:08 -0500
From: Corey Minyard <minyard@....org>
To: Tony Camuso <tcamuso@...hat.com>,
openipmi-developer@...ts.sourceforge.net
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ipmi: use rcu lock around call to
intf->handlers->sender()
On 06/19/2017 09:42 AM, Tony Camuso wrote:
> On 06/19/2017 10:32 AM, Corey Minyard wrote:
>> On 06/19/2017 09:29 AM, Tony Camuso wrote:
>>> On 06/19/2017 09:31 AM, Corey Minyard wrote:
>>>> On 06/16/2017 07:15 AM, Corey Minyard wrote:
>>>>> On 06/15/2017 10:54 AM, Corey Minyard wrote:
>>>>>> On 06/13/2017 09:54 AM, Tony Camuso wrote:
>>>>>>> A vendor with a system having more than 128 CPUs occasionally
>>>>>>> encounters a crash during shutdown. This is not an easily
>>>>>>> reproducible event, but the vendor was able to provide the
>>>>>>> following analysis of the crash, which exhibits the same
>>>>>>> footprint each time.
>>>>>>>
>>>>>>> crash> bt
>>>>>>> PID: 0 TASK: ffff88017c70ce70 CPU: 5 COMMAND: "swapper/5"
>>>>>>> #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
>>>>>>> #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
>>>>>>> #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
>>>>>>> #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
>>>>>>> #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
>>>>>>> #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
>>>>>>> #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
>>>>>>> #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
>>>>>>> #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
>>>>>>> #9 [ffff88085c143d70] page_fault at ffffffff8168e188
>>>>>>> [exception RIP: unknown or invalid address]
>>>>>>> RIP: ffffffffa053c800 RSP: ffff88085c143e28 RFLAGS: 00010206
>>>>>>> RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
>>>>>>> RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
>>>>>>> RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
>>>>>>> R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
>>>>>>> R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
>>>>>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>>>>>> --- <IRQ stack> ---
>>>>>>> [exception RIP: cpuidle_enter_state+82]
>>>>>>> RIP: ffffffff81514192 RSP: ffff88017c72be50 RFLAGS: 00000202
>>>>>>> RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
>>>>>>> RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
>>>>>>> RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
>>>>>>> R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
>>>>>>> R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
>>>>>>> ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
>>>>>>>
>>>>>>> This is the corresponding stack trace.
>>>>>>>
>>>>>>> It has crashed because the area pointed to by the RIP extracted
>>>>>>> from the timer element had already been removed during the
>>>>>>> shutdown process.
>>>>>>>
>>>>>>> The function is smi_timeout().
>>>>>>>
>>>>>>> And we think ffff8810588b5a00 in RDX is the parameter, a struct smi_info.
>>>>>>>
>>>>>>> crash> rd ffff8810588b5a00 20
>>>>>>> ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
>>>>>>> ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
>>>>>>> ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
>>>>>>> ffff8810588b5a30:  0000000000000000 0000000000000000   ................
>>>>>>> ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
>>>>>>> ffff8810588b5a50:  0000000000000000 0000000100000001   ................
>>>>>>> ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
>>>>>>> ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
>>>>>>> ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
>>>>>>> ffff8810588b5a90:  0000000500000002 0000000000000000   ................
>>>>>>>
>>>>>>> Unfortunately the top of this area has already been destroyed by
>>>>>>> someone, but we think this is a struct smi_info for two reasons:
>>>>>>>
>>>>>>> 1) The addresses stored between ffff8810588b5a70 and ffff8810588b5a80
>>>>>>>    are inside ipmi_si_intf.c; see crash> module ffff88085779d2c0
>>>>>>>
>>>>>>> 2) We've found the area which points to this.
>>>>>>>    It is at offset 0x68 of ffff880859df4000
>>>>>>>
>>>>>>> crash> rd ffff880859df4000 100
>>>>>>> ffff880859df4000:  0000000000000000 0000000000000001   ................
>>>>>>> ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
>>>>>>> ffff880859df4020:  ffff880859df4020 ffff880859df4020   @.Y.... @.Y....
>>>>>>> ffff880859df4030:  0000000000000002 0000000000100010   ................
>>>>>>> ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
>>>>>>> ffff880859df4050:  0000000000000000 0000000000000000   ................
>>>>>>> ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
>>>>>>> ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@......
>>>>>>>
>>>>>>> If we regard it as a struct ipmi_smi in the shutdown process,
>>>>>>> it looks consistent.
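>>>>>>>
>>>>>>> (Aside, not from the original analysis: with the ipmi_si module
>>>>>>> debuginfo loaded into crash, the guess can be cross-checked
>>>>>>> directly, e.g.:
>>>>>>>
>>>>>>>   crash> mod -s ipmi_si
>>>>>>>   crash> struct smi_info ffff8810588b5a00
>>>>>>>
>>>>>>> which dumps the structure fields at that address so the function
>>>>>>> pointers can be compared against the module's symbols.)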
>>>>>>>
>>>>>>> The remedy for this apparent race is affixed below.
>>>>>>
>>>>>> I think you are right about this problem, but in_shutdown is
>>>>>> already checked a bit earlier, when newmsg is extracted from the
>>>>>> list. Wouldn't it be better to have the rcu_read_lock() region
>>>>>> start right before that earlier in_shutdown check and extend to
>>>>>> after the send? That would avoid a leak in this case.
>>>>>
>>>>> While lying awake unable to sleep, I realized that you can't call the
>>>>> sender function while holding rcu_read_lock(). That will break RT,
>>>>> because you can't claim a mutex while holding rcu_read_lock(),
>>>>> and the sender function will claim normal spinlocks.
>>>>>
>>>>> So I need to think about this a bit.
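>>>>>
>>>>> A minimal sketch of the nesting in question (illustrative only,
>>>>> not the actual ipmi_msghandler.c code):
>>>>>
>>>>> 	rcu_read_lock();
>>>>> 	/* sender() takes the lower layer's spinlocks; on RT those
>>>>> 	 * become sleeping locks, which is the concern above. */
>>>>> 	intf->handlers->sender(intf->send_info, newmsg);
>>>>> 	rcu_read_unlock();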
>>>>>
>>>>
>>>> I was wrong about this. An rcu_read_lock() around the whole thing
>>>> should be all that is required to fix this. I can do a patch, or
>>>> you can, if you like.
>>>>
>>>> Thanks again for pointing this out.
>>>>
>>>> -corey
>>>
>>> Is this what you have in mind?
>>>
>>> ---
>>> drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
>>> index 9f69995..e20f8d7 100644
>>> --- a/drivers/char/ipmi/ipmi_msghandler.c
>>> +++ b/drivers/char/ipmi/ipmi_msghandler.c
>>> @@ -3880,6 +3880,9 @@ static void smi_recv_tasklet(unsigned long val)
>>> */
>>> if (!run_to_completion)
>>> spin_lock_irqsave(&intf->xmit_msgs_lock, flags);
>>> +
>>> + rcu_read_lock();
>>> +
>>> if (intf->curr_msg == NULL && !intf->in_shutdown) {
>>> struct list_head *entry = NULL;
>>>
>>> @@ -3894,11 +3897,15 @@ static void smi_recv_tasklet(unsigned long val)
>>> newmsg = list_entry(entry, struct ipmi_smi_msg, link);
>>> intf->curr_msg = newmsg;
>>> }
>>> +
>>> + if (newmsg)
>>> + intf->handlers->sender(intf->send_info, newmsg);
>>> }
>>> +
>>> + rcu_read_unlock();
>>> +
>>> if (!run_to_completion)
>>> spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags);
>>> - if (newmsg)
>>> - intf->handlers->sender(intf->send_info, newmsg);
>>>
>>> handle_new_recv_msgs(intf);
>>> }
>>
>> No, you definitely cannot call the sender function while holding the
>> lock.
>>
>> I was talking about adding rcu_read_lock() before the spin_lock_irqsave()
>> and rcu_read_unlock() after the sender() call.
>
> More like this, then.
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 9f69995..49a7e96 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val)
> * because the lower layer is allowed to hold locks while calling
> * message delivery.
> */
> +
> + rcu_read_lock();
> +
> if (!run_to_completion)
> spin_lock_irqsave(&intf->xmit_msgs_lock, flags);
> if (intf->curr_msg == NULL && !intf->in_shutdown) {
> @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val)
> if (newmsg)
> intf->handlers->sender(intf->send_info, newmsg);
>
> + rcu_read_unlock();
> +
> handle_new_recv_msgs(intf);
> }
>
Perfect, yes.
-corey
>>
>> IIRC, this code was moved from a section that was under the rcu read
>> lock, but the lock was not added when moved.
>>
>> -corey
>>
>