linux-kernel - Re: [BUG REPORT] Soft Lockup in smp_call_function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO6TR8V1ZmvjGYVMfra7=32kFXMHruG-Ym2Q8__eZvtH0J6diQ@mail.gmail.com>
Date:	Sat, 30 Jan 2016 01:23:51 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

On 1/25/16, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 1/25/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>> On 1/24/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>>> On 1/24/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>>>> If I single step with either kgdb, kgdb, or mdb kernel debuggers over
>>>> a sysret instruction anywhere in the OS,  the system hard hangs in
>>>> smp_call_function_single after the debugger releases the system and it
>>>> resumes normal operation.    The specific place the kernel hangs is in
>>>> the loop below.   The softlockup detector will periodically detect
>>>> this condition when it occurs, but not always, most of the time the
>>>> system is just hung and unresponsive.
>>>>
>>>> (2)> u smp_call_function_single+d8
>>>> <<<< hard hang in this loop with EDX=3
>>>> 0xffffffff810fce48 8B55E0          mov    edx,DWORD PTR
>>>> [rbp-32]=0xCE037DC0
>>>> 0xffffffff810fce4b 83E201          and    edx,0x1
>>>> 0xffffffff810fce4e 75F6            jne
>>>> smp_call_function_single+0xd6 (0xffffffff810fce46) (up)
>>>> <<<<<
>>>> 0xffffffff810fce50 EBC3            jmp
>>>> smp_call_function_single+0xa5 (0xffffffff810fce15) (up)
>>>> 0xffffffff810fce52 8B05E08EC700    mov    eax,[oops_in_progress]=0x0
>>>> 0xffffffff810fce58 85C0            test   eax,eax
>>>> 0xffffffff810fce5a 7585            jne
>>>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce5c 803D8E0C9D0000  cmp    [__warned.20610]=0x00,0x0
>>>> 0xffffffff810fce63 0F8578FFFFFF    jne
>>>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce69 BE24010000      mov    esi,0x124
>>>> 0xffffffff810fce6e 48C7C796B08C81  mov    rdi,0xffffffff818cb096
>>>> 0xffffffff810fce75 894DBC          mov    DWORD PTR [rbp-68]=0x0,ecx
>>>> 0xffffffff810fce78 488955C0        mov    QWORD PTR
>>>> [rbp-64]=0xFFFFFFFFFFFFFF10,rdx
>>>> 0xffffffff810fce7c E8FF21F8FF      call   warn_slowpath_null
>>>> 0xffffffff810fce81 C605690C9D0001  mov    [__warned.20610]=0x00,0x1
>>>> 0xffffffff810fce88 8B4DBC          mov    ecx,DWORD PTR [rbp-68]=0x0
>>>> 0xffffffff810fce8b 488B55C0        mov    rdx,QWORD PTR
>>>> [rbp-64]=0xFFFFFFFFFFFFFF10
>>>> 0xffffffff810fce8f E94DFFFFFF      jmp
>>>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce94 E8A71EF8FF      call   __stack_chk_fail
>>>> 0xffffffff810fce99 0F1F8000000000  nop    DWORD PTR [rax]=0x0
>>>> (2)> g
>>>>
>>>>
>>>> The stack backtrace when the bug occurs is:
>>>>
>>>> smp_call_function_single+0xd8
>>>> unmap_page_range+0x613
>>>> flush_tlb_func+0x0
>>>> smp_call_function_many+215
>>>> native_flush_tlb_others+0x118
>>>> flush_tlb_mm_range+0x61
>>>> tlb_flush_mmu_tlbonly+0x6b
>>>> tlb_finish_mmu+0x14
>>>> unmap_region+0xe2
>>>> vma_rb_erase+0x10f
>>>> do_unmap+0x217
>>>> vm_unmap+0x41
>>>> SyS_munmap+0x22
>>>> entry_SYSCALL_64_fastpath+0x12
>>>>
>>>> I traced through this code a bunch of times in just normal operations
>>>> without triggering the bug to get a feel for what it normally sees in
>>>> EDX and it looks like someone has coded a looping function that always
>>>> has EDX=0 in every case I saw in the except for when this bug occurs.
>>>>
>>>> So the exact C code this maps fro objdump of kernel/smp.o is:
>>>>
>>>>  469:	e8 62 fe ff ff       	callq  2d0 <generic_exec_single>
>>>>  46e:	8b 55 e0             	mov    -0x20(%rbp),%edx
>>>>  * previous function call. For multi-cpu calls its even more
>>>> interesting
>>>>  * as we'll have to ensure no other cpu is observing our csd.
>>>>  */
>>>> static void csd_lock_wait(struct call_single_data *csd)
>>>> {
>>>> 	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
>>>>  471:	83 e2 01             	and    $0x1,%edx
>>>>  474:	74 cf                	je     445 <smp_call_function_single+0xa5>
>>>>  476:	f3 90                	pause
>>>> <<<<<<<<<<
>>>>  478:	8b 55 e0             	mov    -0x20(%rbp),%edx
>>>>  47b:	83 e2 01             	and    $0x1,%edx
>>>>  47e:	75 f6                	jne    476 <smp_call_function_single+0xd6>
>>>> <<<<<<<<<<<
>>>>  480:	eb c3                	jmp    445 <smp_call_function_single+0xa5>
>>>> 	 * Can deadlock when called with interrupts disabled.
>>>> 	 * We allow cpu's that are not yet online though, as no one else can
>>>> 	 * send smp call function interrupt to this cpu and as such deadlocks
>>>> 	 * can't happen.
>>>> 	 */
>>>> 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>>
>>>> Each time this bug occurs csd->flags is always set to a value of 3 and
>>>> never changes.  When the system is just running normally, it seems to
>>>> be 0 the rest of the time.   Setting EDX=0 from the debugger console
>>>> clears the hang condition and the system seems to recover except the
>>>> system reports this error from the console when you attempt to load
>>>> programs, indicating the ability of the system to load shared objects
>>>> is fritzed.
>>>>
>>>> #
>>>> # ls -l
>>>> /lib64/libc.so.6 version GLI not found   << this error and no shared
>>>> objects will load
>>>> #
>>>> #
>>>>
>>>> Jeff
>>>>
>>>
>>> I am running down a trace of the MSR values for swapgs.  Looks like it
>>> got nested somewhere down in the entry_64 code.   If so, then this is
>>> just a symptom and not the sickness.
>>>
>>> Jeff
>>>
>>
>> I got to the bottom of this one and its related to the function loop
>> locking up the system that is described above when code is allowed to
>> run on one processor but not the others and one of them is trying to
>> send an IPI.
>>
>> You might want to consider some sort of timeout logic for that
>> function or ability for it to recover.    The problem is caused by
>> holding all the processors except the target processor being traced
>> and letting it run for a trace session while the other processors are
>> held suspended.   If I release the all processors each time I step
>> with the target processor the problem goes away.  If I hold the
>> processors while stepping over a section of code this lockup will
>> occur and the processor the bug occurs on apparently never recovers
>>
>> In the process of tracking down this bug I traced all of the swapgs
>> usage in all of the exception handlers and it works well.  The check
>> in paranoid_entry and paranoid_exit seems to catch the cases where the
>> debugger is or has stepped over a swapgs instruction pointing to a gs
>> in userspace for both int1 and int3 exceptions, and recovers correctly
>> through paranoid_entry and paranoid exit with the correct gs and MSR
>> value in all the cases I traced.
>>
>> I coded around this issue but it's probably a bug in using that while
>> loop without some way to determine a processor has gone away
>> temporarily.  The solution is to let the processors run in between
>> each and every int1 trap and not hold a processor in focus mode, then
>> this code snippet that locks up in smp.c does not get hit.
>>
>> Jeff
>>
>
> I guess Linus has already been trying to fix this problem since he was
> the last person to touch that code.  Looks like there is another case
> where csd_unlock does not get called or is delayed.
>
> https://github.com/torvalds/linux/commit/8053871d0f7f67c7efb7f226ef031f78877d6625
>
> I think I'll work on this one some more and see if I can fix it.  It
> still shows up after stepping through a sysret even with my changes,
> its just harder to trigger.   It crashes at exactly the same place in
> the stacktrace from the same source so there's a clue to start with.
>
> Jeff
>

This bug is confirmed to be an int1 recursion that happens because the
sysret code in entry_64.S allows return to userspace with the trap
flag set.  The last person to try to fix this was from this commit --
Ingo.

https://github.com/torvalds/linux/commit/4d7321381e5c7102a3d3faf0a0a0035a09619612

for this reported problem.

http://lkml.iu.edu/hypermail/linux/kernel/1504.0/00594.html

I verified that I am seeing the same problem here, it also exposes a
bug in Linus code in smp.c.  I have that fixed as well -- use a
spinlock and don't try to guess what order hardware will post bus
transactions for memory reads and writes.  That code in smp.c is
busted on some systems.

I have a fix for this problem and Ingo needs to review this code
because I have set the trap flag and watched is cycle through this
code in entry_64.S while set and still end up executing a sysret when
its not supposed to and take the other return path.

Jeff