linux-kernel - Re: [BUG REPORT] Soft Lockup in smp_call_function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO6TR8W04E2dwt_aUE0Tzo6PtCaRfgGXnhxys4U0U_pgO=9oxA@mail.gmail.com>
Date:	Mon, 25 Jan 2016 16:06:29 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

On 1/24/16, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 1/24/16, Jeff Merkey <linux.mdb@...il.com> wrote:
>> If I single step with either kgdb, kgdb, or mdb kernel debuggers over
>> a sysret instruction anywhere in the OS,  the system hard hangs in
>> smp_call_function_single after the debugger releases the system and it
>> resumes normal operation.    The specific place the kernel hangs is in
>> the loop below.   The softlockup detector will periodically detect
>> this condition when it occurs, but not always, most of the time the
>> system is just hung and unresponsive.
>>
>> (2)> u smp_call_function_single+d8
>> <<<< hard hang in this loop with EDX=3
>> 0xffffffff810fce48 8B55E0          mov    edx,DWORD PTR
>> [rbp-32]=0xCE037DC0
>> 0xffffffff810fce4b 83E201          and    edx,0x1
>> 0xffffffff810fce4e 75F6            jne
>> smp_call_function_single+0xd6 (0xffffffff810fce46) (up)
>> <<<<<
>> 0xffffffff810fce50 EBC3            jmp
>> smp_call_function_single+0xa5 (0xffffffff810fce15) (up)
>> 0xffffffff810fce52 8B05E08EC700    mov    eax,[oops_in_progress]=0x0
>> 0xffffffff810fce58 85C0            test   eax,eax
>> 0xffffffff810fce5a 7585            jne
>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>> 0xffffffff810fce5c 803D8E0C9D0000  cmp    [__warned.20610]=0x00,0x0
>> 0xffffffff810fce63 0F8578FFFFFF    jne
>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>> 0xffffffff810fce69 BE24010000      mov    esi,0x124
>> 0xffffffff810fce6e 48C7C796B08C81  mov    rdi,0xffffffff818cb096
>> 0xffffffff810fce75 894DBC          mov    DWORD PTR [rbp-68]=0x0,ecx
>> 0xffffffff810fce78 488955C0        mov    QWORD PTR
>> [rbp-64]=0xFFFFFFFFFFFFFF10,rdx
>> 0xffffffff810fce7c E8FF21F8FF      call   warn_slowpath_null
>> 0xffffffff810fce81 C605690C9D0001  mov    [__warned.20610]=0x00,0x1
>> 0xffffffff810fce88 8B4DBC          mov    ecx,DWORD PTR [rbp-68]=0x0
>> 0xffffffff810fce8b 488B55C0        mov    rdx,QWORD PTR
>> [rbp-64]=0xFFFFFFFFFFFFFF10
>> 0xffffffff810fce8f E94DFFFFFF      jmp
>> smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>> 0xffffffff810fce94 E8A71EF8FF      call   __stack_chk_fail
>> 0xffffffff810fce99 0F1F8000000000  nop    DWORD PTR [rax]=0x0
>> (2)> g
>>
>>
>> The stack backtrace when the bug occurs is:
>>
>> smp_call_function_single+0xd8
>> unmap_page_range+0x613
>> flush_tlb_func+0x0
>> smp_call_function_many+215
>> native_flush_tlb_others+0x118
>> flush_tlb_mm_range+0x61
>> tlb_flush_mmu_tlbonly+0x6b
>> tlb_finish_mmu+0x14
>> unmap_region+0xe2
>> vma_rb_erase+0x10f
>> do_unmap+0x217
>> vm_unmap+0x41
>> SyS_munmap+0x22
>> entry_SYSCALL_64_fastpath+0x12
>>
>> I traced through this code a bunch of times in just normal operations
>> without triggering the bug to get a feel for what it normally sees in
>> EDX and it looks like someone has coded a looping function that always
>> has EDX=0 in every case I saw in the except for when this bug occurs.
>>
>> So the exact C code this maps fro objdump of kernel/smp.o is:
>>
>>  469:	e8 62 fe ff ff       	callq  2d0 <generic_exec_single>
>>  46e:	8b 55 e0             	mov    -0x20(%rbp),%edx
>>  * previous function call. For multi-cpu calls its even more interesting
>>  * as we'll have to ensure no other cpu is observing our csd.
>>  */
>> static void csd_lock_wait(struct call_single_data *csd)
>> {
>> 	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
>>  471:	83 e2 01             	and    $0x1,%edx
>>  474:	74 cf                	je     445 <smp_call_function_single+0xa5>
>>  476:	f3 90                	pause
>> <<<<<<<<<<
>>  478:	8b 55 e0             	mov    -0x20(%rbp),%edx
>>  47b:	83 e2 01             	and    $0x1,%edx
>>  47e:	75 f6                	jne    476 <smp_call_function_single+0xd6>
>> <<<<<<<<<<<
>>  480:	eb c3                	jmp    445 <smp_call_function_single+0xa5>
>> 	 * Can deadlock when called with interrupts disabled.
>> 	 * We allow cpu's that are not yet online though, as no one else can
>> 	 * send smp call function interrupt to this cpu and as such deadlocks
>> 	 * can't happen.
>> 	 */
>> 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>
>> Each time this bug occurs csd->flags is always set to a value of 3 and
>> never changes.  When the system is just running normally, it seems to
>> be 0 the rest of the time.   Setting EDX=0 from the debugger console
>> clears the hang condition and the system seems to recover except the
>> system reports this error from the console when you attempt to load
>> programs, indicating the ability of the system to load shared objects
>> is fritzed.
>>
>> #
>> # ls -l
>> /lib64/libc.so.6 version GLI not found   << this error and no shared
>> objects will load
>> #
>> #
>>
>> Jeff
>>
>
> I am running down a trace of the MSR values for swapgs.  Looks like it
> got nested somewhere down in the entry_64 code.   If so, then this is
> just a symptom and not the sickness.
>
> Jeff
>

I got to the bottom of this one and its related to the function loop
locking up the system that is described above when code is allowed to
run on one processor but not the others and one of them is trying to
send an IPI.

You might want to consider some sort of timeout logic for that
function or ability for it to recover.    The problem is caused by
holding all the processors except the target processor being traced
and letting it run for a trace session while the other processors are
held suspended.   If I release the all processors each time I step
with the target processor the problem goes away.  If I hold the
processors while stepping over a section of code this lockup will
occur and the processor the bug occurs on apparently never recovers

In the process of tracking down this bug I traced all of the swapgs
usage in all of the exception handlers and it works well.  The check
in paranoid_entry and paranoid_exit seems to catch the cases where the
debugger is or has stepped over a swapgs instruction pointing to a gs
in userspace for both int1 and int3 exceptions, and recovers correctly
through paranoid_entry and paranoid exit with the correct gs and MSR
value in all the cases I traced.

I coded around this issue but it's probably a bug in using that while
loop without some way to determine a processor has gone away
temporarily.  The solution is to let the processors run in between
each and every int1 trap and not hold a processor in focus mode, then
this code snippet that locks up in smp.c does not get hit.

Jeff