Date:	Sat, 30 Jan 2016 11:05:29 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	Andrew Lutomirski <luto@...nel.org>
Subject: Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8

On 1/30/16, Andy Lutomirski <luto@...capital.net> wrote:
> On Sat, Jan 30, 2016 at 9:53 AM, Jeff Merkey <linux.mdb@...il.com> wrote:
>> On 1/30/16, Andy Lutomirski <luto@...capital.net> wrote:
>>> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@...il.com>
>>> wrote:
>>>> Here is an MDB debugger trace of the code in question.  Please note
>>>> that the flags being compared don't match what's in r11, so the
>>>> comparison comes out wrong.
>>>>
>>>> (3)>
>>>>
>>>> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
>>>> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
>>>> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
>>>> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000  R8: FFFF8800CD6C4000
>>>>  R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246  << WRONG!!!
>>>> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
>>>> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>>>>  IP: FFFFFFFF81680022 FLAGS: 0000000000000146  (PF ZF TF) << real flags
>>>> 0xffffffff81680022 49F7C300010100  test   r11,0x10100   < comparison bits correct, r11 is WRONG!!!
>>>> (3)>
>>>
>>> I have no idea what bug you're talking about, and I have no idea how
>>> this code could cause a soft lockup in smp_call_function_single (at
>>> worst it could potentially enter userspace with invalid state, thus
>>> alternating between user and kernel without making progress in user
>>> mode).
>>>
>>> And the HW flags register has no particular reason to match r11 or, in
>>> fact, anything saved in pt_regs at all.
>>>
>>> --Andy
>>>
>>
>> Hi Andy,
>>
>> There are two cases to handle here with the trap flag and sysret, and
>> you are handling just one of them in your fix.  There is the case
>> where you are going to use sysret to load the flags after the
>> instruction executes, and that's the case you coded for.  The other
>> case, which is not being handled, is the one where someone is
>> single-stepping through this code, the trap flag gets set, and then
>> sysret gets called.
>>
>> From what I can tell, sysret is a broken instruction that will just
>> hang if someone calls it with the trap flag set.  It does not act like
>> this on ia32, just x86_64.  The answer is to not use sysret and to use
>> your iret return for all syscalls.
>>
>
> Just so you know, I have no intention of supporting this use case.  In
> fact, I'm planning to eventually stop using IST for #DB entirely, at
> which point the kernel will crash terribly if this code is
> single-stepped (except when using a hypervisor to do this single
> stepping, which is a much more sensible way to handle it).
>
> So MDB may just need to force the slow syscall exit path
> unconditionally, and it'll have to do something else clever to handle
> SYSCALL, because that's going to crash, too.
>
> I will *not* insert a pushfq into the syscall return path.  That would
> slow everything down for the sole benefit of an in-kernel debugger.
>
> --Andy
>

Yep, now you see it.  I'll carry this fix locally in my patch series.
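
For reference, here is a minimal sketch (plain userspace C, not the
kernel's actual entry code; the helper name is made up) of the check
the disassembled "test r11,0x10100" above performs.  The mask is TF|RF,
and the point is that it is the saved r11, not the live RFLAGS a
debugger sets, that decides whether the sysret fast path gets taken:

#include <stdint.h>
#include <stdio.h>

#define X86_EFLAGS_TF 0x00000100UL  /* trap flag (single-step) */
#define X86_EFLAGS_RF 0x00010000UL  /* resume flag */

static int sysret_fast_path_ok(uint64_t saved_r11)
{
	/* If TF or RF is set in the flags sysret would restore, the
	 * kernel has to fall back to the slower iret return path. */
	return (saved_r11 & (X86_EFLAGS_TF | X86_EFLAGS_RF)) == 0;
}

int main(void)
{
	uint64_t r11_saved  = 0x246;  /* from the trace: PF ZF IF, TF clear */
	uint64_t live_flags = 0x146;  /* live RFLAGS: PF ZF TF, set by the debugger */

	printf("fast path from saved r11:  %d\n", sysret_fast_path_ok(r11_saved));
	printf("fast path from live flags: %d\n", sysret_fast_path_ok(live_flags));
	return 0;
}

With the values from the trace, the check passes on the saved r11 even
though the live flags have TF set, which is exactly the single-step
case that falls through to sysret.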

Jeff
