linux-kernel - Re: 4.4-rc5 Setting hardware breakpoint in int_ret_from_sys

Message-ID: <CAO6TR8WuK4tBfgn5Ejk4Mh=0pxCG1Xmr0-Gat1cVF6tpQGV0ZQ@mail.gmail.com>
Date:	Thu, 17 Dec 2015 01:35:29 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: 4.4-rc5 Setting hardware breakpoint in int_ret_from_sys_call
 causes triple fault/reboot

On 12/16/15, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 12/16/15, Andy Lutomirski <luto@...capital.net> wrote:
>> On Wed, Dec 16, 2015 at 4:31 PM, Jeff Merkey <linux.mdb@...il.com> wrote:
>>> On 12/16/15, Andy Lutomirski <luto@...capital.net> wrote:
>>>> On Dec 16, 2015 3:12 PM, "Jeff Merkey" <linux.mdb@...il.com> wrote:
>>>>>
>>>>> Setting a hardware breakpoint at the
>>>>>
>>>>> rex64 sysret
>>>>>
>>>>> instruction at the end of int_ret_from_sys_call causes the system to
>>>>> triple fault
>>>>> and reboot when the breakpoint is triggered.  Appears to be related
>>>>> to the same problem as the lockup.
>>>>>
>>>>> This function can be stepped over and traced through with the TRAP
>>>>> FLAG set so long as a hardware breakpoint is set somewhere in the
>>>>> function.  Otherwise upon exit the system hard hangs.  If you break
>>>>> exactly on that instruction -- reboot.   If you break a few
>>>>> instructions before it and single step through the call it works.  If
>>>>> you step through the call with no breakpoint the system hard hangs.
>>>>> Same behavior as when you try to step from inside an nmi handler.
>>>>> Looks related.
>>>>
>>>> You're probably encountering the user mode RSP when SYSRET happens.
>>>>
>>>> --Andy
>>>>
>>>
>>> Hi Andy,
>>>
>>> Could be, but I am getting a double fault message with an error code
>>> of 0 that then scrolls off the screen when the triple fault hits.  It
>>> flashes too quickly to get the function address -- wish I had a logic
>>> analyzer with an inverse assembler -- would already be there.  A
>>> user-mode RSP would, I assume, clear the TRAP flag, and that does not
>>> explain why it works if I set a breakpoint right above the instruction
>>> and then step over it, which I can do without the triple fault.
>>>
>>> Easy to reproduce: download the mdb debugger for 4.3.3 and apply it to
>>> 4.4-rc5, modprobe mdb, echo a > /proc/sysrq-trigger, u
>>> int_ret_from_sys_call (scroll until you get to the swapgs then rex64
>>> sysret), set a hardware breakpoint at that address, i.e. b
>>> ffffffff81673ae1 (or whatever address the swapgs instruction is at),
>>> then step through with t a few times (it should just return after rex64
>>> sysret since it returns to user space).  Then set a breakpoint at the
>>> rex64 sysret instruction, b <address>, let it break at the
>>> instruction, then hit g for go and watch the fireworks -- it will try
>>> to print a double fault message then reboot.
>>>
>>> I handle the whole user RSP thing, I just return if I see regs set to
>>> user space.  This looks like some sort of problem in the exception
>>> handlers.
>>
>> It's kernel regs but user RSP.
>>
>> --Andy
>>
>
> right, I handle that case and I have handled that case since about
> 2001.  Before all the changes, I used to be able to just step from
> userspace to kernel space with mdb.  I have not been able to do that
> for a while, since Linus fixed the VM in about 2002.
>
> So I handle that case.
>
> Jeff
>

It looks like this bug is the result of an architectural decision,
and I don't think there is anything I can do about it without a very
large, very ugly patch that alters the architecture of Linux.  Linux
has loaded an MSR value into the processor and called swapgs, then
gets a breakpoint exception; the MSR gets changed again and swapped
somewhere else, and then it hits the next instruction.  The triple
fault is a GP, SS, and UD.

This is a case where Linux was not designed for a debugger, and fixing
this is a BIG job.  It will require lots of changes in places we
probably shouldn't be changing, including all exception handlers, and
possibly removal of the swapgs instruction.  This one I will document
as a known limitation of Linux and move on.

There will be no patch unless someone asks me to try to fix this.
Bottom line, Linux is debugger hostile and not designed for one.  The
debugging tools that exist will have problems on Linux until Linus
decides Linux will become a more debugger friendly place.  I've
written several commercial operating systems in my 35 years of
programming, and the first item I always write before a kernel,
drivers, or anything else is a debugger.  The OS is then built on top
of it.

Linus read a book and decided to write an OS and his system reflects
that -- no thought of debuggers and his development process operates a
lot like a public library.  It's not all bad -- look how far he got.

This bug is closed since I know what it is.  The probability of this
occurring during normal operations is very low unless you debug and
break between a swapgs instruction and a rex64 sysret, or set a
breakpoint anywhere near this instruction.

Linux Documentation

https://www.kernel.org/doc/Documentation/x86/entry_64.txt

"... Dealing with the swapgs instruction is especially tricky.  Swapgs
toggles whether gs is the kernel gs or the user gs.  The swapgs
instruction is rather fragile: it must nest perfectly and only in
single depth, it should only be used if entering from user mode to
kernel mode and then when returning to user-space, and precisely
so. If we mess that up even slightly, we crash.

So when we have a secondary entry, already in kernel mode, we *must
not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
not switched/swapped yet.   ..."

:-)

Jeff
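
The documentation excerpt above can be illustrated with a toy model in
C.  This is only a sketch under stated assumptions, not the kernel's
entry code, and it does not claim to reproduce the triple fault
described in this thread.  The names cpu_model, exception_cheap and
exception_paranoid and the two GS base values are hypothetical.  The
model shows an exception arriving after the exit-path swapgs but before
sysret: the saved CS still says "kernel", so a handler that trusts CS
skips swapgs and runs on the user GS, while a handler that inspects the
GS base itself (the "paranoid" check described in entry_64.txt) swaps
correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two GS base values that SWAPGS exchanges. */
struct cpu_model {
    uint64_t gs_base;        /* models MSR_GS_BASE (active GS base) */
    uint64_t kernel_gs_base; /* models MSR_KERNEL_GS_BASE (inactive one) */
};

/* Made-up addresses: the kernel value has the top bit set. */
#define USER_GS   0x00007f0000001000ULL
#define KERNEL_GS 0xffff880000010000ULL

/* SWAPGS exchanges the active and inactive GS base. */
void swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* A GS base with the top bit set (negative if signed) is a kernel GS. */
bool gs_is_kernel(const struct cpu_model *cpu)
{
    return (cpu->gs_base >> 63) != 0;
}

/*
 * Cheap check: decide from the CS saved in the exception frame.
 * Returns whether the handler body ran with the kernel GS active.
 */
bool exception_cheap(struct cpu_model *cpu, bool saved_cs_is_kernel)
{
    bool swapped = false;
    if (!saved_cs_is_kernel) {      /* came from user mode: swap in */
        swapgs(cpu);
        swapped = true;
    }
    bool handler_had_kernel_gs = gs_is_kernel(cpu);
    if (swapped)                    /* undo only what we did */
        swapgs(cpu);
    return handler_had_kernel_gs;
}

/*
 * Paranoid check: ignore CS and look at the GS base itself.
 * Returns whether the handler body ran with the kernel GS active.
 */
bool exception_paranoid(struct cpu_model *cpu)
{
    bool swapped = false;
    if (!gs_is_kernel(cpu)) {       /* user GS is active: swap in */
        swapgs(cpu);
        swapped = true;
    }
    bool handler_had_kernel_gs = gs_is_kernel(cpu);
    if (swapped)                    /* restore the interrupted state */
        swapgs(cpu);
    return handler_had_kernel_gs;
}
```

To model the window in question, start from { USER_GS, KERNEL_GS },
call swapgs() once for syscall entry and once for the exit path, then
call either handler with the saved CS marked as kernel.  In this model
exception_cheap() reports that it ran on the user GS, while
exception_paranoid() reports the kernel GS and leaves the user GS
active on return.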