[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUiY8oU9aD-ABx=Drghh5GO-s92MuimVOJRuOAuEYL1jQ@mail.gmail.com>
Date: Mon, 14 Dec 2015 09:56:03 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Jeff Merkey <linux.mdb@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Steven Rostedt <rostedt@...dmis.org>,
Borislav Petkov <bp@...en8.de>, Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH 1/1] Fix int1 recursion when no perf_bp_event is registeredy
On Mon, Dec 14, 2015 at 9:52 AM, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 12/14/15, Ingo Molnar <mingo@...nel.org> wrote:
>>
>> * Jeff Merkey <linux.mdb@...il.com> wrote:
>>
>>> On 12/14/15, Ingo Molnar <mingo@...nel.org> wrote:
>>> >
>>> > A: Because it messes up the order in which people normally read text.
>>> > Q: Why is top-posting such a bad thing?
>>> > A: Top-posting.
>>> > Q: What is the most annoying thing in e-mail?
>>> >
>>> > * Jeff Merkey <linux.mdb@...il.com> wrote:
>>> >
>>> >> I trigger it by writing to the dr7 and dr1, 2, 3 or four register and
>>> >> set an
>>> >> execute breakpoint without going through arch_install_hw_breakpoint.
>>> >> When
>>> >> the breakpoint fires, the system crashes and hangs on the processor
>>> >> stuck in
>>> >> an endless loop inside the int1 handler in hw_breakpoint.c --
>>> >
>>> > What is still not clear to me, can you trigger the hang not via some
>>> > special
>>> >
>>> > kernel driver that goes outside regular APIs and messes with the state
>>> > of the
>>> > debug registers, but via the proper access methods, i.e. various
>>> > user-space
>>> > ABIs?
>>>
>>> Any process that can get access to the debug registers can trigger this
>>> condition. [...]
>>
>> A process on an unmodified Linux kernel can only modify debug registers via
>> the
>> proper APIs:
>>
>>> [...] As it stands, if restricted to the established API in
>>> hw_breakpoint.c
>>> this bug should not occur unless someone triggers an errant breakpoint.
>>> [...]
>>
>> So am I interpreting your report correctly:
>>
>> "If the Linux kernel is modified to change debug registers without using
>> the
>> proper APIs (such as loading a module that changes hardware registers in
>> a raw
>> fashion), things may break and a difficult to debug hang may occur."
>>
>> right?
>>
>> This key piece of information should have been part of the original report.
>>
>> So I'm wondering, why does your module modify debug registers in a raw
>> fashion?
>> Why doesn't it use the proper APIs?
>>
>> Thanks,
>>
>> Ingo
>>
>
> Hi Ingo,
>
> This will be a lengthy reply to properly explain this to you. First
> some fundamental assumptions to clear up.
>
> 1. The MDB Debugger Module does not cause this problem. This is an
> existing bug in the kernel in an exception code path.
> 8. If any process or module sets a breakpoint outside of linux
> breakpoint API in this code path the system will crash. Its A BUG,
> and it's been in linux 13 years. I am certain people have seen it
> while running perf stuff but since it provides no diagnostic info,
> someone would just reset the system.
Putting "BUG" in caps doesn't make it so. What's wrong with it?
> 9. This breakpoint API needs to be rewritten to be global breakpoint
> aware, have an on/off switch so when a debugger enters an int1
> exception, breakpoints are globally disabled (a requirement), among
> other things.
A "requirement" for what?
>
> The patch simply fixes the bug in the int handler that will cause a
> lockup. The perf event system, kgdb, kdb, and any one of a number of
> programs can trigger this bug, and probably have. People would blame
> the debugger when its a bug in the int handler.
You ignored feedback from me and from tglx, and you still haven't
explained why this is a bug in the first place. Maybe the code could
degrade more gracefully if you use it wrong, but the int1 handler and
the rest of the kernel are very much aware of each other, and the int1
handler's failure to do what something that isn't in the kernel wants
it to do isn't a bug.
If you submit a clean patch to improve robustness of the handler and
if the new code is at least as clean as the old code, that might be a
different story.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists