Message-ID: <CALCETrUYjchZaNwFdjo=BPbmyyo5vgAW=T6nU6-TYEWq4338pw@mail.gmail.com>
Date: Wed, 18 Mar 2015 16:22:55 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Stefan Seyfried <stefan.seyfried@...glemail.com>
Cc: Jiri Kosina <jkosina@...e.cz>,
Denys Vlasenko <dvlasenk@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Takashi Iwai <tiwai@...e.de>, X86 ML <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Mar 18, 2015 at 3:38 PM, Stefan Seyfried
> <stefan.seyfried@...glemail.com> wrote:
>> Am 18.03.2015 um 23:29 schrieb Andy Lutomirski:
>>> On Wed, Mar 18, 2015 at 3:22 PM, Jiri Kosina <jkosina@...e.cz> wrote:
>>>> On Wed, 18 Mar 2015, Andy Lutomirski wrote:
>>>>
>>>>> sysret64 can only fail with #GP, and we're totally screwed if that
>>>>> happens,
>>>>
>>>> But what if the #GP handler page-faults afterwards? It'd be operating
>>>> on the user stack already.
>>>
>>> Good point.
>>>
>>> Stefan, can you try changing the first "jne
>>> opportunistic_sysret_failed" to "jmp opportunistic_sysret_failed" in
>>> entry_64.S and seeing if you can reproduce this? (Is it easy enough
>>> to reproduce that this would tell us anything?)
>>
>> I have no good way of reproducing the issue (happens once per week...)
>> but apparently Takashi has, so I'd like to hand this task over to him.
>>
>>> It's a shame that double_fault doesn't record what gs was on entry.
>>> If we did sysret -> general_protection -> page_fault -> double_fault,
>>> then we'd enter double_fault with usergs, whereas syscall ->
>>> page_fault -> double_fault would enter double_fault with kernelgs.
>>>
>>> Hmm. We may be able to answer this more directly. Stefan, can you
>>> dump a couple hundred bytes starting at 0x00007fffa55eafb8 (i.e. your
>>> page_fault stack at the time of the failure)? That will tell us the
>>> faulting address. If that fails, try starting at 00007fffa55eb000
>>> instead.
>>
>> Unfortunately not. Is this userspace memory? It's not in the dump I have.
>> This is the first issue I've seen where having a full dump would be
>> genuinely helpful, not just for cosmetic reasons...
>
> Yes, it's userspace. Thanks for checking, though.
One more stupid hunch:
Can you do:
x/21xg ffff8801013d4f58
If I counted right, that'll dump task_pt_regs(current).
--Andy