linux-kernel - Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

Message-ID: <CALCETrXhtOZyeNFNUpC1ZXOCSi6_gBNuKzSVXvPApPwqToNmLg@mail.gmail.com>
Date:	Wed, 18 Mar 2015 13:49:11 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	Stefan Seyfried <stefan.seyfried@...glemail.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Takashi Iwai <tiwai@...e.de>, X86 ML <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
> On 03/18/2015 08:26 PM, Andy Lutomirski wrote:
>> Hi Linus-
>>
>> You seem to enjoy debugging these things.  Want to give this a shot?
>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>> right after swapgs in syscall entry.
>
> The code is:
>
> ENTRY(system_call)
>         SWAPGS_UNSAFE_STACK
> GLOBAL(system_call_after_swapgs)
>         movq    %rsp,PER_CPU_VAR(rsp_scratch)
>         movq    PER_CPU_VAR(kernel_stack),%rsp
>
> If a PER_CPU_VAR(var) memory access can page fault
> (I thought this was guaranteed never to fault),
> then a page fault on either of these two instructions
> will be fatal: we will still be on the userspace %rsp.
>
> I thought we could only get an NMI or debug interrupt here,
> and both are set up to use IST stacks
> to prevent this scenario (among other reasons).

I don't think that #DB is possible -- we should never have a
watchpoint on percpu memory like that (unless we're using kgdb, in
which case I think that kgdb should be fixed).

On the other hand, we can and do take page faults on percpu memory,
because percpu lives in vmap space and we lazily populate PGD entries
in per-mm PGDs.  (That is, when we allocate a kernel PGD entry, we
populate it in init_mm's pgd, but we don't proactively copy it during
context switches.)

But the affected system is a laptop, so there shouldn't be CPU hotplug
or enough memory for this to happen.  Confused.

--Andy