Message-ID: <CAHk-=wiq+7sW3Lk5iQ0-zY5XWES4rSxK505vXsgFY=za88+RZw@mail.gmail.com>
Date: Wed, 5 Aug 2020 10:05:04 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>
Subject: Re: [GIT PULL] x86/mm changes for v5.9
On Wed, Aug 5, 2020 at 4:03 AM Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
> mappings") causes the OOPS below, in Linus' tree and in linux-next,
> unearthed by my CI on <https://www.wireguard.com/build-status/>.
> Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
> away.
The oops happens early in the function, and the "Code:" line actually
gets almost the whole function prologue in it (the missing first two
bytes are probably "push %r15"):
   0:   41 56                   push   %r14
   2:   41 55                   push   %r13
   4:   41 54                   push   %r12
   6:   55                      push   %rbp
   7:   48 89 f5                mov    %rsi,%rbp
   a:   53                      push   %rbx
   b:   48 89 fb                mov    %rdi,%rbx
   e:   48 83 ec 08             sub    $0x8,%rsp
  12:   48 8b 06                mov    (%rsi),%rax
  15:   4c 8b 67 40             mov    0x40(%rdi),%r12
  19:   49 89 c6                mov    %rax,%r14
  1c:   45 30 f6                xor    %r14b,%r14b
  1f:   a8 04                   test   $0x4,%al
  21:   b8 00 00 00 00          mov    $0x0,%eax
  26:   4c 0f 44 f0             cmove  %rax,%r14
  2a:*  49 8b 46 08             mov    0x8(%r14),%rax   <-- trapping instruction
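As a rough sketch of what the instructions from offset 12 onward compute, the C below mirrors only the arithmetic. The names decode_r14, trapping_address, FLAG_BIT and LOW_BYTE are made up for illustration; the oops itself does not say what the loaded word or its flag bit mean.

```c
#include <stdint.h>

/*
 * Hypothetical names: this mirrors the arithmetic of the disassembly
 * only, not the actual kernel source of the faulting function.
 */
#define FLAG_BIT 0x4ULL   /* test $0x4,%al */
#define LOW_BYTE 0xffULL  /* xor %r14b,%r14b clears bits 0-7 */

/* Value left in %r14 just before the trapping load. */
static uint64_t decode_r14(uint64_t word)
{
    /* mov %rax,%r14 ; xor %r14b,%r14b */
    uint64_t base = word & ~LOW_BYTE;

    /* cmove %rax,%r14 with %rax == 0: zero when the flag bit is clear */
    return (word & FLAG_BIT) ? base : 0;
}

/* Address read by "mov 0x8(%r14),%rax". */
static uint64_t trapping_address(uint64_t word)
{
    return decode_r14(word) + 0x8;
}
```

Working backwards from the reported fault address ffffe8ffffd00608, %r14 must have been ffffe8ffffd00600, so the flag bit was set in the word loaded from (%rsi).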
> BUG: unable to handle page fault for address: ffffe8ffffd00608
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
Yeah, missing page table because it wasn't copied.
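To see which top-level entry that "PGD 0" refers to, here is a small sketch of the index arithmetic. It assumes 4-level paging, where each PGD entry covers 2^39 bytes; the macro names follow the kernel's, but pgd_slot is a made-up helper, not a kernel function.

```c
#include <stdint.h>

/* Assumption: 4-level paging, 512 PGD entries of 2^39 bytes each. */
#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512

/* Index of the top-level (PGD) entry that covers a virtual address. */
static unsigned int pgd_slot(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

Under that assumption, pgd_slot(0xffffe8ffffd00608) is 465, so that is the entry that is empty in whatever page table was active at the time of the fault.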
Presumably because that kthread is using the active_mm of some random
user space process that didn't get sync'ed.
And the sync_global_pgds() may have ended up being sufficient
synchronization with whoever allocated things, even if it wasn't about
the TLB contents themselves.
So apparently the "page-table pages are all pre-allocated now" claim is
simply not true. Joerg?
Unless somebody can figure this out fairly quickly, I think it should
just be reverted.
Linus