Message-ID: <CALCETrUgb8frLsmaqAEopsf1O-2io7wGvTO1BLFJq8wjtb+G5Q@mail.gmail.com>
Date: Wed, 6 Sep 2017 15:26:19 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Jiri Kosina <jikos@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [GIT PULL] x86/mm changes for v4.14: PCID support, 5-level paging
support, Secure Memory Encryption support
On Wed, Sep 6, 2017 at 2:16 PM, Jiri Kosina <jikos@...nel.org> wrote:
> On Wed, 6 Sep 2017, Jiri Kosina wrote:
>
>> This is a "me too", observed on my Lenovo thinkpad x270 (so it's not
>> specific to that XPS 13 system at all).
>>
>> The symptom I observe is that an attempt to resume from hibernation
>> proceeds up to reading 100% of the hibernation image, and then reboot
>> happens (IOW looks like triple fault).
>>
>> nopcid cures it, I haven't tried to revert 10af6235e0d3 yet, but looks
>> like it's the same thing.
>
> [ reposting the information again with LKML re-introduced to CC ]
>
> As suggested by Andy off-list, I tested with this change to always force
> ASID 0
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 5ca71d1..c3b0811 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -35,7 +35,7 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
>  {
>         u16 asid;
>
> -       if (!static_cpu_has(X86_FEATURE_PCID)) {
> +       if (true || !static_cpu_has(X86_FEATURE_PCID)) {
>                 *new_asid = 0;
>                 *need_flush = true;
>                 return;
>
> and that fixes the issue on my system.
I got Linus' config to boot. The problem was that I ended up with a
root-owned file (not sure which) in my tree that caused an incorrect
build but didn't generate errors. I don't know how this happened, but
an ill-timed sudo make -j4 modules_install install was probably
involved. git clean -ffxxxd did *not* fix it or even notice it in
any obvious way.
Anyway, the problem appears to depend on kernel config because it's
dying here on resume on secondary cpus:
VM_BUG_ON(__read_cr3() != (__sme_pa(real_prev->pgd) | prev_asid));
in switch_mm_irqs_off().
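The assertion compares the live CR3 value against the expected page-table
physical address combined with the ASID. Below is a rough user-space model
of that comparison, assuming the ASID is used directly as the PCID in CR3
bits 11:0 and ignoring any memory-encryption bit that __sme_pa() may add.
The names make_cr3, cr3_check_fires and CR3_PCID_MASK are hypothetical and
do not come from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: with CR4.PCIDE set, CR3 bits 11:0 carry the PCID. */
#define CR3_PCID_MASK 0xFFFull

/*
 * Hypothetical helper: compose a CR3 value from a page-aligned page-table
 * physical address and an ASID that is used directly as the PCID.
 */
static uint64_t make_cr3(uint64_t pgd_pa, uint16_t asid)
{
	assert((pgd_pa & CR3_PCID_MASK) == 0);	/* must be page aligned */
	assert(asid <= CR3_PCID_MASK);		/* must fit in the PCID field */
	return pgd_pa | asid;
}

/*
 * Same shape as the VM_BUG_ON condition: returns true when the check
 * would fire, i.e. when the CR3 actually loaded differs from what the
 * per-CPU bookkeeping (page table + ASID) says it should be.
 */
static bool cr3_check_fires(uint64_t actual_cr3, uint64_t pgd_pa,
			    uint16_t asid)
{
	return actual_cr3 != make_cr3(pgd_pa, asid);
}
```

In this model, a CR3 that holds the right page-table address but has zero
in the low bits only passes the check when the recorded ASID is also 0.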
What seems to be going on is that the wakeup CPU is restoring its
original state exactly. All other CPUs are restoring swapper_pg_dir but
are failing to restore the PCID tag bits, which trips the assertion with
probability 5/6 per non-boot CPU. So, if you have that debug option set,
you die with probability 1 - (1/6)^(cpus - 1), which is pretty large.
I'll come up with a clean fix this evening, I hope.