linux-kernel - Re: [PATCH 1/2] x86/mm: Reinitialize TLB state on hotplug and resume


Message-Id: <D768BA17-8D2E-42FD-92D2-D94F6F1A6BF2@amacapital.net>
Date:   Thu, 7 Sep 2017 18:23:27 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Jiri Kosina <jikos@...nel.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Andy Lutomirski <luto@...nel.org>,
        X86 ML <x86@...nel.org>, Borislav Petkov <bpetkov@...e.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 1/2] x86/mm: Reinitialize TLB state on hotplug and resume



> On Sep 7, 2017, at 12:55 PM, Jiri Kosina <jikos@...nel.org> wrote:
> 
> On Thu, 7 Sep 2017, Ingo Molnar wrote:
> 
>>>> When Linux brings a CPU down and back up, it switches to init_mm and then
>>>> loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
>>>> of masking off the ASID bits in CR3.
>>>> 
>>>> This can result in some confusion in the TLB handling code.  If we
>>>> bring a CPU down and back up with any ASID other than 0, we end up
>>>> with the wrong ASID active on the CPU after resume.  This could
>>>> cause our internal state to become corrupt, although major
>>>> corruption is unlikely because init_mm doesn't have any user pages.
>>>> More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
>>>> in the next context switch.  The result of *that* is a failure to
>>>> resume from suspend with probability 1 - 1/6^(cpus-1).
>>>> 
>>>> Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
>>>> 
>>>> Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
>>>> Reported-by: Jiri Kosina <jikos@...nel.org>
>>>> Fixes: 10af6235e0d3 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
>>>> Signed-off-by: Andy Lutomirski <luto@...nel.org>
>>> 
>>> Tested-by: Jiri Kosina <jkosina@...e.cz>
>> 
>> The fix should be upstream already, as of 1c9fe4409ce3 and later.
> 
> Hm, so I've just experienced two instances in a row of reboot just after 
> reading hibernation image (i.e. exactly the same symptom as before) even 
> with 3b9f8ed kernel (which contains the fix). Seems like the fix is either 
> incomplete (just the probability of it happening is lower), or I'm seeing 
> something different with the same symptom.
> 
> I'll try to figure out whether it's the same VM_BUG_ON() triggering, but 
> probably will be able to do so only tomorrow.
> 

Nah, don't waste your time.  I think I see the bug, and it's a different bug.  It's an easy one-line fix, but I have to figure out how to test it.

> -- 
> Jiri Kosina
> SUSE Labs
>
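
The quoted commit message makes two checkable claims: that loading a bare page-table address into CR3 with PCID enabled leaves ASID 0 in the low bits, and that resume fails with probability 1 - 1/6^(cpus-1). The sketch below illustrates both. It is not kernel code. The helper names, the example address, and the reading of "6" as the number of ASIDs a CPU may be tracking (each equally likely, independently per non-boot CPU) are assumptions made for illustration only.

```python
# Illustrative model only; names and values here are hypothetical, not kernel APIs.

CR3_PCID_MASK = 0xFFF  # with CR4.PCIDE=1, CR3 bits 11:0 carry the PCID
NR_ASIDS = 6           # assumption: the "6" in the quoted formula is the ASID count


def build_cr3(pgd_phys: int, asid: int) -> int:
    """Combine a page-aligned page-table address with an ASID in the low bits."""
    if pgd_phys & CR3_PCID_MASK:
        raise ValueError("page-table address must be 4 KiB aligned")
    if not 0 <= asid <= CR3_PCID_MASK:
        raise ValueError("ASID does not fit in the 12 PCID bits")
    return pgd_phys | asid


def cr3_asid(cr3: int) -> int:
    """Extract the ASID bits the hardware would see for this CR3 value."""
    return cr3 & CR3_PCID_MASK


def resume_failure_probability(cpus: int, nr_asids: int = NR_ASIDS) -> float:
    """1 - 1/nr_asids^(cpus-1): resume succeeds only if every non-boot CPU
    happened to be tracking ASID 0 (assumed uniform and independent)."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return 1.0 - (1.0 / nr_asids) ** (cpus - 1)


if __name__ == "__main__":
    swapper_pg_dir_phys = 0x1C0A000  # hypothetical page-aligned address
    tracked_asid = 3                 # what the software state still claims

    # Loading the bare address is the same as loading it with ASID 0.
    hw_asid = cr3_asid(swapper_pg_dir_phys)
    print(f"tracked ASID {tracked_asid}, hardware ASID {hw_asid}, "
          f"mismatch: {hw_asid != tracked_asid}")

    for n in (1, 2, 4, 8):
        print(f"{n} CPUs: failure probability {resume_failure_probability(n):.6f}")
```

Under these assumptions the mismatch arises whenever the tracked ASID is non-zero, and the failure probability is already 5/6 with two CPUs and climbs toward 1 as CPUs are added.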