linux-kernel - Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

Message-ID: <CA+55aFzf1C+WXRtfkdwAbb0kniYJfkfm=rghouM3u8x7-ZJGMg@mail.gmail.com>
Date:   Fri, 8 Sep 2017 18:05:29 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Borislav Petkov <bp@...en8.de>,
        Markus Trippelsdorf <markus@...ppelsdorf.de>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Tom Lendacky <thomas.lendacky@....com>
Subject: Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

On Fri, Sep 8, 2017 at 5:00 PM, Andy Lutomirski <luto@...nel.org> wrote:
>
> I'm not convinced.  The SDM says (Vol 3, 11.3, under WC):
>
> If the WC buffer is partially filled, the writes may be delayed until
> the next occurrence of a serializing event; such as, an SFENCE or
> MFENCE instruction, CPUID execution, a read or write to uncached
> memory, an interrupt occurrence, or a LOCK instruction execution.
>
> Thanks, Intel, for defining "serializing event" differently here than
> anywhere else in the whole manual.

Yeah, it's really badly defined. Ok, maybe a locked instruction does
actually wait for it.. It should be invisible to anything, regardless.

> 1. The kernel wants to reclaim a page of normal memory, so it unmaps
> it and flushes.  Another CPU has an entry for that page in its WC
> buffer.  I don't think we care whether the flush causes the WC write
> to really hit RAM because it's unobservable -- we just need to make
> sure it is ordered, as seen by software, before the flush operation
> completes.  From the quote above, I think we're okay here.

Agreed.

> 2. The kernel is unmapping some IO memory (e.g. a GPU command buffer).
> It wants a guarantee that, when flush_tlb_mm_range returns, all CPUs
> are really done writing to it.  Here I'm less convinced.  The SDM
> quote certainly suggests to me that we have a promise that the WC
> write has *started* before flush_tlb_mm_range returns, but I'm not
> sure I believe that it's guaranteed to have retired.

If others have writable TLB entries, what keeps them from just
continuing to write for a long time afterwards?

> I'd prefer to leave it as is except on the buggy AMD CPUs, though,
> since the current code is nice and fast.

So is there a patch to detect the 383 erratum and serialize for those?
I may have missed that part.

              Linus