Date:   Thu, 20 Jul 2017 14:07:57 +1000
From:   Dave Airlie <airlied@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Jones <pjones@...hat.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Dave Airlie <airlied@...hat.com>,
        Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
        "linux-fbdev@...r.kernel.org" <linux-fbdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Peter Anvin <hpa@...or.com>
Subject: Re: [PATCH] efifb: allow user to disable write combined mapping.

On 19 July 2017 at 11:15, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Tue, Jul 18, 2017 at 5:00 PM, Dave Airlie <airlied@...il.com> wrote:
>>
>> More digging:
>> Single CPU system:
>> Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>> 01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH
>>
>> Now I can't get efifb to load on this machine (it's remote, and I
>> don't know how to make my install boot via EFI on it), but booting
>> with no framebuffer and running the tests on the mga shows no
>> slowdown on this.
>
> Is it actually using write-combining memory without a frame buffer,
> though? I don't think it is. So the lack of slowdown might be just
> from that.
>
>> Now I'm starting to wonder if it's something that only happens on
>> multi-socket systems.
>
> Hmm. I guess that's possible, of course.
>
> [ Wild and crazy handwaving... ]
>
> Without write combining, all the uncached writes will be fully
> serialized and there is no buffering in the chip write buffers. There
> will be at most one outstanding PCI transaction in the uncore write
> buffer.
>
> In contrast, _with_ write combining, the write buffers in the uncore
> can fill up.
>
> But why should that matter? Maybe memory ordering. When one of the
> cores (doesn't matter *which* core) wants to get a cacheline for
> exclusive use (ie it did a write to it), it will need to invalidate
> cachelines in other cores. However, the uncore now has all those PCI
> writes buffered, and the write ordering says that they should happen
> before the memory writes. So before it can give the core exclusive
> ownership of the new cacheline, it needs to wait for all those
> buffered writes to be pushed out, so that no other CPU can see the new
> write *before* the device saw the old writes.
>
> But I'm not convinced this is any different in a multi-socket
> situation than it is in a single-socket one. The other cores on the
> same socket should not be able to see the writes out of order
> _either_.
>
> And honestly, I think PCI write posting rules makes the above crazy
> handwaving completely bogus anyway. Writes _can_ be posted, so the
> memory ordering isn't actually that tight.
>
> I dunno. I really think it would be good if somebody inside Intel
> would look at it..

Yes, I'm hoping someone can give some insight.

Scrap the multi-socket theory: it's been seen on a single-socket system
as well, just not as drastic (2x rather than 10x slowdowns).

It's starting to seem like the commonality might be the Matrox G200EH,
which is part of the HP iLO remote management hardware. It might be that
the RAM on the other side of the PCIe connection is causing some sort of
weird stalls or slowdowns, though I'm not sure how best to validate that
either.
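As a hedged aside for anyone A/B-testing this: the patch in the subject
line adds an efifb option to skip the write-combined mapping, so the two
cases can be compared from the kernel command line. The option spelling
below is taken from the posted patch and should be checked against your
tree:

```
video=efifb:nowc
```

With the option set, efifb should fall back to an uncached ioremap of the
framebuffer instead of the default ioremap_wc mapping.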

Dave.
