lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPM=9tyWv79U9=-YNacf34don_Gn5RGggBsZG7R9FNuV8_qe1g@mail.gmail.com>
Date:   Wed, 19 Jul 2017 06:44:49 +1000
From:   Dave Airlie <airlied@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Jones <pjones@...hat.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Dave Airlie <airlied@...hat.com>,
        Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
        "linux-fbdev@...r.kernel.org" <linux-fbdev@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Peter Anvin <hpa@...or.com>
Subject: Re: [PATCH] efifb: allow user to disable write combined mapping.

On 19 July 2017 at 05:57, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Tue, Jul 18, 2017 at 7:34 AM, Peter Jones <pjones@...hat.com> wrote:
>>
>> Well, that's kind of amazing, given 3c004b4f7eab239e switched us /to/
>> using ioremap_wc() for the exact same reason.  I'm not against letting
>> the user force one way or the other if it helps, though it sure would be
>> nice to know why.
>
> It's kind of amazing for another reason too: how is ioremap_wc()
> _possibly_ slower than ioremap_nocache() (which is what plain
> ioremap() is)?

In normal operation the console is faster with _wc. It's the side effects
on other cores that is the problem.

> Or maybe it really is something where there is one global write queue
> per die (not per CPU), and having that write queue "active" doing
> combining will slow down every core due to some crazy synchronization
> issue?
>
> x86 people, look at what Dave Airlie did, I'll just repeat it because
> it sounds so crazy:
>
>> A customer noticed major slowdowns while logging to the console
>> with write combining enabled, on other tasks running on the same
>> CPU. (10x or greater slow down on all other cores on the same CPU
>> as is doing the logging).
>>
>> I reproduced this on a machine with dual CPUs.
>> Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz (6 core)
>>
>> I wrote a test that just mmaps the pci bar and writes to it in
>> a loop, while this was running in the background one a single
>> core with (taskset -c 1), building a kernel up to init/version.o
>> (taskset -c 8) went from 13s to 133s or so. I've yet to explain
>> why this occurs or what is going wrong I haven't managed to find
>> a perf command that in any way gives insight into this.
>
> So basically the UC vs WC thing seems to slow down somebody *else* (in
> this case a kernel compile) on another core entirely, by a factor of
> 10x. Maybe the WC writer itself is much faster, but _others_ are
> slowed down enormously.
>
> Whaa? That just seems incredible.

Yes I've been staring at this for a while now trying to narrow it down, I've
been a bit slow on testing it on a wider range of Intel CPUs, I've
only really managed
to play on that particular machine,

I've attached two test files. compile both of them (I just used make
write_resource burn-cycles).

On my test CPU core 1/8 are on same die.

time taskset -c 1 ./burn-cycles
takes about 6 seconds

taskset -c 8 ./write_resource wc
taskset -c 1 ./burn-cycles
takes about 1 minute.

Now I've noticed write_resource wc or not wc doesn't seem to make a
difference, so
I think it matters that efifb has used _wc for the memory area already
and set PAT on it for wc,
and we always get wc on that BAR.

>From the other person seeing it:
"I done a similar test some time ago, the result was the same.
I ran some benchmarks, and it seems that when data set fits in L1
cache there is no significant performance degradation."

Dave.

View attachment "write_resource.c" of type "text/x-csrc" (647 bytes)

View attachment "burn-cycles.c" of type "text/x-csrc" (196 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ