Message-ID: <8F197BEF-E32D-4309-A70C-F19EE7EEC994@vmware.com>
Date: Wed, 26 Jun 2019 01:34:17 +0000
From: Nadav Amit <namit@...are.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
the arch/x86 maintainers <x86@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Richard Henderson <rth@...ddle.net>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>,
Matt Turner <mattst88@...il.com>,
Tony Luck <tony.luck@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...riel.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH 0/9] x86: Concurrent TLB flushes and other improvements
> On Jun 25, 2019, at 3:02 PM, Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 6/12/19 11:48 PM, Nadav Amit wrote:
>> Running sysbench on dax w/emulated-pmem, write-cache disabled, and
>> various mitigations (PTI, Spectre, MDS) disabled on Haswell:
>>
>> sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
>> --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
>>
>> events (avg/stddev)
>> -------------------
>> 5.2-rc3: 1247669.0000/16075.39
>> +patchset: 1290607.0000/13617.56 (+3.4%)
>
> Why did you decide on disabling the side-channel mitigations? While
> they make things slower, they're also going to be with us for a while,
> so they really are part of real-world testing IMNHO. I'd be curious
> whether this set has more or less of an advantage when all the
> mitigations are on.
It seemed reasonable, since I wanted to avoid all kinds of “noise”. I presume
the relative speedup would be smaller, due to the overhead of the
mitigations. Note that in this benchmark every TLB invalidation is of a
single entry. The benefit (in terms of absolute time saved) would have been
greater if each flush covered multiple entries.
> Also, why only 4 threads? Does this set help most when using a moderate
> number of threads since the local and remote cost are (relatively) close
> vs. a large system where doing lots of remote flushes is *way* more
> time-consuming than a local flush?
Don’t overthink it. My server was busy doing something else, so I was
running the tests on a lame desktop I have. I will rerun it on a bigger
machine.
I presume the performance benefit will be smaller when more cores are
involved, since the TLB shootdown time will be dominated by the inter-core
communication time (IPI+cache coherency) and the tail latency of the IPI
delivery (if interrupts are disabled on the target).
I am working on some patches to reduce these overheads as well.