Date:   Wed, 26 Jun 2019 01:34:17 +0000
From:   Nadav Amit <>
To:     Dave Hansen <>
CC:     Peter Zijlstra <>,
        Andy Lutomirski <>,
        LKML <>,
        Ingo Molnar <>, Borislav Petkov <>,
        the arch/x86 maintainers <>,
        Thomas Gleixner <>,
        Dave Hansen <>,
        Richard Henderson <>,
        Ivan Kokshaysky <>,
        Matt Turner <>,
        Tony Luck <>,
        Fenghua Yu <>,
        Andrew Morton <>,
        Rik van Riel <>,
        Josh Poimboeuf <>,
        Paolo Bonzini <>
Subject: Re: [PATCH 0/9] x86: Concurrent TLB flushes and other improvements

> On Jun 25, 2019, at 3:02 PM, Dave Hansen <> wrote:
> On 6/12/19 11:48 PM, Nadav Amit wrote:
>> Running sysbench on dax w/emulated-pmem, write-cache disabled, and
>> various mitigations (PTI, Spectre, MDS) disabled on Haswell:
>> sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
>>  --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
>> 			events (avg/stddev)
>> 			-------------------
>>  5.2-rc3:		1247669.0000/16075.39
>>  +patchset:		1290607.0000/13617.56 (+3.4%)
> Why did you decide on disabling the side-channel mitigations?  While
> they make things slower, they're also going to be with us for a while,
> so they really are part of real-world testing IMNHO.  I'd be curious
> whether this set has more or less of an advantage when all the
> mitigations are on.

It seemed reasonable since I wanted to avoid all kinds of “noise”. I presume
the relative speedup would be smaller due to the overhead of the mitigations.
Note that in this benchmark every TLB invalidation is of a single entry. The
benefit (in terms of absolute time saved) would have been greater if each
flush covered multiple entries.
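
For reference, here is a minimal user-space sketch of what the mmap/rndwr
access pattern above boils down to (illustrative only - not the actual
sysbench code; the file name, sizes and iteration counts are made up). Each
store dirties a single 4KB page, so the writeback path only has to
write-protect/clean that one PTE, which is why every shootdown here
invalidates a single TLB entry:

/* rndwr_sketch.c - illustrative stand-in for the sysbench workload above.
 * Point "testfile" at a file on the dax/pmem mount to mimic that setup.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE  (3UL << 30)   /* 3G, matching --file-total-size=3G */
#define PAGE_SIZE  4096UL
#define N_WRITES   100000

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);
	if (fd < 0) { perror("open"); return 1; }
	if (ftruncate(fd, FILE_SIZE)) { perror("ftruncate"); return 1; }

	char *map = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) { perror("mmap"); return 1; }

	srandom(0);
	for (long i = 0; i < N_WRITES; i++) {
		/* touch one random page; only that PTE becomes dirty */
		size_t off = (random() % (FILE_SIZE / PAGE_SIZE)) * PAGE_SIZE;
		memset(map + off, 0xab, 64);

		/* --file-fsync-mode=fdatasync */
		if (i % 100 == 0)
			fdatasync(fd);
	}

	munmap(map, FILE_SIZE);
	close(fd);
	return 0;
}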

> Also, why only 4 threads?  Does this set help most when using a moderate
> number of threads since the local and remote cost are (relatively) close
> vs. a large system where doing lots of remote flushes is *way* more
> time-consuming than a local flush?

Don’t overthink it. My server was busy doing something else, so I was
running the tests on a lame desktop I have. I will rerun them on a bigger
machine.

I presume the performance benefit will be smaller when more cores are
involved, since the TLB shootdown time will be dominated by the inter-core
communication time (IPI+cache coherency) and the tail latency of the IPI
delivery (if interrupts are disabled on the target).
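
To make that intuition concrete, a tiny back-of-envelope model (all numbers
are assumptions picked for illustration, not measurements): once the local
flush overlaps with the remote IPIs, the saving is roughly the local flush
cost, and that becomes a smaller fraction of the total as the remote/IPI
side grows with the core count:

/* shootdown_model.c - crude cost model, assumed numbers only */
#include <stdio.h>

int main(void)
{
	double local_flush = 0.2;  /* us, local INVLPG + bookkeeping (assumed) */
	double ipi_round   = 2.0;  /* us, IPI delivery + remote flush + ack (assumed) */

	for (int remote = 1; remote <= 64; remote *= 4) {
		/* tail of 'remote' responders; assume it grows slowly with count */
		double remote_cost = ipi_round * (1.0 + 0.1 * remote);
		double serial      = remote_cost + local_flush;  /* local after remote */
		double concurrent  = remote_cost > local_flush ? /* local overlapped   */
				     remote_cost : local_flush;

		printf("%2d remote CPUs: serial %.2fus, concurrent %.2fus (%.1f%% faster)\n",
		       remote, serial, concurrent,
		       100.0 * (serial - concurrent) / serial);
	}
	return 0;
}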

I am working on some patches to reduce these overheads as well.
