lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190429145159.GA29076@hc>
Date:   Mon, 29 Apr 2019 14:52:11 +0000
From:   Jan Glauber <jglauber@...vell.com>
To:     "catalin.marinas@....com" <catalin.marinas@....com>,
        "will.deacon@....com" <will.deacon@....com>
CC:     "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Jayachandran Chandrasekharan Nair <jnair@...vell.com>
Subject: [RFC] Disable lockref on arm64

Hi Catalin & Will,

I've been looking into performance issues that were reported for several
test-cases, for instance an nginx benchmark.

It turned out the issue we have on ThunderX2 is the file open-close sequence
with small read sizes. If the used files are opened read-only the
lockref code (enabled by ARCH_USE_CMPXCHG_LOCKREF) is used.

The lockref CMPXCHG_LOOP uses an unbound (as long as the associated
spinlock isn't taken) while loop to change the lock count. This behaves
badly under heavy contention (~25x retries for one cmpxchg to succeed
with 28 threads operating on the same file). In case of a NUMA system
it also behaves badly as the access from the other socket is much slower.

The fact that on ThunderX2 cpu_relax() turns only into one NOP
instruction doesn't help either. On Intel pause seems to block the thread
much longer, avoiding the heavy contention thereby.

With the queued spinlocks implementation I can see a major improvement
when I disable lockref. A trivial open-close test-case improves by
factor 2 while system time is decreasing also 2x. Looking at kernel compile
and dbench numbers didn't show any regression with lockref disabled.

Can we simply disable lockref? Is anyone else seeing this issue? Is there
an arm64 platform that actually implements yield?

Thanks,
Jan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ