Date: Mon, 29 Apr 2019 14:52:11 +0000
From: Jan Glauber <jglauber@...vell.com>
To: "catalin.marinas@....com" <catalin.marinas@....com>, "will.deacon@....com" <will.deacon@....com>
CC: "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Jayachandran Chandrasekharan Nair <jnair@...vell.com>
Subject: [RFC] Disable lockref on arm64

Hi Catalin & Will,

I've been looking into performance issues that were reported for several test cases, for instance an nginx benchmark.

It turned out the issue we have on ThunderX2 is the file open-close sequence with small read sizes. If the files are opened read-only, the lockref code (enabled by ARCH_USE_CMPXCHG_LOCKREF) is used.

The lockref CMPXCHG_LOOP uses a while loop that is unbounded (as long as the associated spinlock isn't taken) to change the lock count. This behaves badly under heavy contention (~25 retries for one cmpxchg to succeed with 28 threads operating on the same file). On a NUMA system it also behaves badly, because access from the other socket is much slower.

The fact that on ThunderX2 cpu_relax() turns into only a single NOP instruction doesn't help either. On Intel, PAUSE seems to block the thread much longer, thereby avoiding the heavy contention.

With the queued spinlocks implementation I see a major improvement when I disable lockref. A trivial open-close test case improves by a factor of 2, while system time also decreases 2x. Kernel compile and dbench numbers showed no regression with lockref disabled.

Can we simply disable lockref? Is anyone else seeing this issue? Is there an arm64 platform that actually implements yield?

Thanks,
Jan