linux-kernel - Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression

Message-ID: <20150904082954.GB3902@dastard>
Date:	Fri, 4 Sep 2015 18:29:54 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Waiman Long <Waiman.Long@...com>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [4.2, Regression] Queued spinlocks cause major XFS performance
 regression

On Fri, Sep 04, 2015 at 05:11:43PM +1000, Dave Chinner wrote:
> On Thu, Sep 03, 2015 at 11:39:21PM -0700, Linus Torvalds wrote:
> > There doesn't seem to be anything even remotely strange going on in that area.
> > 
> > Is this a PARAVIRT configuration? There were issues with PV
> > interaction at some point. If it is PV, and you don't actually use PV,
> > can you test with PV support disabled?
> 
> $ grep PARAVIRT .config
> CONFIG_PARAVIRT=y
> # CONFIG_PARAVIRT_DEBUG is not set
> # CONFIG_PARAVIRT_SPINLOCKS is not set
> CONFIG_PARAVIRT_TIME_ACCOUNTING=y
> CONFIG_PARAVIRT_CLOCK=y
> $
> 
> I'll retest with CONFIG_PARAVIRT=n....

$ grep PARAVIRT .config
# CONFIG_PARAVIRT is not set
$


FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     123407.7          9202289
     0      3200000            0      97271.9          9187905
     0      4800000            0     101010.3         11246527
....

So, no, that doesn't affect the queued spinlock performance at all.

> > Also, if you look at the instruction-level profile for
> > queued_spin_lock_slowpath itself, does anything stand out? For
> > example, I note that the for-loop with the atomic_cmpxchg() call in it
> > doesn't ever do a cpu_relax(). It doesn't look like that should
> > normally loop, but obviously that function also shouldn't normally use
> > 2/3rds of the cpu, so.. Maybe some part of queued_spin_lock_slowpath()
> > stands out as "it's spending 99% of the time in _that_ particular
> > part, and it gives some clue what goes wrong.
> 
> I'll have a look when the current tests on that machine have
> finished running.

       │     Disassembly of section load2:
       │
       │     ffffffff810e0f30 <load2+0xe0f30>:
  0.00 │       nop
       │       push   %rbp
  0.00 │       mov    %rsp,%rbp
  0.00 │       xchg   %ax,%ax
       │       xor    %eax,%eax
  0.00 │       mov    $0x1,%edx
       │       lock   cmpxchg %edx,(%rdi)
  0.33 │       xor    %ecx,%ecx
       │       test   %eax,%eax
       │     ↓ je     28
  0.02 │ 1c:   pause
  4.45 │       mov    %ecx,%eax
  0.00 │       lock   cmpxchg %edx,(%rdi)
 95.18 │       test   %eax,%eax
       │     ↑ jne    1c
  0.01 │ 28:   pop    %rbp
  0.01 │     ← retq

.....

It looks like it's spending all its time looping around the cmpxchg.
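
For readers who would rather see that loop in C, here is a rough
userspace sketch of what the annotated instructions appear to do: try a
0 -> 1 cmpxchg on the lock word and, if that fails, pause and retry
until it succeeds. This is an illustration using C11 atomics, not the
kernel source. The names tas_lock(), tas_unlock() and relax() are
hypothetical, relax() is only a stand-in for the kernel's cpu_relax(),
and the spin counter is added purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for cpu_relax(); emits PAUSE on x86, nothing elsewhere. */
static inline void relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#endif
}

/*
 * Spin until the lock word goes 0 -> 1.
 * Returns the number of failed cmpxchg attempts (illustration only).
 */
static unsigned long tas_lock(atomic_int *lock)
{
	unsigned long spins = 0;
	int expected = 0;	/* xor %eax,%eax */

	/* lock cmpxchg %edx,(%rdi) with %edx = 1 */
	while (!atomic_compare_exchange_strong(lock, &expected, 1)) {
		spins++;
		relax();	/* pause */
		expected = 0;	/* mov %ecx,%eax: reset the comparand */
	}
	return spins;
}

/* Release the lock; the caller is assumed to hold it. */
static void tas_unlock(atomic_int *lock)
{
	assert(atomic_load(lock) == 1);
	atomic_store(lock, 0);
}
```

In this sketch every waiter retries the locked cmpxchg on the same
word, so under contention the waiters keep fighting over one cache
line. That would be consistent with nearly all of the samples landing
on the instruction right after the second cmpxchg.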

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com