linux-kernel - [4.2, Regression] Queued spinlocks cause major XFS performance regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150904054820.GY3902@dastard>
Date:	Fri, 4 Sep 2015 15:48:20 +1000
From:	Dave Chinner <david@...morbit.com>
To:	linux-kernel@...r.kernel.org
Cc:	peterz@...radead.org, Waiman.Long@...com,
	torvalds@...ux-foundation.org
Subject: [4.2, Regression] Queued spinlocks cause major XFS performance
 regression

Hi Waiman,

For the first time in months I just turned of spinlock debugging on
my performance test machine and I just got an unpleasant surprise on
my standard inode allocation and reclaim test.  I've described this
test to you before, because it's found regressions in your previous
lock scaling changes:

http://permalink.gmane.org/gmane.linux.kernel/1768786

This time it is the fsmark run that I use to populate the filesystem
that is demonstrating a locking regression. I'll asked you before
if you could add this test to your lock scaling regression test
suite; please do it this time.

Now, the regression.  With spinlock debugging turned on, the
performance of my usual XFS inode allocation benchmark using fsmark
reports performance like this:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     312594.0          9944159
     0      3200000            0     295668.6         10399679
     0      4800000            0     279026.1         11397617
.....

This has been pretty stable for several releases - it varies +/- a
few percent, but it's pretty much been like this since about 3.2
when CONFIG_XFS_DEBUG=n, with or without basic spinlock debugging.

When I turned spinlock debugging off on 4.2 to get some perf numbers
a request from Linus, I got this:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     114143.9          9597599
     0      3200000            0      95486.9          9460413
     0      4800000            0      93918.2          9784699
....

All 16 CPUs were pegged at 100% cpu usage. I took a quick look at
perf:

  67.32%  [kernel]       [k] queued_spin_lock_slowpath
   5.17%  [kernel]       [k] xfs_log_commit_cil
   2.47%  [kernel]       [k] _xfs_buf_find
   1.37%  [kernel]       [k] _raw_spin_lock
   ....

And then a quick call graph sample to find the lock:

   37.19%    37.19%  [kernel]         [k] queued_spin_lock_slowpath
   - queued_spin_lock_slowpath
      - 99.98% _raw_spin_lock
         - 89.16% xfs_log_commit_cil
            - __xfs_trans_commit
               - 98.48% xfs_trans_commit
                    xfs_create
                    xfs_generic_create
                    xfs_vn_mknod
                    xfs_vn_create
                    vfs_create
                    path_openat
                    do_filp_open
                    do_sys_open
                    sys_open
                    entry_SYSCALL_64_fastpath
                  + __GI___libc_open
               + 1.52% __xfs_trans_roll

This shows that we have catastrophic spinlock contention in the
transaction commit path. The cil->xc_cil_lock spin lock as it's the
only spinlock in that path. And while it's the hot lock in the
commit path, turning spinlock debugging back on (and no other
changes) shows that it shouldn't be contended:

   8.92%  [kernel]  [k] _xfs_buf_find
   5.51%  [kernel]  [k] xfs_dir2_node_addname
   3.49%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
   3.45%  [kernel]  [k] do_raw_spin_lock
   3.06%  [kernel]  [k] kmem_cache_alloc
   2.99%  [kernel]  [k] __memcpy
   2.97%  [kernel]  [k] xfs_log_commit_cil
   .....

And the call graph:

    4.52%     0.24%  [kernel]            [k] _raw_spin_lock
   - _raw_spin_lock
      + 43.63% xfs_log_commit_cil
      + 9.76% inode_sb_list_add
      + 8.57% list_lru_add
      + 6.37% _xfs_buf_find
      + 6.08% d_alloc
      + 4.18% dput
      + 4.08% __d_instantiate
...

IOWs, about 2% of the cpu usage is lock traffic through
cil->xc_cil_lock, and that's 3x the locking rate where the queued
spinlock becomes catastrophically contention bound.

To confirm that this is indeed caused by the queued spinlocks, I
removed the the spinlock debugging and did this to arch/x86/Kconfig:

-       select ARCH_USE_QUEUED_SPINLOCK

And the results are:

FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     329310.4          9727415
     0      3200000            0     305421.4         10421358
     0      4800000            0     294593.3         11007112
     0      6400000            0     283863.7         12557190
....

So the problem is definitely the queued spinlock...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/