Date:	Mon, 15 Aug 2016 09:46:58 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Christoph Hellwig <hch@....de>
Cc:	Fengguang Wu <fengguang.wu@...el.com>,
	Ye Xiaolong <xiaolong.ye@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Bob Peterson <rpeterso@...hat.com>, LKP <lkp@...org>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

On Sun, Aug 14, 2016 at 06:17:24PM +0200, Christoph Hellwig wrote:
> Snipping the long context:
> 
> I think there are three observations here:
> 
>  (1) removing the mark_page_accessed call (which is the only
>      significant change in the parent commit) hurts the
>      aim7/1BRD_48G-xfs-disk_rr-3000-performance/ivb44 test.
>      I'd still rather stick to the filemap version and let the
>      VM people sort it out.  How do the numbers for this test
>      look for XFS vs, say, ext4 and btrfs?
>  (2) lots of additional spinlock contention in the new case.  A
>      quick check shows that I fat-fingered my rewrite so that we
>      now make the xfs_inode_set_eofblocks_tag call even for the
>      pure lookup case, and pretty much all the new cycles come
>      from that.
>  (3) Boy, are those xfs_inode_set_eofblocks_tag calls expensive,
>      and we're already doing way too many even without my little
>      bug above.
> 
> So I've force pushed a new version of the iomap-fixes branch with
> (2) fixed, and also a little patch to make
> xfs_inode_set_eofblocks_tag a lot less expensive slotted in before
> it.  Would be good to see the numbers with that.
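
The pattern behind that last fix is worth spelling out: cache the
tag state in the inode itself so that repeat callers can bail out
before ever touching the shared per-AG lock.  Here's a minimal
standalone userspace sketch of the idea (hypothetical names; a
pthread mutex stands in for the per-AG radix tree lock, and an
atomic per-inode flag for the cached tag state - an illustration of
the pattern, not the actual XFS patch):

/*
 * Sketch of "tag once, skip the lock next time".  All names are
 * hypothetical; modelled loosely on xfs_inode_set_eofblocks_tag().
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct ag {                        /* stands in for per-AG state */
	pthread_mutex_t lock;      /* models the per-AG inode cache lock */
	int tagged_inodes;         /* models the radix tree tag count */
};

struct inode {
	atomic_bool eofblocks_tagged;  /* cheap per-inode hint */
	struct ag *ag;
};

static void set_eofblocks_tag(struct inode *ip)
{
	/*
	 * Fast path: if this inode is already tagged, don't touch
	 * the shared lock at all.  On a write-heavy workload almost
	 * every call takes this early return.
	 */
	if (atomic_load_explicit(&ip->eofblocks_tagged,
				 memory_order_acquire))
		return;

	/* Slow path: the first tagging of an inode pays for the lock. */
	pthread_mutex_lock(&ip->ag->lock);
	if (!atomic_exchange(&ip->eofblocks_tagged, true))
		ip->ag->tagged_inodes++;   /* radix_tree_tag_set() in XFS */
	pthread_mutex_unlock(&ip->ag->lock);
}

int main(void)
{
	struct ag ag = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct inode ip = { .ag = &ag };

	/* Repeated small writes: only the first call takes the lock. */
	for (int i = 0; i < 3; i++)
		set_eofblocks_tag(&ip);

	printf("tagged inodes: %d\n", ag.tagged_inodes);
	return 0;
}

Once the flag is set, later calls never take the shared lock at
all, which is the kind of contention (2) and (3) point at.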

With this new set of fixes, the 1-byte write test runs ~30% faster
on my test machine (130k writes/s vs 100k writes/s), and the 1k
write test on the pmem device runs about 10% faster (660MB/s vs
590MB/s).  dbench numbers on the pmem device also go through the
roof (they didn't show any regression to begin with): 50% faster at
16 clients on a 16AG filesystem (5700MB/s vs 3800MB/s).

The 10Mx4k file create fsmark workload I run (on the sparse 500TB
XFS filesystem backed by a pair of SSDs) is giving the highest
throughput *and* the lowest std dev I've ever recorded
(55014.8+/-1.3e+04 files/s), and that shows in the runtime, which
drops from 3m57s to 3m22s.

So regardless of what aim7 results we get from these changes, I'll
be merging them pending review and further testing...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
