Date:	Wed, 4 Sep 2013 14:00:44 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	T Makphaibulchoke <tmac@...com>
Cc:	Theodore Ts'o <tytso@....edu>, Al Viro <viro@...iv.linux.org.uk>,
	"linux-ext4@...r.kernel.org List" <linux-ext4@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org Devel" <linux-fsdevel@...r.kernel.org>,
	aswin@...com, Linus Torvalds <torvalds@...ux-foundation.org>,
	aswin_proj@...ups.hp.com
Subject: Re: [PATCH v3 0/2] ext4: increase mbcache scalability

On 2013-09-04, at 10:39 AM, T Makphaibulchoke wrote:
> This patch intends to improve the scalability of an ext filesystem,
> particularly ext4.

In the past, I've raised the question of whether mbcache is even
useful on real-world systems.  Essentially, this is providing a
"deduplication" service for ext2/3/4 xattr blocks that are identical.
The question is how often this is actually the case in modern use?
The original design was for allowing external ACL blocks to be
shared between inodes, at a time when ACLs were pretty much the
only xattrs stored on inodes.

The question now is whether there are common uses where all of the
xattrs stored on multiple inodes are identical.  If that is not the
case, mbcache is only adding overhead and should be disabled
entirely, rather than merely made to add less overhead.

There aren't good statistics on the hit rate for mbcache, but it
might be possible to generate some with systemtap or similar to
see how often ext4_xattr_cache_find() returns NULL vs. non-NULL.
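
To make the hit/miss question concrete, here is a rough userspace
model of the sharing behaviour, written in Python.  It is only a
sketch and not kernel code: the class XattrBlockCache, its methods,
and the sample xattr names and values are all made up for
illustration, and a SHA-256 of the serialized xattr set stands in for
whatever hash mbcache really uses.  A "hit" corresponds to
ext4_xattr_cache_find() returning non-NULL, a "miss" to NULL.

```python
import hashlib


class XattrBlockCache:
    """Toy model of mbcache-style sharing: one block per distinct xattr set."""

    def __init__(self):
        self.blocks = {}   # content hash -> [serialized xattrs, refcount]
        self.hits = 0      # lookups that found an identical block
        self.misses = 0    # lookups that had to store a new block

    def find_or_insert(self, xattrs):
        # Serialize the whole xattr set in a canonical order: a block can
        # only be shared when *all* xattrs on the two inodes are identical.
        blob = repr(sorted(xattrs.items())).encode()
        key = hashlib.sha256(blob).hexdigest()
        entry = self.blocks.get(key)
        if entry is not None:
            entry[1] += 1          # share the existing block
            self.hits += 1
        else:
            self.blocks[key] = [blob, 1]
            self.misses += 1
        return key

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = XattrBlockCache()
acl = {"system.posix_acl_access": b"u::rw-,g::r--,o::r--"}

# Four inodes carrying only the same ACL: one miss, then three hits.
for _ in range(4):
    cache.find_or_insert(acl)

# Four inodes with the same ACL plus one per-inode xattr: all misses.
for i in range(4):
    cache.find_or_insert({**acl, "user.id": str(i).encode()})

print(cache.hit_rate())   # 3 hits out of 8 lookups -> 0.375
```

In this model a single per-inode xattr is enough to defeat sharing
completely, which is the case I am worried about on modern systems.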

Cheers, Andreas

> The patch consists of two parts.  The first part introduces a higher
> degree of parallelism to the usage of the mb_cache and
> mb_cache_entries and impacts all ext filesystems.
> 
> The second part of the patch further increases the scalability of
> an ext4 filesystem by having each ext4 filesystem allocate and use
> its own private mbcache structure, instead of sharing a single
> mbcache structure across all ext4 filesystems.
> 
> Here are some of the benchmark results with the changes. 
> 
> On a 90 core machine:
> 
> Here are the performance improvements in some of the aim7 workloads,
> 
> ----------------------------
> |             | % increase |
> ----------------------------
> | alltests    |     11.85  |
> ----------------------------
> | custom      |     14.42  |
> ----------------------------
> | fserver     |     21.36  |
> ----------------------------
> | new_dbase   |      5.59  |
> ----------------------------
> | new_fserver |     21.45  |
> ----------------------------
> | shared      |     12.84  |
> ----------------------------
> For Swingbench dss workload, with 16 GB database,
> 
> --------------------------------------------------------------------------------
> | Users         | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  | 900  |
> --------------------------------------------------------------------------------
> | % improvement | 8.46 | 8.00 | 7.35 | -.313| 1.09 | 0.69 | 0.30 | 2.18 | 5.23 |
> --------------------------------------------------------------------------------
> | % improvement |45.66 |47.62 |34.54 |25.15 |15.29 | 3.38 | -8.7 |-4.98 |-7.86 |
> | without using |      |      |      |      |      |      |      |      |      |
> | shared memory |      |      |      |      |      |      |      |      |      |
> --------------------------------------------------------------------------------
> For SPECjbb2013, composite run,
> 
> --------------------------------------------
> |               | max-jOPS | critical-jOPS |
> --------------------------------------------
> | % improvement |   5.99   |     N/A       |
> --------------------------------------------
> 
> 
> On an 80 core machine:
> 
> The aim7 results for most of the workloads turn out to be the same.
> 
> Here are the results of Swingbench dss workload,
> 
> --------------------------------------------------------------------------------
> | Users         | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  | 900  |
> --------------------------------------------------------------------------------
> | % improvement |-1.79 | 0.37 | 1.36 | 0.08 | 1.66 | 2.09 | 1.16 | 1.48 | 1.92 |
> --------------------------------------------------------------------------------
> 
> The changes have been tested with ext4 xfstests to verify that no regression
> has been introduced. 
> 
> Changed in v3:
> 	- New diff summary
> 
> Changed in v2:
> 	- New performance data
> 	- New diff summary
> 
> T Makphaibulchoke (2):
>  mbcache: decoupling the locking of local from global data
>  ext4: each filesystem creates and uses its own mb_cache
> 
> fs/ext4/ext4.h          |   1 +
> fs/ext4/super.c         |  24 ++--
> fs/ext4/xattr.c         |  51 ++++----
> fs/ext4/xattr.h         |   6 +-
> fs/mbcache.c            | 306 +++++++++++++++++++++++++++++++++++-------------
> include/linux/mbcache.h |  10 +-
> 6 files changed, 277 insertions(+), 121 deletions(-)
> 
> -- 
> 1.7.11.3
> 