Message-Id: <117221D9-7634-4131-95C2-7527C20F1F62@dilger.ca>
Date:	Thu, 3 Oct 2013 18:28:10 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	T Makphaibulchoke <tmac@...com>
Cc:	Theodore Ts'o <tytso@....edu>,
	"linux-ext4@...r.kernel.org List" <linux-ext4@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	aswin@...com
Subject: Re: [PATCH 0/2] fs/ext4: increase parallelism in updating ext4 orphan list

On 2013-10-02, at 9:38 AM, T Makphaibulchoke wrote:
> Instead of allowing only a single atomic update (of both the in-memory and
> on-disk orphan lists) of an ext4 orphan list at a time via the s_orphan_lock
> mutex, this patch allows multiple concurrent updates of the orphan list, while
> still maintaining the integrity of both the in-memory and on-disk orphan lists
> for each update.
> 
> This is accomplished by using a per-inode mutex to serialize the orphan
> list update of a single inode, and a mutex and a spinlock to serialize
> the on-disk and in-memory orphan lists, respectively.

It would also be possible to have a completely contention-free orphan
inode list by only generating the on-disk orphan linked list in a
pre-commit callback hook from an efficient in-memory list.  That would
allow the common "add to orphan list; do something; remove from list"
operations within a single transaction to run with minimal contention,
and only the few rare cases of operations that exceed the lifetime of
a single transaction would need to modify the on-disk list.
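To make the deferred-update idea concrete, here is a minimal userspace sketch in C. It is only a model under stated assumptions: `struct orphan_model`, `orphan_add()`, `orphan_del()` and `orphan_pre_commit()` are hypothetical names invented for illustration, not ext4 or jbd2 interfaces, and the "on-disk list" is just a second array with a counter of how often it gets rewritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORPHANS 64

/* Hypothetical userspace model; none of these names exist in ext4 or jbd2. */
struct orphan_model {
    unsigned long mem[MAX_ORPHANS];  /* in-memory orphan set (inode numbers) */
    size_t nr_mem;
    unsigned long disk[MAX_ORPHANS]; /* stand-in for the on-disk orphan list */
    size_t nr_disk;
    unsigned long disk_updates;      /* times the "on-disk" list was rewritten */
};

/* Add an inode to the in-memory set only; the "disk" copy is untouched. */
static bool orphan_add(struct orphan_model *m, unsigned long ino)
{
    if (m->nr_mem == MAX_ORPHANS)
        return false;
    m->mem[m->nr_mem++] = ino;
    return true;
}

/* Remove an inode from the in-memory set only (swap with the last entry). */
static bool orphan_del(struct orphan_model *m, unsigned long ino)
{
    for (size_t i = 0; i < m->nr_mem; i++) {
        if (m->mem[i] == ino) {
            m->mem[i] = m->mem[--m->nr_mem];
            return true;
        }
    }
    return false;
}

/*
 * Model of a pre-commit hook: publish the surviving in-memory entries,
 * and count an update only if they differ from the last published copy.
 */
static void orphan_pre_commit(struct orphan_model *m)
{
    bool changed = (m->nr_mem != m->nr_disk);

    for (size_t i = 0; !changed && i < m->nr_mem; i++)
        changed = (m->mem[i] != m->disk[i]);
    if (!changed)
        return;
    for (size_t i = 0; i < m->nr_mem; i++)
        m->disk[i] = m->mem[i];
    m->nr_disk = m->nr_mem;
    m->disk_updates++;
}

/* Three transactions; returns how many of them rewrote the "disk" list. */
static unsigned long orphan_demo(void)
{
    struct orphan_model m = {0};

    /* Transaction 1: short-lived orphans, added and removed before commit. */
    for (unsigned long ino = 11; ino <= 13; ino++) {
        orphan_add(&m, ino);
        orphan_del(&m, ino);
    }
    orphan_pre_commit(&m);

    /* Transaction 2: inode 42 is still orphaned when the commit happens. */
    orphan_add(&m, 42);
    orphan_pre_commit(&m);

    /* Transaction 3: inode 42 is finally released. */
    orphan_del(&m, 42);
    orphan_pre_commit(&m);

    return m.disk_updates;
}
```

In this model, `orphan_demo()` performs eight list operations but returns 2: only the inode that outlives a commit causes the "on-disk" copy to be rewritten, once when it appears and once when it goes away.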

For example, a per-cpu list would be quite efficient, or a hash table.
Then, a jbd2 callback run before the transaction commits could modify
the requisite inodes and superblock.  All of those inodes are already
(by definition) part of the transaction, so it won't add new buffers
to the transaction.
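The hash-table variant with a commit-time callback could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct fs_model`, the bucket layout, the C11 `atomic_flag` spinlocks, and the `sb_first_orphan` / `next_orphan[]` fields that stand in for "the requisite inodes and superblock". It does not call jbd2 and makes no claim about which hook the kernel would actually provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NR_BUCKETS 8
#define BUCKET_CAP 16
#define NR_INODES  128

/* Hypothetical model: one small array of inode numbers per hash bucket. */
struct bucket {
    atomic_flag lock;               /* per-bucket spinlock */
    unsigned int ino[BUCKET_CAP];
    size_t nr;
};

struct fs_model {
    struct bucket buckets[NR_BUCKETS];
    unsigned int sb_first_orphan;        /* "superblock" head, 0 == empty */
    unsigned int next_orphan[NR_INODES]; /* per-"inode" next pointer */
};

static void fs_init(struct fs_model *fs)
{
    memset(fs, 0, sizeof(*fs));
    for (size_t i = 0; i < NR_BUCKETS; i++)
        atomic_flag_clear(&fs->buckets[i].lock);
}

static void bucket_lock(struct bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ; /* spin; only callers hashing to the same bucket contend */
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static int orphan_hash_add(struct fs_model *fs, unsigned int ino)
{
    if (ino == 0 || ino >= NR_INODES)
        return 0;
    struct bucket *b = &fs->buckets[ino % NR_BUCKETS];
    int ok = 0;

    bucket_lock(b);
    if (b->nr < BUCKET_CAP) {
        b->ino[b->nr++] = ino;
        ok = 1;
    }
    bucket_unlock(b);
    return ok;
}

static int orphan_hash_del(struct fs_model *fs, unsigned int ino)
{
    if (ino == 0 || ino >= NR_INODES)
        return 0;
    struct bucket *b = &fs->buckets[ino % NR_BUCKETS];
    int found = 0;

    bucket_lock(b);
    for (size_t i = 0; i < b->nr; i++) {
        if (b->ino[i] == ino) {
            b->ino[i] = b->ino[--b->nr]; /* swap with last entry */
            found = 1;
            break;
        }
    }
    bucket_unlock(b);
    return found;
}

/* Commit-time callback: chain every surviving orphan off the "superblock". */
static size_t orphan_commit_cb(struct fs_model *fs)
{
    unsigned int head = 0;
    size_t n = 0;

    for (size_t i = 0; i < NR_BUCKETS; i++) {
        struct bucket *b = &fs->buckets[i];

        bucket_lock(b);
        for (size_t j = 0; j < b->nr; j++) {
            fs->next_orphan[b->ino[j]] = head; /* push onto the chain */
            head = b->ino[j];
            n++;
        }
        bucket_unlock(b);
    }
    fs->sb_first_orphan = head;
    return n;
}

/* Walk the chain built by the callback and return its length. */
static size_t orphan_chain_len(const struct fs_model *fs)
{
    size_t n = 0;

    for (unsigned int ino = fs->sb_first_orphan; ino != 0;
         ino = fs->next_orphan[ino])
        n++;
    return n;
}

static size_t orphan_hash_demo(void)
{
    struct fs_model fs;

    fs_init(&fs);
    orphan_hash_add(&fs, 5);
    orphan_hash_add(&fs, 13);
    orphan_hash_add(&fs, 21);
    orphan_hash_add(&fs, 6);
    orphan_hash_del(&fs, 13); /* gone before commit, never chained */
    orphan_commit_cb(&fs);
    return orphan_chain_len(&fs);
}
```

Adds and removes only take the lock of the bucket the inode hashes to, so unrelated inodes do not contend with each other; the single walk over all buckets happens once per commit. A per-cpu list would trade the hash for locality, at the cost of a removal possibly running on a different CPU than the add.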

I'm not necessarily against the current patch, just thinking aloud about
how it might be improved further.

Cheers, Andreas

> Here are some of the benchmark results with the changes.
> 
> On a 90-core machine:
> 
> Here are the performance improvements in some of the aim7 workloads:
> 
> ---------------------------
> |             | % increase |
> ---------------------------
> | alltests    |      9.56  |
> ---------------------------
> | custom      |     12.20  |
> ---------------------------
> | fserver     |     15.99  |
> ---------------------------
> | new_dbase   |      1.73  |
> ---------------------------
> | new_fserver |     17.56  |
> ---------------------------
> | shared      |      6.24  |
> ---------------------------
> 
> For the Swingbench dss workload:
> 
> -------------------------------------------------------------------------
> | Users         | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  |
> -------------------------------------------------------------------------
> | % improvement | 7.67 | 9.43 | 7.30 | 0.58 | 0.53 |-2.62 |-3.72 | 3.77 |
> | without using |      |      |      |      |      |      |      |      |
> | shared memory |      |      |      |      |      |      |      |      |
> -------------------------------------------------------------------------
> 
> On an 8-core machine:
> 
> Here are the performance data from some of the aim7 workloads:
> 
> ---------------------------
> |             | % increase |
> ---------------------------
> | alltests    |      3.90  |
> ---------------------------
> | custom      |      1.66  |
> ---------------------------
> | dbase       |     -2.00  |
> ---------------------------
> | fserver     |      1.80  |
> ---------------------------
> | new_dbase   |     -1.90  |
> ---------------------------
> | new_fserver |      2.18  |
> ---------------------------
> | shared      |      7.46  |
> ---------------------------
> 
> For the Swingbench dss workload:
> 
> -------------------------------------------------------------------------
> | Users         | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  |
> -------------------------------------------------------------------------
> | % improvement |-1.32 | 6.45 | 1.18 |-3.13 |-1.13 | 4.68 | 5.75 |-0.37 |
> | without using |      |      |      |      |      |      |      |      |
> | shared memory |      |      |      |      |      |      |      |      |
> -------------------------------------------------------------------------
> 
> T Makphaibulchoke (2):
>  fs/ext4: adding and initializing new members of ext4_inode_info and
>    ext4_sb_info
>  fs/ext4/namei.c: reducing contention on s_orphan_lock mutex
> 
> fs/ext4/ext4.h  |   5 +-
> fs/ext4/inode.c |   1 +
> fs/ext4/namei.c | 139 ++++++++++++++++++++++++++++++++++++++++----------------
> fs/ext4/super.c |   4 +-
> 4 files changed, 108 insertions(+), 41 deletions(-)
> 
> -- 
> 1.7.11.3
> 

