Message-Id: <1335584346-8070-1-git-send-email-wenqing.lz@taobao.com>
Date:	Sat, 28 Apr 2012 11:39:03 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	linux-ext4@...r.kernel.org
Cc:	Zheng Liu <wenqing.lz@...bao.com>
Subject: [RFC][PATCH 0/3] ext4: dio overwrite nolock

Hi list,

Currently, in ext4, direct IO (dio) writes are serialized because i_mutex is
held in generic_file_aio_write.  However, when we overwrite existing data
without changing metadata, these dio writes can run in parallel.  This patch
set aims to parallelize overwrite dio.

When we overwrite existing data, the metadata of the file does not need to be
modified.  Thus, we can take i_data_sem directly to synchronize all dio write
operations.  First, a new wrapper function is defined to replace
generic_file_aio_write so that we avoid taking i_mutex.  Then we need to define
a new get_block function and a new flag for the dio overwrite nolock feature so
that we avoid nested locking and deadlock.  In ext4_map_blocks, i_data_sem is
acquired to do a lookup.  With this new feature, the lock is already held at a
higher level, so acquiring it again in ext4_map_blocks would nest the lock, and
we need to avoid that.  Currently, ext4 always starts a journal handle first
and then tries to acquire i_data_sem.  For an overwrite dio, we do not start a
journal handle at all, which avoids a deadlock.

The new wrapper function, ext4_file_dio_write, checks whether the conditions
for this path are satisfied.  If they are met, we take i_data_sem directly and
parallelize all write operations.
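
The conditions themselves are not listed in this message.  As a rough
illustration only, the sketch below assumes that a write counts as an overwrite
when it does not extend the file and every byte it touches is already mapped.
struct mapped_range and is_pure_overwrite are hypothetical names, and the real
check in the patch may differ (for example, it may also look at alignment or
uninitialized extents).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent: a byte range that already has blocks. */
struct mapped_range {
    uint64_t start;  /* first byte covered */
    uint64_t len;    /* number of bytes covered */
};

/*
 * Return true if writing `len` bytes at `pos` is a pure overwrite: the write
 * stays inside the current file size and every byte is already mapped.
 * `ranges` must be sorted by start and must not overlap.
 */
bool is_pure_overwrite(uint64_t pos, uint64_t len, uint64_t file_size,
                       const struct mapped_range *ranges, size_t nranges)
{
    if (len == 0)
        return true;
    if (pos > file_size || len > file_size - pos)
        return false;          /* would extend the file, so i_size changes */

    uint64_t cur = pos;        /* first byte not yet known to be mapped */
    uint64_t end = pos + len;

    for (size_t i = 0; i < nranges && cur < end; i++) {
        uint64_t rstart = ranges[i].start;
        uint64_t rend = rstart + ranges[i].len;

        if (rend <= cur)
            continue;          /* range lies entirely before the cursor */
        if (rstart > cur)
            return false;      /* hole: blocks would have to be allocated */
        cur = rend;            /* mapped up to the end of this range */
    }
    return cur >= end;
}
```

With ranges {0, 4096}, {4096, 4096} and {12288, 4096} in a 16384-byte file, an
8192-byte write at offset 0 qualifies, while an 8192-byte write at offset 4096
does not because it crosses the hole at 8192.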

The first patch splits ext4_file_write into two functions, one for buffered IO
and one for direct IO, so that buffered IO keeps using the vfs path while the
new feature is added to the dio path.

The second patch adds a new flag and a new get_block function.  This get_block
function only does a lookup and takes no locks.

The last patch adds dio overwrite nolock.  This feature depends on the
dioread_nolock option: it is enabled when a filesystem is mounted with
dioread_nolock.

I have run some benchmarks on my desktop to test this feature.  The machine has
an Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz, 4G of memory and an Intel X-25
160G SSD.  I use fio to run the benchmarks, and I compare dio overwrite nolock
against ext4 without dioread_nolock and with dioread_nolock.

= case 1 =
== config file ==
[global]
ioengine=psync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/ext4/
filename=testfile
group_reporting
thread

[file1]
numjobs=1 # 4 8 16
rw=randwrite

== result (iops) ==
write                   1       4       8       16
lock                    7233    8612    9102    9165
dioread_nolock          8217    8228    8673    8755
diooverwrite_nolock     7740    15446   14563   17749
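
To put the case 1 numbers in perspective, the ratio of diooverwrite_nolock to
the default "lock" row can be computed per column.  The sketch below only
copies the figures from the table above; the array and function names are
hypothetical.

```c
#include <stddef.h>

/* IOPS copied from the case 1 table, columns for 1, 4, 8 and 16 threads. */
static const double lock_iops[4]      = { 7233, 8612, 9102, 9165 };
static const double overwrite_iops[4] = { 7740, 15446, 14563, 17749 };

/*
 * Ratio of diooverwrite_nolock IOPS to "lock" IOPS for one table column.
 * col is 0..3; a result of 1.0 means no change.
 */
double case1_speedup(size_t col)
{
    return overwrite_iops[col] / lock_iops[col];
}
```

For the table above this gives roughly 1.07x at 1 thread and 1.94x at 16
threads.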

= case 2 =
== config file ==
[global]
ioengine=sync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/ext4/
filename=testfile
group_reporting
thread

[file1]
numjobs=1 # 2 4 8
rw=randread

[file2]
numjobs=1 # 2 4 8
rw=randwrite

== result (iops) ==
read/write              2           4           8           16
lock                    614/4343    1346/3124   1271/3930   1386/3904
dioread_nolock          1040/1963   2162/1243   3980/1479   13716/924
diooverwrite_nolock     1006/1913   1973/2602   3683/4515   6966/7260

Regards,
Zheng

Zheng Liu (3):
      ext4: split ext4_file_write into buffered IO and direct IO
      ext4: add a new flag for ext4_map_blocks
      ext4: add dio overwrite nolock

 fs/ext4/ext4.h  |    2 +
 fs/ext4/file.c  |  200 +++++++++++++++++++++++++++++++++++++++++++++++++------
 fs/ext4/inode.c |   46 ++++++++++---
 3 files changed, 215 insertions(+), 33 deletions(-)
