Message-Id: <1383745544-391-1-git-send-email-zwu.kernel@gmail.com>
Date:	Wed,  6 Nov 2013 21:45:33 +0800
From:	Zhi Yong Wu <zwu.kernel@...il.com>
To:	viro@...iv.linux.org.uk
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
Subject: [PATCH v6 00/11] VFS hot tracking

From: Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>

  This patchset introduces a hot tracking function in the VFS
layer, which keeps track of real disk I/O in memory. With it,
you can easily get more details about disk I/O and detect where
the disk I/O hot spots are. A specific FS can also make use of
it for accurate defragmentation, hot relocation support, etc.

  Now it's time to send out V6 for external review; any
comments or ideas are appreciated, thanks.

NOTE:

  The patchset can be obtained via my kernel dev git on github:
git://github.com/wuzhy/kernel.git hot_tracking
  If you're interested, you can also review them via
https://github.com/wuzhy/kernel/commits/hot_tracking

  For usage, further information and the performance report,
please check hot_tracking.txt in Documentation and the following
links:
  1.) http://lwn.net/Articles/525651/
  2.) https://lkml.org/lkml/2012/12/20/199

  This patchset has gone through scalability and performance tests
with fs_mark, ffsb and compilebench.

  The perf tests were done on Linux 3.12.0-rc7 on an IBM,8231-E2C
Big Endian PPC64 machine with 64 CPUs and 2 NUMA nodes, 250G RAM and
a 1.50 TiB test hard disk, where each test file is 20G or 100G in size:
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             16
NUMA node(s):          2
Model:                 IBM,8231-E2C
Hypervisor vendor:     pHyp
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63

  Below is the perf testing report:

  Please focus on the two key points:
  - The overall overhead introduced by the patchset
  - The stability of the perf results

1. fio tests

                          w/o hot tracking                                 w/ hot tracking

RAM size                               32G          32G          16G           8G           4G           2G         250G

sequential-8k-1jobs-read         61260KB/s    60918KB/s    60901KB/s    62610KB/s    60992KB/s    60213KB/s    60948KB/s

sequential-8k-1jobs-write         1329KB/s     1329KB/s     1328KB/s     1329KB/s     1328KB/s     1329KB/s     1329KB/s

sequential-8k-8jobs-read         91139KB/s    92614KB/s    90907KB/s    89895KB/s    92022KB/s    90851KB/s    91877KB/s

sequential-8k-8jobs-write         2523KB/s     2522KB/s     2516KB/s     2521KB/s     2516KB/s     2518KB/s     2521KB/s

sequential-256k-1jobs-read      151432KB/s   151403KB/s   151406KB/s   151422KB/s   151344KB/s   151446KB/s   151372KB/s

sequential-256k-1jobs-write      33451KB/s    33470KB/s    33481KB/s    33470KB/s    33459KB/s    33472KB/s    33477KB/s

sequential-256k-8jobs-read      235291KB/s   234555KB/s   234251KB/s   233656KB/s   234927KB/s   236380KB/s   235535KB/s

sequential-256k-8jobs-write      62419KB/s    62402KB/s    62191KB/s    62859KB/s    62629KB/s    62720KB/s    62523KB/s

random-io-mix-8k-1jobs  [READ]    2929KB/s     2942KB/s     2946KB/s     2929KB/s     2934KB/s     2947KB/s     2946KB/s
                        [WRITE]   1262KB/s     1266KB/s     1257KB/s     1262KB/s     1257KB/s     1257KB/s     1265KB/s

random-io-mix-8k-8jobs  [READ]    2444KB/s     2442KB/s     2436KB/s     2416KB/s     2353KB/s     2441KB/s     2442KB/s
                        [WRITE]   1047KB/s     1044KB/s     1047KB/s     1028KB/s     1017KB/s     1034KB/s     1049KB/s

random-io-mix-8k-16jobs [READ]    2182KB/s     2184KB/s     2169KB/s     2178KB/s     2190KB/s     2184KB/s     2180KB/s
                        [WRITE]    932KB/s      930KB/s      943KB/s      936KB/s      937KB/s      929KB/s      931KB/s

The figures above are the aggregate bandwidth of the threads in each group.
If you would like other perf parameters or the raw fio results, please let me know, thanks.
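
To put a number on the first key point (overall overhead), the two 32G
columns can be compared directly. The following is a minimal Python sketch
and is not part of the patchset; the helper name overhead_pct is
hypothetical, and the figures are a few rows copied from the table above.

```python
# Bandwidth in KB/s at 32G RAM, copied from the fio table:
# (without hot tracking, with hot tracking)
RESULTS_32G = {
    "sequential-8k-1jobs-read": (61260, 60918),
    "sequential-8k-8jobs-read": (91139, 92614),
    "sequential-256k-8jobs-read": (235291, 234555),
    "random-io-mix-8k-8jobs [READ]": (2444, 2442),
}


def overhead_pct(baseline, tracked):
    """Relative bandwidth change in percent.

    Negative means lower bandwidth with hot tracking enabled.
    """
    return (tracked - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, (baseline, tracked) in RESULTS_32G.items():
        print(f"{name:32s} {overhead_pct(baseline, tracked):+.2f}%")
```

For the rows shown, the difference stays under 2% in either direction.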

2. Locking stat - Contention & Cacheline Bouncing

RAM size         class name         con-bounces  contentions  acq-bounces   acquisitions   cacheline bouncing  locking contention
                                                                                                 ratio              ratio

              &(&root->t_lock)->rlock:  1508        1592         157834      374639292           0.96%              0.00%
250G          &(&root->m_lock)->rlock:  1469        1484         119221       43077842           1.23%              0.00%
              &(&he->i_lock)->rlock:       0           0         101879      376755218           0.00%              0.00%

              &(&root->t_lock)->rlock:  2912        2985         342575      374691186           0.85%              0.00%
32G           &(&root->m_lock)->rlock:   188         193         307765        8803163           0.00%              0.00%
              &(&he->i_lock)->rlock:       0           0         291860      376756084           0.00%              0.00%

              &(&root->t_lock)->rlock:  3863        3948         298041      374727038           1.30%              0.00%
16G           &(&root->m_lock)->rlock:   220         228         254451        8687057           0.00%              0.00%
              &(&he->i_lock)->rlock:       0           0         235027      376756830           0.00%              0.00%

              &(&root->t_lock)->rlock:  3283        3409         233790      374722064           1.40%              0.00%
8G            &(&root->m_lock)->rlock:   136         139         203917        8684313           0.00%              0.00%
              &(&he->i_lock)->rlock:       0           0         193746      376756438           0.00%              0.00%

              &(&root->t_lock)->rlock: 15090       15705         283460      374889666           5.32%              0.00%
4G            &(&root->m_lock)->rlock:   172         173         222480        8555052           0.00%              0.00%
              &(&he->i_lock)->rlock:       0           0         206431      376759452           0.00%              0.00%

              &(&root->t_lock)->rlock: 25515       27368         305129      375394828           8.36%              0.00%
2G            &(&root->m_lock)->rlock:   100         101         216516        6752265           0.00%              0.00%
              &(&he->i_lock)->rlock:       0           0         214713      376765169           0.00%              0.00%
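
The report does not define the two ratio columns. The root->t_lock rows
are consistent with reading the cacheline bouncing ratio as
con-bounces / acq-bounces and the locking contention ratio as
contentions / acquisitions. That reading is an assumption inferred from
the numbers, not something stated by the patchset, and the helper name
ratio_pct below is hypothetical. A minimal Python sketch:

```python
def ratio_pct(events, total):
    """Return events/total as a percentage, or 0.0 when total is zero."""
    return 100.0 * events / total if total else 0.0


# root->t_lock rows copied from the table:
# (RAM size, con-bounces, contentions, acq-bounces, acquisitions)
T_LOCK_ROWS = [
    ("250G", 1508, 1592, 157834, 374639292),
    ("32G", 2912, 2985, 342575, 374691186),
    ("16G", 3863, 3948, 298041, 374727038),
    ("8G", 3283, 3409, 233790, 374722064),
    ("4G", 15090, 15705, 283460, 374889666),
    ("2G", 25515, 27368, 305129, 375394828),
]

if __name__ == "__main__":
    for ram, con_bounces, contentions, acq_bounces, acquisitions in T_LOCK_ROWS:
        bouncing = ratio_pct(con_bounces, acq_bounces)        # assumed definition
        contention = ratio_pct(contentions, acquisitions)     # assumed definition
        print(f"{ram:>5s}  bouncing {bouncing:.2f}%  contention {contention:.4f}%")
```

Under this reading, the t_lock bouncing ratio grows from under 1% at 250G
to above 8% at 2G, while the contention ratio stays below 0.01% in every row.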

3. Perf test - Cacheline Ping-pong

                     w/o hot tracking                                                         w/ hot tracking

RAM size                          32G                  32G                  16G                   8G                   4G                   2G                 250G

cache-references    1,264,996,437,581    1,401,504,955,577    1,398,308,614,801    1,396,525,544,527    1,384,793,467,410    1,432,042,560,409    1,571,627,148,771

cache-misses           45,424,567,057       58,432,749,807       59,200,504,032       59,762,030,933       58,104,156,576       57,283,962,840       61,963,839,419

seconds time elapsed  22956.327674298      23035.457069488      23017.232397085      23012.397142967      23008.420970731      23057.245578767      23342.456015188

cache-misses ratio            3.591 %              4.169 %              4.234 %              4.279 %              4.196 %              4.000 %              3.943 %
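
The cache-misses ratio row is cache-misses divided by cache-references.
As a quick cross-check, here is a minimal Python sketch using the two 32G
columns from the table above; the helper name cache_miss_ratio is
hypothetical and not part of the patchset.

```python
def cache_miss_ratio(misses, references):
    """Cache misses as a percentage of all cache references."""
    return 100.0 * misses / references


# (cache-misses, cache-references) at 32G RAM, copied from the table
WITHOUT_HOT_TRACKING = (45_424_567_057, 1_264_996_437_581)
WITH_HOT_TRACKING = (58_432_749_807, 1_401_504_955_577)

if __name__ == "__main__":
    base = cache_miss_ratio(*WITHOUT_HOT_TRACKING)
    tracked = cache_miss_ratio(*WITH_HOT_TRACKING)
    print(f"w/o hot tracking: {base:.3f} %")
    print(f"w/  hot tracking: {tracked:.3f} %")
    print(f"difference:       {tracked - base:+.3f} percentage points")
```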

Changelog from v5:
 - Also added the hook hot_freqs_update() in the page cache I/O path,
   not only in real disk I/O path [viro]
 - Don't export the stuff until it's used by a module [viro]
 - Split hot_inode_item_lookup() [viro]
 - Prevented hot items from being re-created after the inode was unlinked. [viro]
 - Made hot_freqs_update() inline and adopted one private hot flag [viro]
 - Killed hot_bit_shift() [viro]
 - Used file_inode() instead of file->f_dentry->d_inode [viro]
 - Introduced one new file hot_tracking.h in include/uapi/linux/ [viro]
 - Made the checks for ->i_nlink protected by ->i_mutex [viro]

v5:
 - Added all kinds of perf testing reports [viro]
 - Covered mmap() now [viro]
 - Removed list_sort() in hot_update_worker() to avoid locking contention
   and cacheline bouncing [viro]
 - Removed a /proc interface to control low memory usage [Chandra]
 - Adjusted shrinker support due to the change of public shrinker APIs [zwu]
 - Fixed the missing locking when hot_inode_item_put() is called
   in ioctl_heat_info() [viro]
 - Fixed some locking contention issues [zwu]

v4:
 - Removed debugfs support, but left it on the TODO list [viro, Chandra]
 - Killed the HOT_DELETING and HOT_IN_LIST flags [viro]
 - Fixed unlink issues [viro]
 - Fixed the issue of lookups (both for inode and range)
   leaking on race with unlink [viro]
 - Killed hot_comm_item and split the functions which take it [viro]
 - Fixed some other issues [zwu, Chandra]

v3:
 - Added memory capping function for hot items [Zhiyong]
 - Cleaned up aging function [Zhiyong]

v2:
 - Refactored to be under RCU [Chandra Seetharaman]
 - Merged some code changes [Chandra Seetharaman]
 - Fixed some issues [Chandra Seetharaman]

v1:
 - Solved 64-bit inode number issue. [David Sterba]
 - Embedded struct hot_type in struct file_system_type [Darrick J. Wong]
 - Cleaned up some issues [David Sterba]
 - Use a static hot debugfs root [Greg KH]

rfcv4:
 - Introduce hot func registering framework [Zhiyong]
 - Remove global variable for hot tracking [Zhiyong]
 - Add btrfs hot tracking support [Zhiyong]

rfcv3:
 1.) Rewrote debugfs support based on seq_file operations. [Dave Chinner]
 2.) Refactored workqueue support. [Dave Chinner]
 3.) Turned some macros into tunables [Zhiyong, Liu Zheng]
     TIME_TO_KICK, and HEAT_UPDATE_DELAY
 4.) Cleaned up a lot of other issues [Dave Chinner]

rfcv2:
 1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
 2.) Added memory shrinker [Dave Chinner]
 3.) Converted to one workqueue to update map info periodically [Dave Chinner]
 4.) Cleaned up a lot of other issues [Dave Chinner]

rfcv1:
 1.) Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
 2.) The first three patches can probably just be flattened into one.
                                        [Marco Stornelli , Dave Chinner]


Dave Chinner (1):
  VFS hot tracking, xfs: Add hot tracking support

Zhi Yong Wu (10):
  VFS hot tracking: Define basic data structures and functions
  VFS hot tracking: Track IO and record heat information
  VFS hot tracking: Add a workqueue to move items between hot maps
  VFS hot tracking: Add shrinker functionality to curtail memory usage
  VFS hot tracking: Add an ioctl to get hot tracking information
  VFS hot tracking: Add a /proc interface to make the interval tunable
  VFS hot tracking: Add a /proc interface to control memory usage
  VFS hot tracking: Add documentation
  VFS hot tracking, btrfs: Add hot tracking support
  MAINTAINERS: add the maintainers for VFS hot tracking

 Documentation/filesystems/00-INDEX         |   2 +
 Documentation/filesystems/hot_tracking.txt | 207 ++++++++
 MAINTAINERS                                |  12 +
 fs/Makefile                                |   2 +-
 fs/btrfs/ctree.h                           |   1 +
 fs/btrfs/super.c                           |  22 +-
 fs/compat_ioctl.c                          |   5 +
 fs/dcache.c                                |   2 +
 fs/hot_tracking.c                          | 816 +++++++++++++++++++++++++++++
 fs/hot_tracking.h                          |  72 +++
 fs/ioctl.c                                 |  71 +++
 fs/namei.c                                 |   4 +
 fs/xfs/xfs_mount.h                         |   1 +
 fs/xfs/xfs_super.c                         |  18 +
 include/linux/fs.h                         |   4 +
 include/linux/hot_tracking.h               | 107 ++++
 include/uapi/linux/fs.h                    |   1 +
 include/uapi/linux/hot_tracking.h          |  33 ++
 kernel/sysctl.c                            |  14 +
 mm/filemap.c                               |  24 +-
 mm/readahead.c                             |   6 +
 21 files changed, 1420 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/filesystems/hot_tracking.txt
 create mode 100644 fs/hot_tracking.c
 create mode 100644 fs/hot_tracking.h
 create mode 100644 include/linux/hot_tracking.h
 create mode 100644 include/uapi/linux/hot_tracking.h

-- 
1.7.11.7
