Date:	Tue, 11 Sep 2012 22:27:14 +0800
From:	zwu.kernel@...il.com
To:	linux-fsdevel@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org, dave@...ux.vnet.ibm.com,
	viro@...iv.linux.org.uk, hch@....de, chris.mason@...ionio.com,
	cmm@...ibm.com, linuxram@...ibm.com,
	aneesh.kumar@...ux.vnet.ibm.com,
	Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
Subject: [RFC 00/11] VFS: hot data tracking

From: Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>

Hi folks,
  I have pushed the patchset to my kernel dev git tree:
git@...hub.com:wuzhy/kernel.git

  Also, you can review it via
https://github.com/wuzhy/kernel/commits/hottrack

NOTE:

The patchset still needs a lot of bug fixing and cleanup. It is being
posted mainly to check that it is heading in the right direction and
to get helpful comments from others.

TODO List:

 1.) Do scalability and performance tests.
 2.) Fix bugs.
 3.) Split this patchset strictly and keep the patches in order
        This patchset is in RFC state, so I haven't split it strictly.
     When it reaches PATCH state, I will split it strictly and put
     the patches in order.
 4.) Turn some macros into tunables
        TIME_TO_KICK and HEAT_UPDATE_DELAY
 5.) Refactor hot_hash_is_aging()
        If the timeout value were a timespec and the _timespecs_
     were compared directly, far fewer conversions would be needed.
 6.) Clean up some unnecessary locking
 7.) Add more comments to explain how the temperature is calculated
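
TODO item 5 can be illustrated with a small userspace sketch. The
helper names (ts_compare, ts_sub, item_is_aging) are hypothetical and
are not taken from the patchset; in-kernel code would presumably use
the kernel's own time helpers instead. The point is only that the idle
time and the timeout stay in timespec form and are compared field by
field, with no conversion to a scalar.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: returns <0, 0 or >0, like a comparator.
 * Assumes both values are normalized (0 <= tv_nsec < 1e9). */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Hypothetical helper: a - b, assuming a >= b and both normalized. */
static struct timespec ts_sub(struct timespec a, struct timespec b)
{
    struct timespec d;

    d.tv_sec = a.tv_sec - b.tv_sec;
    d.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (d.tv_nsec < 0) {
        /* Borrow one second to keep tv_nsec in range. */
        d.tv_sec--;
        d.tv_nsec += 1000000000L;
    }
    return d;
}

/* An item is "aging" if it has been idle for longer than the timeout.
 * The timeout is itself a timespec, so no unit conversion is needed. */
static bool item_is_aging(struct timespec now, struct timespec last_access,
                          struct timespec timeout)
{
    struct timespec idle = ts_sub(now, last_access);

    return ts_compare(&idle, &timeout) > 0;
}
```

With a 60 second timeout, an item last touched 89.5 seconds ago is
reported as aging, while one touched 1 second ago is not.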

Ben Chociej, Matt Lupfer and Conor Scott originally wrote this code to
be very btrfs-specific.  I've taken their code and attempted to
make it more generic and integrate it at the VFS level.

INTRODUCTION:

Essentially, hot data tracking means maintaining some key stats
(like number of reads/writes, last read/write time, frequency of
reads/writes), then distilling those numbers down to a single
"temperature" value that reflects what data is "hot," and using that
temperature to move data to SSDs.
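
As a rough illustration of "distilling" stats into one number, here is
a minimal userspace sketch. The struct, the function names, the equal
weighting and the 0..255 range are all assumptions made for the
example; they are not the formula used by the patchset, which is not
described in this cover letter.

```c
#include <stdint.h>

/* Hypothetical per-item stats; field names are illustrative only. */
struct access_stats {
    uint32_t nr_reads;
    uint32_t nr_writes;
    uint64_t secs_since_read;   /* UINT64_MAX if never read */
    uint64_t secs_since_write;  /* UINT64_MAX if never written */
};

/* Frequency component: the access count, saturating at 64. */
static uint32_t count_score(uint32_t n)
{
    return n > 64 ? 64 : n;
}

/* Recency component: 64 for "just now", minus one point for every
 * significant bit of the idle time, so it decays logarithmically
 * and reaches 0 when the idle time has all 64 bits in use. */
static uint32_t recency_score(uint64_t idle_secs)
{
    uint32_t score = 64;

    while (idle_secs != 0) {
        idle_secs >>= 1;
        score--;
    }
    return score;
}

/* Combine the four components into one value clamped to 0..255. */
static uint32_t temperature(const struct access_stats *s)
{
    uint32_t t = count_score(s->nr_reads) + count_score(s->nr_writes) +
                 recency_score(s->secs_since_read) +
                 recency_score(s->secs_since_write);

    return t > 255 ? 255 : t;
}
```

A single scalar like this is cheap to compare and to use as a bucket
index, which is what makes it convenient for deciding which data to
relocate.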

The long-term goal of these patches is to allow some file systems,
e.g. Btrfs, to intelligently utilize SSDs in a heterogeneous volume.
Incidentally, this project has been motivated by
the Project Ideas page on the Btrfs wiki.

Of course, users are warned not to run this code outside of development
environments. These patches are EXPERIMENTAL, and as such they might eat
your data and/or memory. That said, the code should be relatively safe
when the hottrack mount option is disabled.

MOTIVATION:

The overall goal of enabling hot data relocation to SSD has been
motivated by the Project Ideas page on the Btrfs wiki at
<https://btrfs.wiki.kernel.org/index.php/Project_ideas>.
The work will be divided into two steps: the VFS provides the hot data
tracking function, while each specific FS provides the hot data
relocation function. As the first step toward this goal, it is hoped
that the hot data tracking patchset will eventually mature into the VFS.

This is essentially the traditional cache argument: SSD is fast and
expensive; HDD is cheap but slow. ZFS, for example, can already take
advantage of SSD caching. Btrfs should also be able to take advantage of
hybrid storage without many broad, sweeping changes to existing code.

SUMMARY:

- Hooks in existing vfs functions to track data access frequency

- New rbtrees for tracking access frequency of inodes and sub-file
ranges (hot_rb.c)
    The relationship between super_block and the rbtree is:
  super_block->s_hotinfo.hot_inode_tree
    In include/linux/fs.h, a struct hot_info s_hotinfo is added to
  struct super_block. Each FS instance can find its hot tracking info,
  s_hotinfo, via its super_block. This hot_info stores the hot
  tracking state, such as hot_inode_tree and the inode and range hash
  lists.

- A hash list for indexing data by its temperature (hot_hash.c)

- A debugfs interface for dumping data from the rbtrees (hot_debugfs.c)

- A background kthread for updating inode heat info

- Mount options for enabling temperature tracking (-o hottrack;
  disabled by default) (hot_track.c)

- An ioctl to retrieve the frequency information collected for a certain
file

- Ioctls to enable/disable frequency tracking per inode.
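
To make the "hash list for indexing data by its temperature" idea
concrete, here is a small userspace model. It is a sketch under
assumptions: the type and function names, the 256 buckets and the
pprev-style linking are chosen for illustration and are not copied
from hot_hash.c. The rbtree side (hot_inode_tree) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAT_BUCKETS 256    /* assumption: one bucket per temperature */

/* Hypothetical tracked item (an inode or a sub-file range). */
struct hot_item {
    uint64_t ino;
    uint32_t temp;              /* bucket the item currently lives in */
    struct hot_item *next;
    struct hot_item **pprev;    /* address of the pointer pointing at us */
};

/* Hypothetical stand-in for the per-super_block tracking state. */
struct hot_info_model {
    struct hot_item *heat_hash[HEAT_BUCKETS];
};

/* Remove an item from whatever bucket it is on (no-op if unlinked). */
static void hot_item_unlink(struct hot_item *it)
{
    if (!it->pprev)
        return;
    *it->pprev = it->next;
    if (it->next)
        it->next->pprev = it->pprev;
    it->next = NULL;
    it->pprev = NULL;
}

/* Re-file an item under a new temperature, clamped to the last bucket. */
static void hot_item_set_temp(struct hot_info_model *hi,
                              struct hot_item *it, uint32_t temp)
{
    struct hot_item **head;

    if (temp >= HEAT_BUCKETS)
        temp = HEAT_BUCKETS - 1;
    hot_item_unlink(it);
    head = &hi->heat_hash[temp];
    it->temp = temp;
    it->next = *head;
    if (it->next)
        it->next->pprev = &it->next;
    it->pprev = head;
    *head = it;
}

/* Return some item from the hottest non-empty bucket, or NULL. */
static struct hot_item *hot_hottest(struct hot_info_model *hi)
{
    for (int i = HEAT_BUCKETS - 1; i >= 0; i--)
        if (hi->heat_hash[i])
            return hi->heat_hash[i];
    return NULL;
}
```

Indexing by temperature means a relocation policy can walk the buckets
from hottest to coldest without sorting, and a temperature update is a
constant-time unlink and relink.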

Usage syntax:

root@...ian-i386:~# mount -o hottrack /dev/sdb /mnt
[ 1505.894078] device label test devid 1 transid 29 /dev/sdb
[ 1505.952977] btrfs: disk space caching is enabled
[ 1506.069678] vfs: turning on hot data tracking
root@...ian-i386:~# mount -t debugfs none /sys/kernel/debug
root@...ian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/
total 0
drwxr-xr-x 2 root root 0 Aug  8 04:40 sdb
root@...ian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/sdb
total 0
-rw-r--r-- 1 root root 0 Aug  8 04:40 inode_data
-rw-r--r-- 1 root root 0 Aug  8 04:40 range_data
root@...ian-i386:~# vi /mnt/file
root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
inode #279, reads 0, writes 1, avg read time 18446744073709551615,
avg write time 5251566408153596, temp 109
root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
inode #279, range start 0 (range len 1048576) reads 0, writes 1,
avg read time 18446744073709551615, avg write time 1128690176623144209, temp 64
root@...ian-i386:~# echo "hot data tracking test" >> /mnt/file
root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
inode #279, reads 0, writes 2, avg read time 18446744073709551615,
avg write time 4923343766042451, temp 109
root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
inode #279, range start 0 (range len 1048576) reads 0, writes 2,
avg read time 18446744073709551615, avg write time 1058147040842596150, temp 64
root@...ian-i386:~#
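
For scripted testing, the inode_data lines shown above can be parsed
from userspace. The sketch below assumes the exact field layout seen
in this transcript; the struct and function names are hypothetical.
Note that 18446744073709551615 equals 2^64 - 1, which suggests a
(u64)-1 placeholder for "no reads recorded yet"; that reading is an
inference from the output, not something the patchset states.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace view of one inode_data record. */
struct inode_heat {
    uint64_t ino;
    unsigned int reads;
    unsigned int writes;
    uint64_t avg_read_time;
    uint64_t avg_write_time;
    unsigned int temp;
};

/* Parse one record. Whitespace in the format also matches a newline,
 * so a record wrapped across two lines is accepted as well. */
static bool parse_inode_data(const char *line, struct inode_heat *out)
{
    int n = sscanf(line,
                   "inode #%" SCNu64 ", reads %u, writes %u,"
                   " avg read time %" SCNu64 ","
                   " avg write time %" SCNu64 ", temp %u",
                   &out->ino, &out->reads, &out->writes,
                   &out->avg_read_time, &out->avg_write_time,
                   &out->temp);

    return n == 6;
}

/* Assumption: an all-ones average with zero reads means "never read". */
static bool never_read(const struct inode_heat *h)
{
    return h->reads == 0 && h->avg_read_time == UINT64_MAX;
}
```

Fed the first inode_data record from the transcript, the parser
yields inode 279 with 0 reads, 1 write and temp 109, and never_read()
returns true for it.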

Zhi Yong Wu (11):
  vfs: introduce one structure hot_info
  vfs: introduce one rb tree - hot_inode_tree
  vfs: introduce 2 rb tree items - inode and range
  vfs: add support for updating access frequency
  vfs: add one new mount option -o hottrack
  vfs: add init and exit support
  vfs: introduce one hash table
  vfs: enable hot data tracking
  vfs: fork one private kthread to update temperature info
  vfs: add 3 new ioctl interfaces
  vfs: add debugfs support

 fs/Makefile               |    3 +-
 fs/compat_ioctl.c         |    8 +
 fs/dcache.c               |    2 +
 fs/direct-io.c            |   10 +
 fs/hot_debugfs.c          |  488 ++++++++++++++++++++++++++++++++++
 fs/hot_debugfs.h          |   60 +++++
 fs/hot_hash.c             |  382 ++++++++++++++++++++++++++
 fs/hot_hash.h             |  112 ++++++++
 fs/hot_rb.c               |  648 +++++++++++++++++++++++++++++++++++++++++++++
 fs/hot_rb.h               |   81 ++++++
 fs/hot_track.c            |   85 ++++++
 fs/hot_track.h            |   23 ++
 fs/ioctl.c                |  132 +++++++++
 fs/namespace.c            |   10 +
 fs/super.c                |   11 +
 include/linux/fs.h        |   15 +
 include/linux/hot_track.h |  169 ++++++++++++
 mm/filemap.c              |    8 +
 mm/page-writeback.c       |   21 ++
 mm/readahead.c            |    9 +
 20 files changed, 2276 insertions(+), 1 deletions(-)
 create mode 100644 fs/hot_debugfs.c
 create mode 100644 fs/hot_debugfs.h
 create mode 100644 fs/hot_hash.c
 create mode 100644 fs/hot_hash.h
 create mode 100644 fs/hot_rb.c
 create mode 100644 fs/hot_rb.h
 create mode 100644 fs/hot_track.c
 create mode 100644 fs/hot_track.h
 create mode 100644 include/linux/hot_track.h

-- 
1.7.6.5
