Message-ID: <CANGUGtA=TsvorCDhPCz9=YuCw5jso-OW1uz7D=+3v92deuznGg@mail.gmail.com>
Date:	Mon, 17 Sep 2012 11:45:04 +0200
From:	Marco Stornelli <marco.stornelli@...il.com>
To:	zwu.kernel@...il.com
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	linuxram@...ux.vnet.ibm.com, viro@...iv.linux.org.uk,
	cmm@...ibm.com, tytso@....edu,
	Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
Subject: Re: [RFC v1 00/11] vfs: hot data tracking

2012/9/17  <zwu.kernel@...il.com>:
> From: Zhi Yong Wu <wuzhy@...ux.vnet.ibm.com>
>
> NOTE:
>
>   This patchset is being posted mainly to check that it is
> heading in the right direction and to get helpful comments
> from others.
>
> TODO List:
>
>  1.) Need to do scalability or performance tests.
>  2.) Turn some macros into tunables:
>        TIME_TO_KICK and HEAT_UPDATE_DELAY
>  3.) Refactor hot_hash_is_aging()
>        If you just made the timeout value a timespec and compared
>      the _timespecs_, you would be doing a lot fewer conversions.
>  4.) Clean up some unnecessary lock protection
>  5.) Add more comments to explain how to calc temperature
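
The suggestion in item 3 can be sketched roughly as follows. This is only
an illustration in Python, not the patchset's code: the Timespec type, the
timespec_sub() helper and the is_aging() semantics (time since last access
exceeds a timeout) are assumptions made for the example. The point is that
two (sec, nsec) pairs compare directly, with no conversion to a scalar.

```python
from typing import NamedTuple

NSEC_PER_SEC = 1_000_000_000


class Timespec(NamedTuple):
    """Hypothetical stand-in for a (seconds, nanoseconds) pair."""
    sec: int
    nsec: int  # assumed normalized to [0, NSEC_PER_SEC)


def timespec_sub(a: Timespec, b: Timespec) -> Timespec:
    """Return a - b as a normalized Timespec (assumes a >= b)."""
    sec = a.sec - b.sec
    nsec = a.nsec - b.nsec
    if nsec < 0:
        # Borrow one second so nsec stays in range.
        sec -= 1
        nsec += NSEC_PER_SEC
    return Timespec(sec, nsec)


def is_aging(last_access: Timespec, now: Timespec, timeout: Timespec) -> bool:
    """Assumed semantics: True if the item was idle longer than timeout.

    Tuples compare lexicographically, so the elapsed time and the timeout
    are compared field by field instead of being converted to nanoseconds.
    """
    return timespec_sub(now, last_access) > timeout
```

With the timeout kept as a Timespec, the only arithmetic per check is one
subtraction with a borrow.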
>
> Ben Chociej, Matt Lupfer and Conor Scott originally wrote this code to
>  be very btrfs-specific.  I've taken their code and attempted to
> make it more generic and integrate it at the VFS level.
>
> INTRODUCTION:
>
>   Essentially, this means maintaining some key stats
> (like number of reads/writes, last read/write time, frequency of
> reads/writes), then distilling those numbers down to a single
> "temperature" value that reflects what data is "hot," and using that
> temperature to move data to SSDs.
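
To make "distilling those numbers down to a single temperature" concrete,
here is a toy sketch. It is NOT the formula used by the patchset, which the
cover letter does not give. The AccessStats fields, the half-life decay and
the 0..255 range are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Hypothetical per-inode stats; field names are illustrative only."""
    nr_reads: int
    nr_writes: int
    last_read_age_s: float   # seconds since the last read
    last_write_age_s: float  # seconds since the last write


def temperature(stats: AccessStats,
                half_life_s: float = 60.0,
                max_temp: int = 255) -> int:
    """Fold counts and recency into one bounded integer.

    Assumed model: each access count is discounted by an exponential
    decay on the time since the last access of that kind, and the sum is
    squashed into [0, max_temp].
    """
    read_heat = stats.nr_reads * 0.5 ** (stats.last_read_age_s / half_life_s)
    write_heat = stats.nr_writes * 0.5 ** (stats.last_write_age_s / half_life_s)
    heat = read_heat + write_heat
    # heat / (1 + heat) maps [0, inf) onto [0, 1).
    return round(max_temp * heat / (1.0 + heat))
```

Whatever the real calculation is, the properties one would expect are the
same: more accesses and more recent accesses should both raise the value,
and an untouched file should sit at the bottom of the range.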
>
>   The long-term goal of these patches is to allow some FSs,
> e.g. Btrfs, to intelligently utilize SSDs in a heterogeneous volume.
> Incidentally, this project has been motivated by
> the Project Ideas page on the Btrfs wiki.
>
>   Of course, users are warned not to run this code outside of development
> environments. These patches are EXPERIMENTAL, and as such they might eat
> your data and/or memory. That said, the code should be relatively safe
> when the hottrack mount option is disabled.
>
> MOTIVATION:
>
>   The overall goal of enabling hot data relocation to SSD has been
> motivated by the Project Ideas page on the Btrfs wiki at
> <https://btrfs.wiki.kernel.org/index.php/Project_ideas>.
> The work is divided into two steps: the VFS provides the hot data
> tracking function, while each specific FS provides the hot data
> relocation function. As the first step toward this goal, it is hoped
> that the hot data tracking patchset will eventually mature and be
> merged into the VFS.
>
>   This is essentially the traditional cache argument: SSD is fast and
> expensive; HDD is cheap but slow. ZFS, for example, can already take
> advantage of SSD caching. Btrfs should also be able to take advantage of
> hybrid storage without many broad, sweeping changes to existing code.
>
> SUMMARY:
>
> - Hooks in existing vfs functions to track data access frequency
>
> - New rbtrees for tracking access frequency of inodes and sub-file
> ranges (hot_rb.c)
>     The relationship between super_block and rbtree is:
>   super_block->s_hotinfo.hot_inode_tree
>     In include/linux/fs.h, a struct hot_info s_hotinfo is added to
>   struct super_block, so each FS instance can find its hot tracking
>   info via its super_block. The hot_info stores the hot tracking
>   state, such as hot_inode_tree and the inode and range hash lists.
>
> - A hash list for indexing data by its temperature (hot_hash.c)
>
> - A debugfs interface for dumping data from the rbtrees (hot_debugfs.c)
>
> - A background kthread for updating inode heat info
>
> - A mount option for enabling temperature tracking
>   (-o hottrack; disabled by default) (hot_track.c)
>
> - An ioctl to retrieve the frequency information collected for a certain
> file
>
> - Ioctls to enable/disable frequency tracking per inode.
>
> Usage syntax:
>
> root@...ian-i386:~# mount -o hottrack /dev/sdb /mnt
> [ 1505.894078] device label test devid 1 transid 29 /dev/sdb
> [ 1505.952977] btrfs: disk space caching is enabled
> [ 1506.069678] vfs: turning on hot data tracking
> root@...ian-i386:~# mount -t debugfs none /sys/kernel/debug
> root@...ian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/
> total 0
> drwxr-xr-x 2 root root 0 Aug  8 04:40 sdb
> root@...ian-i386:~# ls -l /sys/kernel/debug/vfs_hotdata/sdb
> total 0
> -rw-r--r-- 1 root root 0 Aug  8 04:40 inode_data
> -rw-r--r-- 1 root root 0 Aug  8 04:40 range_data
> root@...ian-i386:~# vi /mnt/file
> root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
> inode #279, reads 0, writes 1, avg read time 18446744073709551615,
> avg write time 5251566408153596, temp 109
> root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
> inode #279, range start 0 (range len 1048576) reads 0, writes 1,
> avg read time 18446744073709551615, avg write time 1128690176623144209, temp 64
> root@...ian-i386:~# echo "hot data tracking test" >> /mnt/file
> root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/inode_data
> inode #279, reads 0, writes 2, avg read time 18446744073709551615,
> avg write time 4923343766042451, temp 109
> root@...ian-i386:~# cat /sys/kernel/debug/hot_track/sdb/range_data
> inode #279, range start 0 (range len 1048576) reads 0, writes 2,
> avg read time 18446744073709551615, avg write time 1058147040842596150, temp 64
> root@...ian-i386:~#
>

It would be a good idea to add a new file under Documentation and
include this kind of information: for example, what temp means, how
it is worked out, and how to "read" the avg read/write time
(nanoseconds, microseconds, jiffies...?)
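
As a rough illustration of why the output is hard to read without
documentation, here is a small Python sketch that parses the inode_data
lines quoted above. The regular expression is derived only from that
sample output and the function name is made up for the example. One thing
it makes visible: the "avg read time" value 18446744073709551615 is
exactly 2^64 - 1, which suggests (my inference, not something the patch
description states) an "unset" marker rather than a real time. The unit of
the other values still cannot be determined from the output.

```python
import re

U64_MAX = 2**64 - 1  # 18446744073709551615

# Pattern based solely on the sample output quoted in this thread.
INODE_RE = re.compile(
    r"inode #(?P<inode>\d+), reads (?P<reads>\d+), writes (?P<writes>\d+), "
    r"avg read time (?P<avg_read>\d+), avg write time (?P<avg_write>\d+), "
    r"temp (?P<temp>\d+)"
)


def parse_inode_data(text: str) -> list[dict]:
    """Parse inode_data records into dicts of integers.

    Whitespace is collapsed first so records wrapped across lines
    (as in the mail above) still match.
    """
    flat = " ".join(text.split())
    records = []
    for m in INODE_RE.finditer(flat):
        rec = {key: int(val) for key, val in m.groupdict().items()}
        # Flag values equal to the largest u64; assumed to mean "no data".
        rec["avg_read_is_u64_max"] = rec["avg_read"] == U64_MAX
        rec["avg_write_is_u64_max"] = rec["avg_write"] == U64_MAX
        records.append(rec)
    return records


sample = """inode #279, reads 0, writes 1, avg read time 18446744073709551615,
avg write time 5251566408153596, temp 109"""

for rec in parse_inode_data(sample):
    print(rec)
```

A documentation file that states the unit and the meaning of that maximum
value would make a script like this unnecessary guesswork.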

Marco
