Date:	Tue, 15 Jul 2008 11:28:22 -0700 (PDT)
From:	Sage Weil <sage@...dream.net>
To:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:	ceph-devel@...ts.sf.net
Subject: Recursive directory accounting for size, ctime, etc.

All-

Ceph is a new distributed file system for Linux designed for scalability 
(terabytes to exabytes, tens to thousands of storage nodes), reliability, 
and performance.  The latest release (v0.3), aside from xattr support and 
the usual slew of bugfixes, includes a unique (?) recursive accounting 
infrastructure that allows statistics about all metadata nested beneath a 
point in the directory hierarchy to be efficiently propagated up the tree.  
Currently this includes a file and directory count, total bytes (summation 
over file sizes), and most recent inode ctime.  For example, for a 
directory like /home, Ceph can efficiently report the total number of 
files, directories, and bytes contained by that entire subtree of the 
directory hierarchy.

The file size summation is the most interesting, as it effectively gives 
you directory-based quota space accounting with fine granularity.  In many 
deployments, the quota _accounting_ is more important than actual 
enforcement.  Anybody who has had to figure out what has filled/is filling 
up a large volume will appreciate how cumbersome and inefficient 'du' can 
be for that purpose--especially when you're in a hurry.

There are currently two ways to access the recursive stats via a standard 
shell.  The first simply sets the directory st_size value to the 
_recursive_ bytes ('rbytes') value (when the client is mounted with -o 
rbytes).  For example (watch the directory sizes),

$ tar jxf linux-2.6.24.3.tar.bz2
$ ls -l
total 8
drwxr-xr-x 1 root root         0 Jul 10 05:30 .
drwxr-xr-x 8 root root      4096 Jul  9 18:21 ..
drwxrwxr-x 1 root root 254025660 Feb 26 00:20 linux-2.6.24.3
$ du -s linux-2.6.24.3/
254237  linux-2.6.24.3/
$ ls -al linux-2.6.24.3/
total 281
drwxrwxr-x 1 root root 254025660 Feb 26 00:20 .
drwxr-xr-x 1 root root         0 Jul 10 05:30 ..
-rw-rw-r-- 1 root root       628 Feb 26 00:20 .gitignore
-rw-rw-r-- 1 root root      3657 Feb 26 00:20 .mailmap
-rw-rw-r-- 1 root root     18693 Feb 26 00:20 COPYING
-rw-rw-r-- 1 root root     92230 Feb 26 00:20 CREDITS
drwxrwxr-x 1 root root   8984828 Feb 26 00:20 Documentation
-rw-rw-r-- 1 root root      1596 Feb 26 00:20 Kbuild
-rw-rw-r-- 1 root root     93957 Feb 26 00:20 MAINTAINERS
-rw-rw-r-- 1 root root     53162 Feb 26 00:20 Makefile
-rw-rw-r-- 1 root root     16930 Feb 26 00:20 README
-rw-rw-r-- 1 root root      3119 Feb 26 00:20 REPORTING-BUGS
drwxrwxr-x 1 root root  44216036 Feb 26 00:20 arch
drwxrwxr-x 1 root root    349137 Feb 26 00:20 block
drwxrwxr-x 1 root root    959654 Feb 26 00:20 crypto
drwxrwxr-x 1 root root 118578205 Feb 26 00:20 drivers
drwxrwxr-x 1 root root  21526882 Feb 26 00:20 fs
drwxrwxr-x 1 root root  27456604 Feb 26 00:20 include
drwxrwxr-x 1 root root     99077 Feb 26 00:20 init
drwxrwxr-x 1 root root    170827 Feb 26 00:20 ipc
drwxrwxr-x 1 root root   2189735 Feb 26 00:20 kernel
drwxrwxr-x 1 root root    679502 Feb 26 00:20 lib
drwxrwxr-x 1 root root   1213804 Feb 26 00:20 mm
drwxrwxr-x 1 root root  12562134 Feb 26 00:20 net
drwxrwxr-x 1 root root      3940 Feb 26 00:20 samples
drwxrwxr-x 1 root root   1105977 Feb 26 00:20 scripts
drwxrwxr-x 1 root root    740395 Feb 26 00:20 security
drwxrwxr-x 1 root root  12888682 Feb 26 00:20 sound
drwxrwxr-x 1 root root     16269 Feb 26 00:20 usr

Note that st_blocks is _not_ recursively defined, so 'du' still behaves as 
expected.  If mounted with -o norbytes instead, the directory st_size is 
the number of entries in the directory.
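
As a rough illustration of the difference (a sketch, not part of Ceph), 
the Python below contrasts the tree walk a 'du'-style tool has to do with 
the single stat() that is enough on a client mounted with -o rbytes.  The 
function names are made up for this example, and reported_bytes() only 
returns a recursive total on such a mount; on other file systems a 
directory's st_size means something else.

```python
import os
import stat

def walked_bytes(root):
    """Sum st_size over every regular file below root (one lstat per file)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total

def reported_bytes(root):
    """Return the directory's own st_size.

    Assumption: root is on a Ceph client mounted with -o rbytes, so this
    value is the recursive byte count.  Elsewhere it is not.
    """
    return os.stat(root).st_size
```

On an rbytes mount the two values should be close once the statistics 
have propagated, subject to the caveats about hard links listed further 
down.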

The second interface takes advantage of the fact (?) that read() on a 
directory is more or less undefined.  (Okay, that's not really true, but 
it used to return encoded dirents or something similar, and more recently 
returns -EISDIR.  As far as I know, no sane application expects meaningful 
data from read() on a directory...)  So, assuming Ceph is mounted with -o 
dirstat,

$ cat linux-2.6.24.3/
entries:                     27
 files:                       9
 subdirs:                    18
rentries:                 24418
 rfiles:                  23062
 rsubdirs:                 1356
rbytes:               254025660
rctime:    1215668428.051898000

Fields prefixed with 'r' are recursively defined, while 
entries/files/subdirs cover only the one directory.  'rctime' is the most 
recent ctime within the hierarchy, which should be useful for backup 
software or anything else scanning the hierarchy for recent changes.
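
For a program that wants these numbers rather than a human, the output 
could be parsed along the following lines.  This is only a sketch: it 
assumes the 'name: value' layout shown in the example above, that 
'rctime' is seconds followed by a nine-digit nanosecond fraction, and 
that the directory is on a Ceph mount with -o dirstat.  parse_dirstat() 
and read_dirstat() are names invented for this example.

```python
import os

def parse_dirstat(text):
    """Parse 'name: value' lines into a dict keyed by field name."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'name: value' line
        name, value = name.strip(), value.strip()
        if "." in value:
            # rctime: keep (seconds, nanoseconds) as integers so no
            # precision is lost to floating point.
            sec, nsec = value.split(".")
            stats[name] = (int(sec), int(nsec))
        else:
            stats[name] = int(value)
    return stats

def read_dirstat(path, bufsize=4096):
    """read() a directory and parse the result.

    Assumption: path is on a Ceph mount with -o dirstat.  On file systems
    where read() on a directory returns EISDIR, os.read() raises
    IsADirectoryError instead.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return parse_dirstat(os.read(fd, bufsize).decode("ascii"))
    finally:
        os.close(fd)
```

A backup tool could then compare the parsed 'rctime' against the time of 
its last run and skip any subtree that has not changed since.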

Naturally, there are a few caveats:

 - There is some built-in delay before statistics fully propagate up 
toward the root of the hierarchy.  Changes are propagated 
opportunistically when lock/lease state allows, with an upper bound of (by 
default) ~30 seconds for each level of directory nesting.

 - Ceph internally distinguishes between multiple links to the same file 
(there is a single 'primary' link, and then zero or more 'remote' links).  
Only the primary link contributes toward the 'rbytes' total.

 - The 'rbytes' summation is over i_size, not blocks used.  That means 
sparse files "appear" larger than the storage space they actually consume.

 - Directories don't yet contribute anything to the 'rbytes' total.  They
should probably include an estimate of the storage consumed by directory 
metadata.  For this reason, and because the size isn't rounded up to the 
block size, the 'rbytes' total will usually be slightly smaller than what 
you get from 'du'.

 - There are currently no stats for the root directory itself.
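
Some of these caveats can be mimicked from user space.  The sketch below 
is an approximation and not Ceph's implementation: it counts each inode 
once (a stand-in for "only the primary link contributes", since user 
space cannot tell which link Ceph treats as primary), sums i_size rather 
than blocks, ignores directories, and computes the worst-case propagation 
delay from the ~30 second default per level.  It assumes st_blocks is in 
512-byte units, as on Linux.  All names are made up for this example.

```python
import os
import stat

def apparent_and_allocated(root):
    """Return (apparent, allocated) bytes for regular files below root.

    apparent:  sum of st_size, each inode counted once (like 'rbytes').
    allocated: sum of st_blocks * 512 (closer to what 'du' reports).
    """
    seen = set()
    apparent = allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another link to a file already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512  # assumes 512-byte units
    return apparent, allocated

def worst_case_delay(depth, per_level=30):
    """Upper bound in seconds for a change to reach an ancestor that is
    'depth' directory levels up, at per_level seconds per level."""
    return depth * per_level
```

For a sparse file the first value can be much larger than the second, 
which is the effect described in the third caveat.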


I'm extremely interested in what people think of overloading the file 
system interface in this way.  Handy?  Crufty?  Dangerous?  Does anybody 
know of any applications that rely on or expect meaningful values for a 
directory's i_size?  Or read() a directory?


More information on the recursive accounting at

	http://ceph.newdream.net/wiki/Recursive_accounting

and Ceph itself at

	http://ceph.newdream.net/

Cheers-
sage
