Message-ID: <Pine.LNX.4.64.0807151326130.1976@cobra.newdream.net>
Date:	Tue, 15 Jul 2008 13:41:25 -0700 (PDT)
From:	Sage Weil <sage@...dream.net>
To:	"J. Bruce Fields" <bfields@...ldses.org>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	ceph-devel@...ts.sf.net
Subject: Re: Recursive directory accounting for size, ctime, etc.

On Tue, 15 Jul 2008, J. Bruce Fields wrote:
> >  - There is some built-in delay before statistics fully propagate up 
> > toward the root of the hierarchy.  Changes are propagated 
> > opportunistically when lock/lease state allows, with an upper bound of (by 
> > default) ~30 seconds for each level of directory nesting.
> 
> That makes it less useful, e.g., for somebody with cached data trying to
> validate their cache, or for something like git trying to check a
> directory tree for changes.

Having fully up-to-date values would definitely be nice, but unfortunately 
that doesn't play well with the fact that different parts of the directory 
hierarchy may be managed by different metadata servers.  A primary goal in 
implementing this was to minimize any impact on performance.  The uses I 
had in mind were more in line with quota-based accounting than cache 
validation.

I think I can adjust the propagation heuristics/timeouts to make updates 
seem more or less immediate to a user in most cases, but that won't be 
sufficient for a tool like git that needs to reliably identify very recent 
updates.  For backup software wanting a consistent file system image, it 
should really be operating on a snapshot as well, in which case a delay 
between taking the snapshot and starting the scan for changes would allow 
those values to propagate.
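
To put a rough number on that delay, here is a minimal sketch.  It assumes 
the ~30 second per-level bound quoted above is a worst case that simply adds 
up across levels of nesting, which is a simplification.  The names 
PER_LEVEL_BOUND_S, nesting_levels and worst_case_delay_s are hypothetical 
and not part of Ceph.

```python
import posixpath

# Assumed default upper bound (seconds) per level of directory nesting,
# taken from the "~30 seconds" figure quoted in this thread.
PER_LEVEL_BOUND_S = 30


def nesting_levels(changed_dir: str, scan_root: str) -> int:
    """Count the directory levels between scan_root and changed_dir."""
    rel = posixpath.relpath(changed_dir, scan_root)
    if rel == ".":
        return 0
    parts = rel.split("/")
    if parts[0] == "..":
        raise ValueError("changed_dir is not under scan_root")
    return len(parts)


def worst_case_delay_s(changed_dir: str, scan_root: str) -> int:
    """Pessimistic wait before scan_root's stats reflect the change."""
    return nesting_levels(changed_dir, scan_root) * PER_LEVEL_BOUND_S


if __name__ == "__main__":
    # Hypothetical paths: a change two levels below the snapshot root.
    print(worst_case_delay_s("/backup/home/alice", "/backup"))
```

Under those assumptions, a backup tool would wait that long after taking 
the snapshot before trusting the recursive values at the scan root.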

> >  - Ceph internally distinguishes between multiple links to the same file 
> > (there is a single 'primary' link, and then zero or more 'remote' links).  
> > Only the primary link contributes toward the 'rbytes' total.
> 
> Is that only true for 'rbytes'?

The same goes for rctime.  As far as the recursive stats go, the other 
stats (file/directory counts) aren't affected.  The primary/remote 
hard link distinction is fundamental to the way metadata is internally 
managed and stored by the MDS, though, if that's what you mean (inode 
content is embedded with the primary link's directory metadata).
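
For comparison, a rough userspace analogue of the 'rbytes' behaviour 
discussed in this thread can be sketched as follows.  This is not Ceph 
code: an ordinary file system has no primary/remote distinction, so the 
sketch assumes that counting each inode once (whichever link is seen 
first) is a fair stand-in.  It sums i_size (st_size) rather than blocks 
used, counts regular files only, and lets directories contribute nothing.  
The name approx_rbytes is hypothetical.

```python
import os
import stat


def approx_rbytes(root: str) -> int:
    """Sum st_size over regular files under root, one count per inode."""
    seen = set()   # (device, inode) pairs already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue          # skip symlinks, devices, sockets, ...
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue          # additional hard link: contributes nothing
            seen.add(key)
            total += st.st_size   # logical size, not allocated blocks
    return total


if __name__ == "__main__":
    print(approx_rbytes("."))
```

Because sizes are not rounded up to the block size and directories are 
ignored, the result will usually differ somewhat from what 'du' reports.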

sage


> 
> --b.
> 
> > 
> >  - The 'rbytes' summation is over i_size, not blocks used.  That means 
> > sparse files "appear" larger than the storage space they actually consume.
> > 
> >  - Directories don't yet contribute anything to the 'rbytes' total.  They
> > should probably include an estimate of the storage consumed by directory 
> > metadata.  For this reason, and because the size isn't rounded up to the 
> > block size, the 'rbytes' total will usually be slightly smaller than what 
> > you get from 'du'.
> > 
> >  - Currently no stats for the root directory itself.
> > 
> > 
> > I'm extremely interested in what people think of overloading the file 
> > system interface in this way.  Handy?  Crufty?  Dangerous?  Does anybody 
> > know of any applications that rely on or expect meaningful values for a 
> > directory's i_size?  Or read() a directory?
> > 
> > 
> > More information on the recursive accounting at
> > 
> > 	http://ceph.newdream.net/wiki/Recursive_accounting
> > 
> > and Ceph itself at
> > 
> > 	http://ceph.newdream.net/
> > 
> > Cheers-
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
