Message-ID: <20220510230447.GC2306852@dread.disaster.area>
Date: Wed, 11 May 2022 09:04:47 +1000
From: Dave Chinner <david@...morbit.com>
To: Florian Weimer <fweimer@...hat.com>
Cc: Christian Brauner <brauner@...nel.org>,
Miklos Szeredi <miklos@...redi.hu>,
linux-fsdevel@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Karel Zak <kzak@...hat.com>,
Greg KH <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org,
Linux API <linux-api@...r.kernel.org>,
linux-man <linux-man@...r.kernel.org>,
LSM <linux-security-module@...r.kernel.org>,
Ian Kent <raven@...maw.net>,
David Howells <dhowells@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Christian Brauner <christian@...uner.io>,
Amir Goldstein <amir73il@...il.com>,
James Bottomley <James.Bottomley@...senpartnership.com>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API

On Tue, May 10, 2022 at 02:45:39PM +0200, Florian Weimer wrote:
> * Dave Chinner:
>
> > IOWs, what Linux really needs is a listxattr2() syscall that works
> > the same way that getdents/XFS_IOC_ATTRLIST_BY_HANDLE work. With the
> > list function returning value sizes and being able to iterate
> > effectively, every problem that listxattr() causes goes away.
>
> getdents has issues of its own because it's unspecified what happens if
> the list of entries is modified during iteration. Few file systems add
> another tree just to guarantee stable iteration.

The filesystem I care about (XFS) guarantees stable iteration and
stable seekdir/telldir cookies. It's not that hard to do, but it
requires the filesystem designer to understand that this is a
necessary feature before they start designing the on-disk directory
format and lookup algorithms....

> Maybe that's different for xattrs because they are supposed to be small
> and can just be snapshotted with a full copy?

It's different for xattrs because we directly control the API
specification for XFS_IOC_ATTRLIST_BY_HANDLE, not POSIX. We can
define the behaviour however we want. Stable iteration is what
listing keys needs.

The cursor is defined as 16 bytes of opaque data, enabling us to
encode exactly where in the hashed name btree index we have
traversed to:

/*
 * Kernel-internal version of the attrlist cursor.
 */
struct xfs_attrlist_cursor_kern {
	__u32	hashval;	/* hash value of next entry to add */
	__u32	blkno;		/* block containing entry (suggestion) */
	__u32	offset;		/* offset in list of equal-hashvals */
	__u16	pad1;		/* padding to match user-level */
	__u8	pad2;		/* padding to match user-level */
	__u8	initted;	/* T/F: cursor has been initialized */
};

Hence we have all the information in the cursor we need to reset the
btree traversal index to the exact entry we finished at (even in the
presence of hash collisions in the index). Removal of the entry the
cursor points to therefore isn't a problem for us: we just move to
the next highest sequential hash index in the btree and start again
from there.
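
The restart rule described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the XFS code: the
sorted array stands in for the hashed name btree, and struct entry,
struct cursor (including its "done" flag) and list_some() are
hypothetical names invented for this example. The simplified "offset"
handling also assumes that entries already returned for the current hash
value are not removed between calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hashed name index: sorted by hashval. */
struct entry {
	uint32_t hashval;
	const char *name;
};

/* Simplified cursor, loosely modelled on the hashval/offset fields. */
struct cursor {
	uint32_t hashval;	/* hash value of next entry to return */
	uint32_t offset;	/* equal-hash entries already returned */
	uint8_t initted;	/* cursor has been initialized */
	uint8_t done;		/* iteration reached the end (assumption) */
};

/*
 * Copy up to max names into out, resuming from cur.
 * Returns the number of names copied; 0 means the listing is complete.
 */
static size_t list_some(const struct entry *tbl, size_t n,
			struct cursor *cur, const char **out, size_t max)
{
	size_t i = 0, got = 0;
	uint32_t skipped = 0;

	assert(cur != NULL && out != NULL);
	if (cur->done)
		return 0;

	if (cur->initted) {
		/* Seek to the first entry whose hash is >= the cursor hash. */
		while (i < n && tbl[i].hashval < cur->hashval)
			i++;
		/*
		 * Skip equal-hash entries that were already returned. If the
		 * cursor's hash value no longer exists, nothing is skipped
		 * and we continue from the next highest hash value.
		 */
		while (skipped < cur->offset && i < n &&
		       tbl[i].hashval == cur->hashval) {
			i++;
			skipped++;
		}
	}

	while (i < n && got < max)
		out[got++] = tbl[i++].name;

	/* Record where the next call should restart. */
	cur->initted = 1;
	if (i == n) {
		cur->done = 1;
	} else {
		cur->hashval = tbl[i].hashval;
		cur->offset = 0;
		for (size_t j = i; j > 0 && tbl[j - 1].hashval == cur->hashval; j--)
			cur->offset++;
	}
	return got;
}
```

A caller zero-initializes the cursor and calls list_some() repeatedly
until it returns 0. If the entry the cursor points at is removed between
two calls, the second call simply continues with the next entry in hash
order.
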
Of course, if this is how we define listxattr2() behaviour (or maybe
we should call it "list_keys()" to make it clear we are treating
this as a key/value store instead of xattrs) then each filesystem
can put what it needs in that cursor to guarantee it can restart key
iteration correctly if the entry the cursor points to has been
removed. We can also make the cursor larger if necessary for other
filesystems to store the information they need.
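
One way to picture a per-filesystem cursor behind an opaque 16-byte
blob is the sketch below. Everything in it is hypothetical: no
listxattr2()/list_keys() interface exists, and struct keylist_cursor,
struct examplefs_cursor and the load/save helpers are names made up for
this illustration. The private layout just reuses the field sizes of the
XFS cursor shown earlier.

```c
#include <stdint.h>
#include <string.h>

/* What the caller would see: opaque bytes, zeroed to start a listing. */
struct keylist_cursor {
	uint8_t opaque[16];
};

/* What one (imaginary) filesystem keeps inside those bytes. */
struct examplefs_cursor {
	uint32_t hashval;	/* hash value of next entry to return */
	uint32_t blkno;		/* block containing entry (hint only) */
	uint32_t offset;	/* offset in list of equal hash values */
	uint16_t pad1;
	uint8_t pad2;
	uint8_t initted;	/* zero until the first call fills it in */
};

/* The private state must fit in the space the API reserves. */
_Static_assert(sizeof(struct examplefs_cursor) <= sizeof(struct keylist_cursor),
	       "filesystem cursor does not fit in the opaque cursor");

/* Store the private state into the caller-visible cursor. */
static void examplefs_cursor_save(struct keylist_cursor *dst,
				  const struct examplefs_cursor *src)
{
	memset(dst->opaque, 0, sizeof(dst->opaque));
	memcpy(dst->opaque, src, sizeof(*src));
}

/* Recover the private state from the caller-visible cursor. */
static void examplefs_cursor_load(struct examplefs_cursor *dst,
				  const struct keylist_cursor *src)
{
	memcpy(dst, src->opaque, sizeof(*dst));
}
```

Because the caller only ever copies the bytes around, a different
filesystem could use a completely different private layout, and growing
the opaque array would give filesystems that need more state the room
to store it.
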
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com