Message-ID: <1582316494.3376.45.camel@HansenPartnership.com>
Date: Fri, 21 Feb 2020 12:21:34 -0800
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: David Howells <dhowells@...hat.com>, viro@...iv.linux.org.uk
Cc: raven@...maw.net, mszeredi@...hat.com, christian@...uner.io,
jannh@...gle.com, darrick.wong@...cle.com,
linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications
[ver #17]
On Fri, 2020-02-21 at 18:01 +0000, David Howells wrote:
[...]
> ============================
> FILESYSTEM INFORMATION QUERY
> ============================
>
> The fsinfo() system call allows information about the filesystem at a
> particular path point to be queried as a set of attributes, some of
> which may have more than one value.
>
> Attribute values are of four basic types:
>
> (1) Version dependent-length structure (size defined by type).
>
> (2) Variable-length string (up to 4096, including NUL).
>
> (3) List of structures (up to INT_MAX size).
>
> (4) Opaque blob (up to INT_MAX size).
>
> Attributes can have multiple values either as a sequence of values or
> a sequence-of-sequences of values and all the values of a particular
> attribute must be of the same type.
>
> Note that the values of an attribute *are* allowed to vary between
> dentries within a single superblock, depending on the specific dentry
> that you're looking at, but all the values of an attribute have to be
> of the same type.
>
> I've tried to make the interface as light as possible, so
> integer/enum attribute selector rather than string and the core does
> all the allocation and extensibility support work rather than leaving
> that to the filesystems. That means that for the first two attribute
> types, the filesystem will always see a sufficiently-sized buffer
> allocated. Further, this removes the possibility of the filesystem
> gaining access to the userspace buffer.
>
>
> fsinfo() allows a variety of information to be retrieved about a
> filesystem and the mount topology:
>
> (1) General superblock attributes:
>
> - Filesystem identifiers (UUID, volume label, device numbers,
> ...)
> - The limits on a filesystem's capabilities
> - Information on supported statx fields and attributes and IOC
> flags.
> - A variety of single-bit flags indicating supported capabilities.
> - Timestamp resolution and range.
> - The amount of space/free space in a filesystem (as statfs()).
> - Superblock notification counter.
>
> (2) Filesystem-specific superblock attributes:
>
> - Superblock-level timestamps.
> - Cell name.
> - Server names and addresses.
> - Filesystem-specific information.
>
> (3) VFS information:
>
> - Mount topology information.
> - Mount attributes.
> - Mount notification counter.
>
> (4) Information about what the fsinfo() syscall itself supports,
> including the type and struct/element size of attributes.
>
> The system is extensible:
>
> (1) New attributes can be added. There is no requirement that a
> filesystem implement every attribute. Note that the core VFS keeps a
> table of types and sizes so it can handle future extensibility rather
> than delegating this to the filesystems.
>
> (2) Version length-dependent structure attributes can be made larger
> and have additional information tacked on the end, provided it keeps
> the layout of the existing fields. If an older process asks for a
> shorter structure, it will only be given the bits it asks for. If a
> newer process asks for a longer structure on an older kernel, the
> extra space will be set to 0. In all cases, the size of the data
> actually available is returned.
>
> In essence, the size of a structure is that structure's version: a
> smaller size is an earlier version and a later version includes
> everything that the earlier version did.
>
> (3) New single-bit capability flags can be added. This is a
> structure-typed attribute and, as such, (2) applies. Any bits you
> wanted but the kernel doesn't support are automatically set to 0.
>
> fsinfo() may be called like the following, for example:
>
>     struct fsinfo_params params = {
>             .at_flags = AT_SYMLINK_NOFOLLOW,
>             .flags    = FSINFO_FLAGS_QUERY_PATH,
>             .request  = FSINFO_ATTR_AFS_SERVER_ADDRESSES,
>             .Nth      = 2,
>     };
>     struct fsinfo_server_address address;
>     len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
>                  &address, sizeof(address));
>
> The above example would query an AFS filesystem to retrieve the
> address list for the 3rd server, and:
>
>     struct fsinfo_params params = {
>             .at_flags = AT_SYMLINK_NOFOLLOW,
>             .flags    = FSINFO_FLAGS_QUERY_PATH,
>             .request  = FSINFO_ATTR_AFS_CELL_NAME,
>     };
>     char cell_name[256];
>     len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", &params,
>                  &cell_name, sizeof(cell_name));
>
> would retrieve the name of an AFS cell as a string.
>
> In future, I want to make fsinfo() capable of querying a context
> created by fsopen() or fspick(), e.g.:
>
>     fd = fsopen("ext4", 0);
>     struct fsinfo_params params = {
>             .flags   = FSINFO_FLAGS_QUERY_FSCONTEXT,
>             .request = FSINFO_ATTR_PARAMETERS,
>     };
>     char buffer[65536];
>     fsinfo(fd, NULL, &params, &buffer, sizeof(buffer));
>
> even if that context doesn't currently have a superblock attached. I
> would prefer this to contain length-prefixed strings so that there's
> no need to insert escaping, especially as any character, including
> '\', can be used as the separator in cifs and so that binary
> parameters can be returned (though that is a lesser issue).

Could I make a suggestion about how this could be done in a way that
doesn't require the fsinfo syscall at all? It could just be done with
fsconfig. The idea is based on something I've wanted to do for
configfd but couldn't, because otherwise it wouldn't substitute for
fsconfig. Christian's critique of configfd made me think it is
actually essential for seccomp and other verifier tools to work, and
I believe the same critique applies here.

Instead of making fsconfig functionally configured (as in you pass the
attribute name, type and parameters down into the fs-specific handler,
and the handler does a string match, verifies the parameters and then
acts on them), make it table configured. Each fstype registers a table
of attributes which can be got and optionally set, with each attribute
having a get function and an optional set function. We'd have multiple
tables per fstype: the generic VFS can register a table of attributes
it understands for every fstype (things like name, uuid and the like),
and each fs type would register a table of fs-specific attributes
following the same pattern. The system would examine the fs-specific
table before the generic one, allowing overrides.

fsconfig would have the ability to both get and set attributes,
permitting retrieval as well as setting (which is how I get rid of the
fsinfo syscall). We'd also have a global parameter which would
retrieve the entire table by name and type, so the whole thing is
introspectable: the upper layer knows a priori all the attributes
which can be set for a given fs type and what type they are (so we can
make more of the parsing generic). Any attribute which doesn't have a
set routine would be read-only, and all attributes would have to have
a get routine, meaning everything is queryable.
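
To make the table idea concrete, here is a minimal userspace sketch of
the lookup order and the get/optional-set rule. It is not kernel code
and not a proposed patch: every identifier in it (struct fs_attr,
attr_get(), attr_set(), attr_dump(), the made-up "demofs" and its
attributes, the choice of -EPERM for a read-only attribute) is an
assumption for illustration only.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Every identifier below is hypothetical; none comes from a kernel header. */
enum attr_type { ATTR_STRING, ATTR_U64 };

struct fs_instance {		/* stand-in for per-superblock state */
	char name[32];
	unsigned long long blocksize;
};

struct fs_attr {
	const char *name;
	enum attr_type type;
	/* get is mandatory; a NULL set marks the attribute read-only */
	int (*get)(const struct fs_instance *fs, char *buf, size_t len);
	int (*set)(struct fs_instance *fs, const char *val);
};

static int get_name(const struct fs_instance *fs, char *buf, size_t len)
{
	return snprintf(buf, len, "%s", fs->name);
}

static int get_blocksize(const struct fs_instance *fs, char *buf, size_t len)
{
	return snprintf(buf, len, "%llu", fs->blocksize);
}

static int demofs_set_name(struct fs_instance *fs, const char *val)
{
	if (strlen(val) >= sizeof(fs->name))
		return -ENAMETOOLONG;
	strcpy(fs->name, val);
	return 0;
}

/* Table the generic layer would register for every fstype. */
static const struct fs_attr generic_attrs[] = {
	{ "name",      ATTR_STRING, get_name,      NULL },
	{ "blocksize", ATTR_U64,    get_blocksize, NULL },
};

/* Table a made-up "demofs" registers; its "name" shadows the generic
 * entry and makes the attribute writable. */
static const struct fs_attr demofs_attrs[] = {
	{ "name", ATTR_STRING, get_name, demofs_set_name },
};

#define N(a) (sizeof(a) / sizeof((a)[0]))

/* Search order: fs-specific table first, then the generic one. */
static const struct { const struct fs_attr *attrs; size_t count; } tables[] = {
	{ demofs_attrs,  N(demofs_attrs) },
	{ generic_attrs, N(generic_attrs) },
};

static const struct fs_attr *find_attr(const char *name)
{
	for (size_t t = 0; t < N(tables); t++)
		for (size_t i = 0; i < tables[t].count; i++)
			if (strcmp(tables[t].attrs[i].name, name) == 0)
				return &tables[t].attrs[i];
	return NULL;
}

/* Returns the snprintf-style length, or -ENOENT for an unknown attribute. */
int attr_get(const struct fs_instance *fs, const char *name, char *buf, size_t len)
{
	const struct fs_attr *a = find_attr(name);
	return a ? a->get(fs, buf, len) : -ENOENT;
}

int attr_set(struct fs_instance *fs, const char *name, const char *val)
{
	const struct fs_attr *a = find_attr(name);
	if (!a)
		return -ENOENT;
	return a->set ? a->set(fs, val) : -EPERM;
}

/* Introspection: list each visible attribute with its type and access. */
int attr_dump(FILE *out)
{
	int shown = 0;
	for (size_t t = 0; t < N(tables); t++)
		for (size_t i = 0; i < tables[t].count; i++) {
			const struct fs_attr *a = &tables[t].attrs[i];
			if (find_attr(a->name) != a)
				continue;	/* shadowed by an earlier table */
			fprintf(out, "%s %s %s\n", a->name,
				a->type == ATTR_U64 ? "u64" : "string",
				a->set ? "rw" : "ro");
			shown++;
		}
	return shown;
}
```

With these two tables, attr_set() on "name" succeeds because the demofs
entry is found first, attr_set() on "blocksize" fails with -EPERM
because the generic entry has no set routine, and attr_dump(stdout)
prints "name string rw" followed by "blocksize u64 ro". Because the
type sits in the table rather than in each handler, a caller (or a
verifier) can learn what is settable without calling into the
filesystem.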

I think I know how to code this up in a way that would be fully
transparent to the existing syscalls.

James